VOTable Format Definition

Official bibliographic entry for published version [VOTable1.5].

Status:

VOTable 1.5 REC 2025-01-16

1 Introduction

The VOTable format is an XML standard for the interchange of data represented as a set of tables. In this context, a table is an unordered set of rows, each of a uniform structure, as specified in the table description (the table metadata). Each row in a table is a sequence of table cells, and each of these contains either a primitive data type, or an array of such primitives. VOTable is derived from the Astrores format [], itself modeled on the FITS Table format []; VOTable was designed to be close to the FITS Binary Table format.

1.1 Why VOTable?

Astronomers have always been at the forefront of developments in information technology, and funding agencies across the world have recognized this by supporting the Virtual Observatory movement, in the hopes that other sciences and business can follow their lead in making online data both interoperable and scalable.

VOTable is designed as a flexible storage and exchange format for tabular data, with particular emphasis on astronomical tables.

Interoperability is encouraged through the use of standards (XML). The XML fabric allows applications to easily validate an input document, as well as facilitating transformations through XSLT (Extensible Stylesheet Language Transformations) engines.

Grid Computing

VOTable has built-in features for big-data and Grid computing. It allows metadata and data to be stored separately, with the remote data linked. Processes can then use metadata to ‘get ready’ for their input data, or to organize third-party or parallel transfers of the data. Remote data allow the metadata to be sent in email and referenced in documents without pulling the whole dataset with it: just as we are used to the idea of sending a pointer to a document (URL) in place of the document, so we can now send metadata-rich pointers to data tables in place of the tables themselves. The remote data is referenced with the URL syntax protocol://location, meaning that arbitrarily complex protocols are allowed.

When we are working with very large tables in a distributed-computing environment (“the Grid”), the data stream between processors, with flows being filtered, joined, and cached in different geographic locations. It would be very difficult if the number of rows of the table were required in the header: we would need to stream the whole table into a cache, compute the number of rows, then stream it again for the computation. In the Grid-data environment, the component in short supply is not the computers, but rather these very large caches. Furthermore, these remote data streams may be created dynamically by another process or cached in temporary storage: for this reason VOTable can express that remote data may not be available after a certain time (expires). Data on the net may require authentication for access, so VOTable allows expression of password or other identity information (the ‘rights’ attribute).

Data Storage: Flexible and Efficient

The data part in a VOTable may be represented using one of four different formats: TABLEDATA, FITS, BINARY and BINARY2. TABLEDATA is a pure XML format so that small tables can be easily handled in their entirety by XML tools. The FITS binary table format is well-known to astronomers, and VOTable can be used either to encapsulate such a file, or to re-encode the metadata; unfortunately it is difficult to stream FITS, since the dataset size is required in the header (NAXIS2 keyword), and FITS requires a specification up front of the maximum size of its variable-length arrays. The BINARY and BINARY2 formats are supported for efficiency and ease of programming: no FITS library is required, and the streaming paradigm is supported.

VOTable can be used in different ways, as a data storage and transport format, and also as a way to store metadata alone (table structure only). In the latter case, a VOTable structure can be sent to a server, which can then open a high-bandwidth connection to receive the actual data, using the previously-digested structure as a way to interpret the stream of bytes from the data socket.

VOTable can be used for small numbers of small records (pure XML tables), or for large numbers of simple records (streaming data), or it can be used for small numbers of larger objects. In the latter case, there will be software to spread large data blocks among multiple processors on the Grid. Currently the most complex structure that can be in a VOTable Cell is a multidimensional array.

1.2 XML Conventions

VOTable is constructed with XML (eXtensible Markup Language), a powerful standard for structured data throughout the Internet industries. It derives from SGML, a standard used in the publishing industry and for technical documentation for many years. XML consists of elements and payload, where an element consists of a start tag (the part in angle brackets), the payload, and an end tag (with angle brackets and a slash). Elements can contain other elements. Elements can also bear attributes (keyword-value combinations).

The payload may be in two forms: parsed or unparsed character data. Examples are:

<text>Fran&#231;ois</text>
<text><![CDATA[ a & (b <= c) ]]></text>

In the first example, the sequence &#231; is interpreted as part of the ISO/IEC 10646 character set (Unicode), and translates to an accented character, so that the text is “François”. The second example uses the special CDATA sequence so that the characters <, >, and & can be used without interpretation; in this case, any characters are allowed except the terminating sequence ]]>. For more information, see any book on XML.
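The difference between the two payload forms can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the wrapping `<root>` element is an assumption added only so the two examples form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The two payload examples from the text, wrapped in a hypothetical <root>
# element so that together they form a single well-formed XML document.
doc = """<root>
<text>Fran&#231;ois</text>
<text><![CDATA[ a & (b <= c) ]]></text>
</root>"""

root = ET.fromstring(doc)
texts = [t.text for t in root.findall("text")]

# The character reference &#231; is resolved by the parser to U+00E7,
# while the CDATA section is delivered verbatim, with no interpretation of &, <.
print(texts[0])        # François
print(repr(texts[1]))  # ' a & (b <= c) '
```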

1.3 Syntax Policy

Following the general XML rule, element and attribute names are case-sensitive and have to be used with the specified capitalisation. For VOTable, we have adopted the convention that element names are spelled in uppercase and attribute names in lowercase (with an exception for the ID attribute). Element and attribute names are further distinguished in this paper by being typed with a red fixed-width font, and the values of the attributes by being typed in a purple fixed-width font.

1.4 VOTable in the VO Architecture

[Figure: VOTable's role within the IVOA architecture]

VOTable is a core IVOA standard. The figure above shows the role this document plays within the IVOA architecture.

Wherever tabular data is transferred between Virtual Observatory components, VOTable provides the preferred serialization format. Since tables are used to list available resources as well as to represent science data which is itself tabular, this means that VOTable is used pervasively in the definitions of the Data Access protocols (e.g. SCS, SIA, SSA, TAP), and hence for exchange of data and metadata between user layer applications and data-providing services. VOTable is also used as a serialization format for some of the IVOA Data Models.

In order to represent semantically rich metadata, VOTable relies on the other IVOA standards UCD, Utype, VOUnits, and DALI. This document explains how information structured according to those standards is managed within the VOTable framework.

2 Data Model

In this section we define the data model of a VOTable, and in the next sections its syntax when expressed as XML. The data model of VOTable can be expressed as:

VOTable = hierarchy of Metadata + associated TableData, arranged as a set of Tables
Metadata = Parameters + Infos + Descriptions + Links + Fields + Groups
Table = list of Fields + TableData
TableData = stream of Rows
Row = list of Cells
Cell = Primitive
       or variable-length list of Primitives
       or multidimensional array of Primitives
Primitive = integer, character, float, floatComplex, etc. (see Table 2.1 Primitives below).

Metadata is divided into that which concerns the table itself (parameters), and the definitions of the fields (or column attributes) of the table. Each FIELD represents the metadata that can be found at the top of the column in a paper version of the table: in the example introduced in section 3.1 Example below, the first FIELD has its name attribute set to "RA". The Field can be thought of as a class definition, and the table cells below it are the instances of that class.

A parameter (PARAM) is similar to a FIELD, except that it has a value attribute. Parameters can be seen as “constant columns”, containing for instance FITS keywords or any other information pertaining to the table itself or its environment, such as the Telescope parameter in the example of section 3.1 Example.

An informative parameter (INFO) (see section 4.8 INFO Element) is a restricted form of the PARAM — it is always understood as a string (i.e. datatype="char" and arraysize="*" are implied).

The ordered list of Fields at the top of the table thus provides a template for a Row object (also called a record). The template allows interpretation of the data in the Row. The record is a set of Cells, with the number and order of Cells the same for each Row, and the same as the number of Fields defined in the Metadata.

In VOTable, there is generally no advance specification of the number of rows in the table: this is to allow streaming of large tables, as discussed above. However, if the number of rows is known, it may be specified in a dedicated nrows attribute.

From Version 1.1, columns may be logically grouped, so that it is possible to define table substructures made of column associations. Such an association is declared as a GROUP, which typically contains column references (FIELDref) and associated parameters (PARAM).
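To make the relationships above concrete, here is a minimal sketch of the data model as Python dataclasses. The class and attribute names are hypothetical and chosen for illustration only; the one rule the sketch enforces is that every Row has exactly as many Cells as there are Fields.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Field:
    """Column definition: the 'class' whose instances are the table cells."""
    name: str
    datatype: str
    arraysize: Optional[str] = None  # None means a scalar cell


@dataclass
class Param(Field):
    """A 'constant column': a Field that also carries a single value."""
    value: str = ""


@dataclass
class Table:
    fields: List[Field]
    params: List[Param] = field(default_factory=list)
    rows: List[List[Any]] = field(default_factory=list)

    def add_row(self, cells: List[Any]) -> None:
        # The Field list is the template for every Row: same count, same order.
        if len(cells) != len(self.fields):
            raise ValueError(f"expected {len(self.fields)} cells, got {len(cells)}")
        self.rows.append(cells)


table = Table(fields=[Field("RA", "double"), Field("Name", "char", "*")])
table.add_row([10.5, "hypothetical"])  # accepted: 2 cells for 2 Fields
try:
    table.add_row([10.5])              # rejected: 1 cell for 2 Fields
except ValueError as err:
    print(err)                         # expected 2 cells, got 1
```

The number of rows is deliberately not stored up front, mirroring the streaming design described above.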

2.1 Primitives

Table 2.1 Primitives

datatype        Meaning              FITS     Bytes
boolean         Logical              L        1
bit             Bit                  X        *
unsignedByte    Byte (0 to 255)      B        1
short           Short Integer        I        2
int             Integer              J        4
long            Long integer         K        8
char            ASCII Character      A        1
unicodeChar     Unicode Character    (none)   2
float           Floating point       E        4
double          Double               D        8
floatComplex    Float Complex        C        8
doubleComplex   Double Complex       M        16

(*) Bit arrays are packed; their size is given in the text below.

Each Cell is composed from Primitives, each of which is a datatype of fixed-length binary representation, as listed in Table 2.1 Primitives. Cells may consist of a single Primitive (this is the default), or of an array (which may be multidimensional) of Primitives (see section 2.2 Columns as Arrays).

Except for the Bit type, each primitive has the fixed length in bytes given in Table 2.1 Primitives. Bit scalars and arrays are stored in the minimum number of bytes feasible (so that \(b\) bits take the integer part of \((b+7)/8\) bytes). These primitives are described in more detail in section 6 Definitions of Primitive Datatypes.
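The sizing rule can be written down directly. The sketch below is an illustration built from the table of primitives and the bit-packing rule above; the function and dictionary names are hypothetical, and the result ignores any per-cell overhead that a particular serialization may add.

```python
# Size in bytes of one primitive, taken from Table 2.1 Primitives.
# "bit" is handled separately because bits are packed.
PRIMITIVE_BYTES = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def cell_bytes(datatype: str, n_elements: int = 1) -> int:
    """Bytes needed to hold n_elements primitives of the given datatype."""
    if datatype == "bit":
        # b bits occupy the integer part of (b + 7) / 8 bytes.
        return (n_elements + 7) // 8
    return PRIMITIVE_BYTES[datatype] * n_elements


print(cell_bytes("bit", 9))       # 2
print(cell_bytes("double", 3))    # 24
```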

VOTables support two kinds of characters: ASCII 1-byte characters and Unicode (UCS-2) 2-byte characters. Unicode is a way to represent characters that is an alternative to ASCII. It uses two bytes per character instead of one, it is strongly supported by XML tools, and it can handle a large variety of international alphabets. Therefore VOTable supports not only ASCII strings (datatype="char"), but also Unicode (datatype="unicodeChar").

Note that strings are not a primitive type: strings are represented in VOTable as an array of characters.

2.2 Columns as Arrays

A table cell can contain an array of a given primitive type, with a fixed or variable number of elements; the array may even be multidimensional. For instance, the position of a point in a 3D space can be defined by the following:

<FIELD ID="point_3D" datatype="double" arraysize="3"/>

and each cell corresponding to that definition must contain exactly 3 numbers. An asterisk (*) may be appended to indicate a variable number of elements in the array, as in:

<FIELD ID="values" datatype="int" arraysize="100*"/>

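where it is specified that each cell corresponding to that definition contains 0 to 100 integer numbers. The number may be omitted to specify an unbounded array (in practice up to \(\simeq 2\times10^9\) elements).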

A table cell can also contain a multidimensional array of a given primitive type. This is specified by a sequence of dimensions separated by the x character, with the first dimension changing fastest; as in the case of a simple array, the last dimension may be variable in length. As an example, the following definition declares a table cell which may contain a set of up to 10 images, each of 64x64 bytes:

<FIELD ID="thumbs" datatype="unsignedByte" arraysize="64x64x10*"/>
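A reader of such a cell has to split the arraysize into its dimensions and locate elements with the first dimension varying fastest. The following is a minimal sketch of that logic; the function names are hypothetical, and the sketch only checks that the asterisk appears on the last dimension.

```python
from typing import List, Optional, Tuple


def parse_arraysize(arraysize: str) -> Tuple[List[Optional[int]], bool]:
    """Split an arraysize such as "64x64x10*" into its dimensions.

    Returns (dims, variable). dims lists the dimensions, first (fastest) first;
    the last entry is None when no upper bound is given (e.g. "*" or "8x*").
    variable is True when the last dimension ends with "*".
    """
    parts = arraysize.split("x")
    dims: List[Optional[int]] = []
    variable = False
    for i, part in enumerate(parts):
        if part.endswith("*"):
            if i != len(parts) - 1:
                raise ValueError("only the last dimension may be variable")
            variable = True
            part = part[:-1]
        dims.append(int(part) if part else None)
    return dims, variable


def flat_index(indices: List[int], dims: List[Optional[int]]) -> int:
    """Offset of an element in the flat cell, first dimension varying fastest."""
    offset, stride = 0, 1
    for idx, dim in zip(indices, dims):
        offset += idx * stride
        if dim is not None:
            stride *= dim
    return offset


dims, variable = parse_arraysize("64x64x10*")
print(dims, variable)                # [64, 64, 10] True
print(flat_index([1, 2, 3], dims))   # 12417, i.e. 1 + 2*64 + 3*64*64
```

With this layout, thumbnail number k of the "thumbs" cell starts at byte offset k × 4096.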

Strings, which are defined as a set of characters, can therefore be represented in VOTable as a fixed- or variable-length array of characters:

<FIELD name="unboundedString" datatype="char" arraysize="*"/>

A 1D array of strings can be represented as a 2D array of characters, but given the logic above, it is possible to define a variable-length array of fixed-length strings, but not a fixed-length array of variable-length strings. A convention to express an array of variable-length strings exists (see section A.3 Arrays of Variable-Length Strings) but is not part of this standard.

Note: arraysize should be present if, and only if, each table cell for the FIELD is intended to be treated as an array. arraysize="1" should not be used, as it is interpreted differently by different clients at this point. If a future VOTable specification re-encourages its use, arraysize="1" will mean “array of length 1”.

2.3 Compatibility with FITS Binary Tables

VOTable is closely compatible with the FITS Binary Table format. Henceforth, we shall abbreviate “FITS Binary Table and its Conventions” simply by the word “FITS”. Given a FITS file that represents a binary table, the header may be converted to VOTable, with a pointer to the original file, or with the original file included directly in VOTable. Since the original file is still present, it is clear that no data has been lost. A PARAM element can be used to hold any FITS keyword with its value and comment string.

We might ask two more significant questions, about how much of the FITS header and data can be represented in VOTable. The answer is that there is considerable overlap.

For instance, the recommended display format of the data is expressed by the non-mandatory TDISP keyword: for example, F12.4 means that 12 characters are to be used, with 4 decimal places. VOTable expresses this through the width and precision attributes which, together with datatype, are semantically identical to the TDISP keyword.
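As an illustration of that correspondence, the sketch below converts the two simplest TDISP forms into VOTable attributes. The function name is hypothetical, and only the Fw.d and Iw forms are handled; other TDISP codes (E, G, D and so on) are deliberately rejected rather than guessed.

```python
import re


def tdisp_to_attrs(tdisp: str) -> dict:
    """Map a simple FITS TDISP value to VOTable width/precision attributes."""
    value = tdisp.strip()
    m = re.fullmatch(r"F(\d+)\.(\d+)", value)
    if m:
        # Fw.d: w characters in total, d decimal places.
        return {"width": m.group(1), "precision": m.group(2)}
    m = re.fullmatch(r"I(\d+)", value)
    if m:
        # Iw: an integer displayed in w characters; no precision applies.
        return {"width": m.group(1)}
    raise ValueError(f"unsupported TDISP value in this sketch: {tdisp!r}")


print(tdisp_to_attrs("F12.4"))  # {'width': '12', 'precision': '4'}
```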

What can FITS do but not VOTable?

FITS has complex semantics, with many conventions (see e.g. the Registry of FITS Conventions [1]) which have been developed mainly to be able to cope with the increasing complexity of astronomical instrumentation. In the frame of the Virtual Observatory the complexity is described by means of data models, and from its version 1.1, VOTable can refer to these data models by means of the utype attribute described in section 4.6 The utype Attribute.

What can VOTable do but not FITS?

VOTable supports the separation of data from metadata, the streaming of tables, and other ideas from modern distributed computing. It bridges two ways to express structured data: XML and FITS. It uses UCDs (see section 4.5 Unified Content Descriptors) to formally express the semantic content of a parameter or field. It has the hierarchy and flexibility of XML: using GROUP elements introduced in version 1.1, columns in a VOTable can be grouped in arbitrarily complex hierarchies; and the ID attribute can be used in XML to enable what are essentially pointers. FITS does not handle Unicode (extended alphabet) characters.

3 The VOTable Document Structure

The overall VOTable document structure is described and controlled by its XML Schema []. The schema for VOTable version 1.5 is given in appendix B The VOTable version 1.5 XML Schema of this document. It can also be retrieved from http://www.ivoa.net/xml/VOTable/votable-1.5.xsd.

A VOTable document consists of a single all-containing element called VOTABLE, which contains descriptive elements and global definitions (DESCRIPTION, GROUP, PARAM, INFO), followed by one or more RESOURCE elements. Each RESOURCE element contains zero or more TABLE elements, and possibly other RESOURCE elements.

The TABLE element, the actual heart of VOTable, contains a description of the columns and parameters (described in section 4 FIELDs and PARAMeters) followed by the data values (described in section 5 Data Content).

As the root element, VOTABLE has attributes which specify the VOTable version number and XML namespaces used in the document. For VOTable 1.5, the VOTABLE element MUST define version="1.5". All VOTable elements come from the namespace http://www.ivoa.net/xml/VOTable/v1.3. It is recommended to bind the empty namespace prefix to this URI, as in

xmlns="http://www.ivoa.net/xml/VOTable/v1.3",

but instance documents are free to use whatever namespace prefix is convenient for them.

Note that starting with VOTable 1.3, the namespace URI for VOTable will remain fixed at http://www.ivoa.net/xml/VOTable/v1.3 until the next major version as discussed in Harrison et al. [XMLVers1.0]. As per IVOA recommendations, this namespace URI will always redirect to the latest recommended schema for VOTable version 1.x.

VOTable consumers doing schema validation are free to use either this latest recommended schema or the version-specific schema relevant to the VOTable version. So, while instance documents may include the schemaLocation attribute, consumers are not required to honor it.

Documents claiming to represent VOTables must validate against the relevant version of the VOTable schema without error. Notice that the validation is a necessary, but not sufficient, condition for correctness.
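Before running a full schema validation, a consumer may want a quick look at the root element. The sketch below is such a lightweight pre-check, not a substitute for validation against the schema; the function name is hypothetical. Because ElementTree reports element names as `{namespace-URI}local-name`, the check works whatever prefix the instance document chose.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def check_root(xml_text: str, expected_version: str = "1.5") -> list:
    """Return a list of problems found on the root element (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    # The tag comes back as "{namespace}VOTABLE", independent of the prefix used.
    if root.tag != f"{{{VOTABLE_NS}}}VOTABLE":
        problems.append(f"unexpected root element {root.tag}")
    if root.get("version") != expected_version:
        problems.append(
            f"version is {root.get('version')!r}, expected {expected_version!r}")
    return problems


default_prefix = f'<VOTABLE xmlns="{VOTABLE_NS}" version="1.5"><RESOURCE/></VOTABLE>'
other_prefix = f'<v:VOTABLE xmlns:v="{VOTABLE_NS}" version="1.5"/>'
print(check_root(default_prefix), check_root(other_prefix))  # [] []
```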

3.1 Example

This simple example of a VOTable document lists 3 galaxies with their position, velocity and error, and their estimated distance.

This simple VOTABLE document shows a single RESOURCE made of a single TABLE; the table is made of 6 columns, each described by a FIELD, and has one additional PARAM parameter (the Telescope). The actual rows are listed in the DATA part of the table, here in XML format (introduced by TABLEDATA); each cell is marked by the TD element, and the cells follow the same order as their FIELD descriptions: RA, Dec, Name, RVel, e_RVel, R.
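Because TD cells carry no column name, a reader pairs them with FIELD definitions purely by position. A minimal sketch of that pairing follows; the function name and the two-column document (columns x and label) are hypothetical and are not the galaxy example itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}


def table_rows(table: ET.Element) -> List[Dict[str, str]]:
    """Map each TABLEDATA row to {FIELD name: TD text}, by position."""
    names = [f.get("name") for f in table.findall("v:FIELD", NS)]
    rows = []
    for tr in table.findall("v:DATA/v:TABLEDATA/v:TR", NS):
        cells = [td.text or "" for td in tr.findall("v:TD", NS)]
        if len(cells) != len(names):
            raise ValueError("row length does not match the number of FIELDs")
        rows.append(dict(zip(names, cells)))
    return rows


doc = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.5">
<RESOURCE><TABLE>
  <FIELD name="x" datatype="int"/>
  <FIELD name="label" datatype="char" arraysize="*"/>
  <DATA><TABLEDATA><TR><TD>1</TD><TD>first</TD></TR></TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

table = ET.fromstring(doc).find("v:RESOURCE/v:TABLE", NS)
print(table_rows(table))  # [{'x': '1', 'label': 'first'}]
```

Keying rows by name assumes the names are unique within the TABLE, which is recommended but not required (see section 3.2 name, ID and ref attributes).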

3.2 name, ID and ref attributes

Most of the elements defined by VOTable may or must have names, like a RESOURCE, a TABLE, a PARAM or a FIELD. The content of the name attribute is defined as a token XML type, that is, a string of characters in which leading, trailing and repeated spaces are not meaningful (no leading or trailing spaces, no multiple spaces): name="NVSS flux(1.4GHz)" is therefore a valid name.
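The whitespace rule of the XML Schema token type can be sketched in one line: tabs, carriage returns and newlines count as spaces, runs of them collapse to a single space, and leading and trailing spaces are dropped. The function name below is hypothetical.

```python
import re


def normalize_token(value: str) -> str:
    """Collapse whitespace as the XML Schema 'token' type does."""
    # Only the four XML whitespace characters are treated as spaces here.
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


print(normalize_token("  NVSS   flux(1.4GHz)\n"))  # NVSS flux(1.4GHz)
```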

The ID and ref attributes are defined as XML types ID and IDREF respectively. This means that the contents of ID is an identifier which must be unique throughout a VOTable document, and that the contents of the ref attribute represents a reference to an identifier which must exist in the VOTable document. In other terms, if ref="myStar" is found in one element, there must exist an element in the same document with the ID="myStar" attribute. The XML standard moreover specifies that an ID type is a string beginning with a letter or underscore (_), followed by a sequence of Unicode letters, digits, or any of the punctuation characters . (dot), - (dash) or _ (underscore). The : (colon) is reserved for namespace use and should be avoided. Therefore ID="1" is not valid, but ID="_1" or ID="ref.1" are both valid.
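Both rules can be checked mechanically. In the sketch below, the regular expression is only an approximation of the ID rule quoted above (Python's Unicode-aware `\w` admits a few characters the XML rule does not), and the function names are hypothetical. Collecting every ID before checking references means the check does not depend on whether an ID appears before or after the elements that refer to it.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# A letter or underscore, then letters, digits, '.', '-' or '_'.
# [^\W\d] means "a word character that is not a digit".
_ID_RE = re.compile(r"[^\W\d][\w.\-]*")


def is_valid_id(value: str) -> bool:
    return _ID_RE.fullmatch(value) is not None


def unresolved_refs(xml_text: str) -> List[str]:
    """Return ref values that do not match any ID in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get("ID") for el in root.iter() if el.get("ID") is not None}
    return [el.get("ref") for el in root.iter()
            if el.get("ref") is not None and el.get("ref") not in ids]


print(is_valid_id("1"), is_valid_id("_1"), is_valid_id("ref.1"))  # False True True
```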

The ID attribute is therefore required in the elements which have to be referenced, but the elements having an ID attribute do not need to be referenced.

The relative position of an ID and its corresponding reference(s) may have an impact on the performance or functionality of streaming parsers. For this reason, earlier versions of VOTable recommended placing the ID attribute before any references to it, but there may be cases where the opposite is more appropriate. In practical terms, no requirement has ever been placed on the ordering of an ID and its references, so VOTable creators are free to use either order and parsers/consumers should handle either order.

While the ID attribute has to be unique in a VOTable document, the name attribute need not. It is however recommended, as a good practice, to assign unique names within a TABLE element. This recommendation means that, between a TABLE and its corresponding closing TABLE tag, name attributes of FIELD, PARAM and optional GROUP elements should be all different.
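A producer or validator might flag departures from this good practice. The following sketch, with a hypothetical function name, reports FIELD, PARAM and GROUP names used more than once between a TABLE start tag and its closing tag, including elements nested inside GROUPs.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List

NS = "{http://www.ivoa.net/xml/VOTable/v1.3}"


def duplicate_names(root: ET.Element) -> Dict[str, List[str]]:
    """For each TABLE, list FIELD/PARAM/GROUP names used more than once."""
    report = {}
    for i, table in enumerate(root.iter(NS + "TABLE")):
        names = [el.get("name") for el in table.iter()
                 if el.tag in (NS + "FIELD", NS + "PARAM", NS + "GROUP")
                 and el.get("name") is not None]
        dupes = sorted(n for n, c in Counter(names).items() if c > 1)
        if dupes:
            report[table.get("name") or f"table #{i}"] = dupes
    return report
```

Since name uniqueness is only recommended, a report like this is best treated as a warning rather than an error.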

3.3 VOTABLE Element

The VOTABLE element may contain definitions consisting of a DESCRIPTION, followed by any mixture of parameters and informative notes, possibly structured in groups. These elements represent values which are meaningful over all tables included in a VOTABLE document; definitions specific to a RESOURCE (section 3.6 RESOURCE Element) or a TABLE (section 3.8 TABLE Element) are better placed within their most appropriate element.

Note that version 1.0 of VOTable required the usage of a DEFINITIONS element holding the VOTable global definitions — this usage is deprecated since version 1.1.

3.4 COOSYS Element

The COOSYS element defines a celestial coordinate system, to which the components of a position on the celestial sphere refer. It has the following attributes, all of them (syntactically) optional:

ID

Required if the COOSYS element has to be referred to via the ref attribute of the position components, which is generally the case.

system

Specifies the reference frame of the coordinates. The value must be taken from the IVOA refframe vocabulary (http://www.ivoa.net/rdf/refframe). At the time of writing, this vocabulary includes the terms: AZ_EL, BODY, ecl_FK4, ECLIPTIC, EQUATORIAL, FK4, FK5, GALACTIC, GALACTIC_I, GENERIC_GALACTIC, ICRS, SUPER_GALACTIC, UNKNOWN

As that vocabulary can be extended at any time, clients should fail gracefully when they encounter unknown reference frames. Up through VOTable 1.4, these identifiers were defined in the VOTable schema, and some systems had different identifiers. These legacy identifiers are still in the vocabulary, but they are deprecated. Clients should use the vocabulary (see sect. 3 of Demleitner et al. [Vocabularies2.1] for how to do that without RDF tooling) to map the legacy identifiers to current identifiers.

equinox

Fixes the equinox of the equatorial or ecliptic systems (e.g., "J2000" is the default for "FK5", and "B1950" the default for "FK4").

epoch

Specifies the epoch of the positions if necessary, again as an astroYear (i.e., prefixed with J or B depending on whether Julian or Besselian years are used). COOSYS only supports time specifications in Julian or Besselian years.

refposition

The reference position for which the positions are given. The values of this attribute should be taken from the IVOA refposition vocabulary (http://www.ivoa.net/rdf/refposition). At the time of writing, this vocabulary includes the terms BARYCENTER, EMBARYCENTER, GEOCENTER, HELIOCENTER, TOPOCENTER, UNKNOWN

Also note that COOSYS may be deprecated in the future in favor of a more generic way of describing the conventions used to define the positions of the objects studied in the enclosed tables.

A COOSYS element referenced via a ref attribute SHOULD appear before the element that references it.
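A client resolving a position column to its frame follows the ref attribute to the matching COOSYS. The sketch below does this and, following the advice to fail gracefully, only flags frames missing from a snapshot of the vocabulary terms listed above instead of rejecting them. The function name and the snapshot set are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

NS = "{http://www.ivoa.net/xml/VOTable/v1.3}"

# Snapshot of the refframe terms listed above. The vocabulary can grow, so the
# set is used only to flag unfamiliar values, never to reject them.
KNOWN_FRAMES = {"AZ_EL", "BODY", "ecl_FK4", "ECLIPTIC", "EQUATORIAL", "FK4",
                "FK5", "GALACTIC", "GALACTIC_I", "GENERIC_GALACTIC", "ICRS",
                "SUPER_GALACTIC", "UNKNOWN"}


def frame_of(root: ET.Element, element: ET.Element) -> Optional[Tuple[str, bool]]:
    """Return (system, is_known) for the COOSYS the element refers to, or None."""
    ref = element.get("ref")
    if ref is None:
        return None
    for coosys in root.iter(NS + "COOSYS"):
        if coosys.get("ID") == ref:
            system = coosys.get("system")
            return system, system in KNOWN_FRAMES
    return None  # the ref points to something that is not a COOSYS
```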

3.5 TIMESYS Element

The TIMESYS element (introduced in VOTable 1.4) defines metadata for temporal coordinates. To reference the time system defined by a TIMESYS element, FIELDs (and possibly PARAMs) MUST reference the TIMESYS using the VOTable ref attribute.

If a FIELD or PARAM represents a time-like quantity but does not reference a TIMESYS element, then no assertion is made about its time system. A TIMESYS element referenced via a ref attribute MUST appear before the element that references it.

TIMESYS has the following attributes:

ID

This attribute is used to reference TIMESYS elements from the elements using the time system.

timeorigin

This is the time origin of the time coordinate, given as a Julian Date for the time scale and reference point defined. It is usually given as a floating point literal; for convenience, the magic strings MJD-origin (standing for 2400000.5) and JD-origin (standing for 0) are also allowed. The timeorigin attribute MUST be given unless the time’s representation contains a year of a calendar era, in which case it MUST NOT be present. In VOTables, these representations currently are Gregorian calendar years with xtype="timestamp", or years in the Julian or Besselian calendar when a column has yr, a or Ba as its unit and no time origin is given. When using calendar epochs written in Julian or Besselian years, note that conventionally Julian years are tied to the TDB timescale and Besselian years to the ET (equivalent to TT) timescale [].

timescale

This is the time scale used. Values SHOULD be taken from the IVOA timescale vocabulary (http://www.ivoa.net/rdf/timescale). At the time of writing, this vocabulary includes the terms: GPS, TAI, TCB, TCG, TDB, TT, UNKNOWN, UT, UTC

This attribute is mandatory.

refposition

The reference position again is a simple string. As with the COOSYS attribute of this name, the values SHOULD be taken from the IVOA refposition vocabulary (http://www.ivoa.net/rdf/refposition) which at the time of writing includes the terms BARYCENTER, EMBARYCENTER, GEOCENTER, HELIOCENTER, TOPOCENTER, UNKNOWN

This attribute is mandatory.

The example below shows a VOTable in which each row would have an observation time, a flux, and a magnitude. The observation time values are given in days since Julian Date 2455197.5 (the time origin for the Gaia observatory) in the Barycentric Coordinate Time (TCB) time scale, with the reference position being the barycenter of the solar system.

In the example, the TIMESYS element describes that time system. The TIMESYS ID value needs to be unique within the document so that it can be referenced by FIELDs or PARAMs. Then the obs_time FIELD indicates that its values should be interpreted in that time system by referring back to the TIMESYS element using ref="time_frame".

Similarly, the COOSYS element defines the coordinate system, and is referred to by the ra and dec PARAM elements. Note that since the sky position is defined by PARAMs instead of FIELDs, the same sky position applies to each row of the TABLE without the values appearing in TD elements.
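Interpreting an obs_time value then amounts to adding it to the time origin of the referenced TIMESYS. The sketch below assumes, as in the example described above, that values are offsets in days; the function names are hypothetical, and the result is a Julian Date in the same time scale (TCB here) and reference position, since no scale conversion is attempted.

```python
import xml.etree.ElementTree as ET

# Magic strings allowed for timeorigin, with the Julian Dates they stand for.
MAGIC_ORIGINS = {"MJD-origin": 2400000.5, "JD-origin": 0.0}


def origin_jd(timesys: ET.Element) -> float:
    """Julian Date of the time origin declared by a TIMESYS element."""
    value = timesys.get("timeorigin")
    if value is None:
        raise ValueError("no timeorigin: the representation carries its own calendar year")
    return MAGIC_ORIGINS[value] if value in MAGIC_ORIGINS else float(value)


def to_jd(offset_days: float, timesys: ET.Element) -> float:
    """Julian Date for a value given in days since the TIMESYS time origin."""
    return origin_jd(timesys) + offset_days


ts = ET.fromstring('<TIMESYS ID="time_frame" timeorigin="2455197.5" '
                   'timescale="TCB" refposition="BARYCENTER"/>')
print(to_jd(10.25, ts))  # 2455207.75
```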

Further (non-normative) information on best practices and usage patterns for TIMESYS can be found in .

3.6 RESOURCE Element

A VOTable document contains one or more RESOURCE elements, each of these providing a description and the data values of some logically independent data structure.

Each RESOURCE may include the descriptive element DESCRIPTION, followed by a mixture of INFO, GROUP and PARAM elements; it may also contain LINK elements to provide URL-type pointers that give further information.

The main component of a RESOURCE is typically one or more TABLE elements – in other words a RESOURCE is basically a set of related tables. The RESOURCE is recursive (it can contain other RESOURCE elements), which means that the set of tables making up a RESOURCE may become a tree structure.

A RESOURCE may have one or both of the name or ID attributes (see section 3.2 name, ID and ref attributes); it may also be qualified by type="meta", meaning that the resource is descriptive only, i.e. does not contain any actual data: no DATA element should exist in any of its sub-elements. A RESOURCE without this attribute may however have no DATA sub-element.
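The type="meta" constraint can be checked by searching each such RESOURCE for DATA at any depth, including inside nested RESOURCE and TABLE elements. A minimal sketch, with a hypothetical function name:

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{http://www.ivoa.net/xml/VOTable/v1.3}"


def meta_resources_with_data(root: ET.Element) -> List[ET.Element]:
    """Return RESOURCE elements marked type="meta" that nevertheless hold DATA."""
    return [res for res in root.iter(NS + "RESOURCE")
            if res.get("type") == "meta"
            and res.find(".//" + NS + "DATA") is not None]
```

The converse does not hold: an empty result says nothing about RESOURCE elements without the attribute, which may legitimately have no DATA.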

For example, a RESOURCE qualified by type="meta" can be used to transmit MIVOT (Model Instances in VOTables) annotations [MIVOT1.0]. MIVOT defines a syntax to map VOTable contents to any model serialized in VO-DML [VODML1.0]. It operates as a bridge between models and data, associating VOTable metadata with data model entities and possibly adding advanced metadata not representable in plain VOTable. MIVOT annotations have their own XML namespace. The VOTable schema allows MIVOT, and any elements from a foreign namespace, in a RESOURCE.

Finally, the RESOURCE element may have a utype attribute to link the element to some external data model (introduced in version 1.1, see section 4.6 The utype Attribute).

3.8 TABLE Element

The TABLE element represents the basic data structure in VOTable; it comprises a description of the table structure (the metadata) essentially in the form of PARAM and FIELD elements (detailed in section 4 FIELDs and PARAMeters), followed by the values of the described fields in a DATA element (detailed in section 5 Data Content).

The TABLE element is always contained in a RESOURCE element: in other words, every TABLE element has exactly one parent, the RESOURCE element in which it is embedded.

The TABLE element contains a DESCRIPTION element for descriptive remarks, followed by a mixed collection of PARAM, FIELD or GROUP elements which describe a parameter (constant column), a field (column) or a group of columns respectively. PARAM and FIELD elements are detailed in section 4 FIELDs and PARAMeters, and the GROUP element is presented in section 4.9 GROUPing FIELDs and PARAMeters.

Furthermore the TABLE element may contain LINK elements that provide URL-type pointers, exactly like the LINK elements existing within a RESOURCE element (see section 3.7 LINK Element).

The last element included in a TABLE is the optional DATA element (see section 5 Data Content): a table without any actual data is quite valid, and is typically used to supply a complete description of an existing resource e.g. for query purposes.

The TABLE element may have the naming attributes name and/or ID (see section 3.2 name, ID and ref attributes). A TABLE may also have a ref attribute referencing the ID of another table previously described, which is interpreted as defining a table having a structure identical to the one referenced: this facility avoids a repetition of the definition of tables which may be present many times in a VOTable document. It is recommended that the ref attribute references an empty table (i.e. a table without a DATA part), which avoids any ambiguity about the referencing.
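A consumer can resolve the structure of a referencing table by looking up the TABLE whose ID matches its ref. The sketch below does so and returns the referenced FIELDs. The function name is hypothetical, and following chains of refs (a table referring to a table that itself refers to another) is an assumption of this sketch, not something the text above requires.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{http://www.ivoa.net/xml/VOTable/v1.3}"


def effective_fields(root: ET.Element, table: ET.Element) -> List[ET.Element]:
    """FIELDs describing a table, following ref to another TABLE if present."""
    seen = set()
    while table.get("ref") is not None:
        ref = table.get("ref")
        if ref in seen:
            raise ValueError(f"circular TABLE ref: {ref}")
        seen.add(ref)
        target = next((t for t in root.iter(NS + "TABLE") if t.get("ID") == ref), None)
        if target is None:
            raise ValueError(f"no TABLE with ID={ref!r}")
        table = target
    return table.findall(NS + "FIELD")
```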

Finally, the TABLE element may have utype and ucd attributes to specify the table semantics, similarly to the FIELD and PARAM elements (see section 4.1 Summary of Attributes).

4 FIELDs and PARAMeters

The atoms of the table structure are represented by FIELD and PARAM elements, where FIELD represents the description of an actual table column, while PARAM supplies a value attached to the table, like the Telescope in the example of section 3.1 Example. A PARAM may be viewed as a FIELD which keeps a constant value over all the rows of a table, and the only difference in the set of attributes of the two elements is the existence of a value attribute in a PARAM which does not exist in a FIELD.

The FIELD elements describe the actual columns of the table; the order in which the FIELDs are declared is important, as this order must be the same one as the order of the columns in section 5 Data Content.

A FIELD or PARAM element may have several sub-elements, including the informational DESCRIPTION and LINK elements (several descriptions and titles are possible, see appendix A.7 Additional Descriptions and Titles); it may also include a VALUES element that can express limits and ranges of the values that the corresponding cell can contain, such as minimum (MIN), maximum (MAX), or enumeration of possible values (OPTION).

4.1 Summary of Attributes

The valid attributes of a FIELD or PARAM are:

  • The name and/or ID. The ID attribute is required if the field has to be referenced (see section 3.2 name, ID and ref attributes). It may help to include the ordinal number of the column in the table in the value of the ID attribute as e.g. ID="col3" when a single table is involved: the connection to the corresponding column would become more obvious, especially in the FITS data serialization which uses the ordinal column number in the keywords containing the metadata related to that column.

  • The datatype, which expresses the nature of the data that is described as one of the permitted primitives (see Table 2.1 Primitives and their exact meaning in section 6 Definitions of Primitive Datatypes). This attribute determines how data are read and stored internally; it is required.

  • The arraysize attribute exists when the corresponding table cell contains more than one of the specified datatype, as explained in section VOTable:sec:dim. Note that strings are not a primitive type, and have to be described as an array of characters. The arraysize attribute should be omitted unless the contents of the corresponding table cell are intended to be understood as an array (see also section VOTable:sec:dim).

  • The width and precision attributes define the numerical accuracy associated with the data (see section 4.2 Numerical Accuracy).

  • The xtype attribute, added in VOTable 1.2, specifies an extended (or external) datatype. It is meant to give details about the column contents beyond the primitive datatype, like timestamps.

  • The unit attribute specifies the units in which the values of the corresponding column are expressed (see section 4.4 Units).

  • The ucd attribute supplies a standardized classification of the physical quantity expressed in the column (see section 4.5 Unified Content Descriptors).

  • The utype attribute, introduced in VOTable 1.1, is meant to express the role of the column in the context of an external data model (see section 4.6 The utype Attribute).

  • The ref attribute is used to quote another element of the document in the definition of a FIELD or PARAM. It is used in the example of section VOTable:example1 to indicate the coordinate system in which the coordinates are expressed (reference to the COOSYS element which specifies the coordinate frame).

  • The type attribute is not part of this standard, but is reserved for future extensions (see appendix A.1 VOTable LINK substitutions, appendix A.2 VOTable Query Extension and appendix A.4 FIELDs as Data Pointers).

In addition, in the PARAM element only:

  • the value attribute, which gives the PARAMeter’s value; value is a required attribute of the PARAM element.

4.2 Numerical Accuracy

The VOTable format is meant for transferring, storing, and processing tabular data, and is not intended for presentation purposes: therefore (in contrast to Astrores) we generally avoid giving rules on presentation, such as formatting. Inevitably, however, at least some of the data will be presented, either as actual tables or in forms, graphs, etc. Two attributes were retained for this purpose:

  • The width attribute is meant to indicate to the application the number of characters to be used for input or output of the quantity.

  • The precision attribute is meant to express the number of significant digits, either as a number of decimal places (e.g. precision="F2" or equivalently precision="2" to express 2 significant figures after the decimal point), or as a number of significant figures (e.g. precision="E5" indicates a relative precision of \(10^{-5}\)).
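As an illustration of how an application might honour these two hints when displaying a value, here is a minimal Python sketch. The function name format_value is hypothetical, and mapping "En" to n significant figures in Python's exponential format is an assumed presentation choice, not something the standard prescribes.

```python
from typing import Optional


def format_value(value: float, width: Optional[int] = None,
                 precision: Optional[str] = None) -> str:
    """Render a number using VOTable-style width/precision hints.

    Assumed mapping (illustrative only):
      "F<n>" or "<n>" -> n digits after the decimal point
      "E<n>"          -> n significant figures, in exponential notation
    """
    if precision is None:
        text = repr(value)
    elif precision[0] in "Ee":
        figures = int(precision[1:])
        # n significant figures = one leading digit + (n - 1) after the point
        text = f"{value:.{max(figures - 1, 0)}e}"
    else:
        places = int(precision[1:] if precision[0] in "Ff" else precision)
        text = f"{value:.{places}f}"
    # width is a character count: pad on the left, but never truncate
    return text.rjust(width) if width is not None else text
```

For example, format_value(12.3456, width=6, precision="F2") yields " 12.35". Padding rather than truncating is deliberate: cutting digits to fit a width would silently change the value shown.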

The existence and presentation of the special null value of a field (when the actual value of the field is unknown) is another aspect of the numerical accuracy, which is part of the VALUES sub-element (see section 4.7 VALUES Element).

4.3 Extended Datatype xtype

The xtype attribute extends the basic datatype primitives (listed in Table 2.1 Primitives), which represent the storage units valid in all VOTable serializations and therefore correspond exactly to the FITS definitions. It fills the gap between the datatypes known by FITS and those required to express queries (Astronomical Data Query Language, ADQL) and their results in tabular form (Table Access Protocol or TAP, Dowler et al. [TAP1.0]).

As an example, setting xtype="timestamp" instructs VOTable parsers to interpret a string as a timestamp (an instant in an absolute time frame), materialized by a UTC date/time string following the ISO-8601 standard (YYYY-MM-DDThh:mm:ss, optionally followed by a decimal point and fractions of seconds). Supporting parsers might then expose the corresponding values in whatever way is appropriate for such timestamps in the host language. VOTable software does not need to interpret xtypes, but it should preserve them when doing round-trips.
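A reader that knows the timestamp xtype might convert such literals along the lines of this minimal Python sketch, while passing any other xtype through untouched so that it survives a round-trip. It assumes exactly the literal layout quoted above (no time-zone suffix) and truncates fractions finer than a microsecond, since Python's datetime cannot hold them; convert_cell and parse_timestamp are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Union


def parse_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DDThh:mm:ss[.fff...]' as an aware UTC datetime."""
    base, _, fraction = text.partition(".")
    stamp = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    if fraction:
        if not fraction.isdigit():
            raise ValueError(f"bad fractional seconds in {text!r}")
        # datetime stores microseconds: pad or truncate the fraction to 6 digits
        stamp = stamp.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return stamp.replace(tzinfo=timezone.utc)


def convert_cell(text: str, xtype: str = "") -> Union[str, datetime]:
    """Interpret a char cell according to its xtype, when the xtype is known."""
    if xtype == "timestamp":
        return parse_timestamp(text)
    # Unknown or absent xtype: keep the literal exactly as it was read
    return text
```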

The IVOA Recommendation Data Access Layer Interface (DALI) defines the common values of xtype and the literals of conforming column values.

4.4 Units

The quantities in a column of the table may be expressed in some physical unit, which is specified by the unit attribute of the FIELD. The syntax of the unit string SHOULD conform to the VOUnits specification, Gray et al. [VOUnits1.1]; this requires a string without blanks or spaces, where multiplication is indicated by the symbol “.”, division by the symbol “/” and exponentiation by the symbol “**”. Examples are unit="m**2" for m\(^2\), unit="cm**-2.s**-1.keV**-1" for cm\(^{-2}\)s\(^{-1}\)keV\(^{-1}\), or unit="m/s" for m s\(^{-1}\).
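To make the three operators concrete, here is a minimal Python sketch that splits a unit string into (symbol, exponent) pairs. It is a deliberately simplified reading, assuming purely alphabetic symbols, integer exponents, and that each “/” applies only to the single term after it; it does not implement the rest of the VOUnits grammar, and parse_unit is a hypothetical name.

```python
import re
from typing import List, Tuple

# A unit symbol (letters only in this sketch) with an optional integer
# exponent written after "**", e.g. "cm**-2".
_TERM = re.compile(r"([A-Za-z]+)(?:\*\*([+-]?\d+))?")


def parse_unit(unit: str) -> List[Tuple[str, int]]:
    """Split a simple VOUnits string into (symbol, exponent) pairs."""
    if not unit or any(ch.isspace() for ch in unit):
        raise ValueError(f"unit must be non-empty and contain no blanks: {unit!r}")
    terms: List[Tuple[str, int]] = []
    sign, pos = 1, 0
    while True:
        match = _TERM.match(unit, pos)
        if match is None:
            raise ValueError(f"cannot parse {unit!r} at position {pos}")
        exponent = int(match.group(2)) if match.group(2) else 1
        terms.append((match.group(1), sign * exponent))
        pos = match.end()
        if pos == len(unit):
            return terms
        # "." multiplies by the next term, "/" divides by it
        if unit[pos] == ".":
            sign = 1
        elif unit[pos] == "/":
            sign = -1
        else:
            raise ValueError(f"unexpected {unit[pos]!r} in {unit!r}")
        pos += 1
```

With this sketch, parse_unit("m/s") gives [("m", 1), ("s", -1)], and a string containing a blank is rejected, as the syntax above requires.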

4.5 Unified Content Descriptors

The Unified Content Descriptors (UCD) can be viewed as a hierarchical glossary of the scientific meanings of the data contained in the astronomical tables. Two versions of UCDs have been developed: the initial version (UCD1) created at CDS, which uses atomic words separated by underscores (e.g. POS_EQ_RA_MAIN); and a more flexible one, UCD1+ [UCD1+1.5], developed in the frame of the IVOA Semantics Working Group, which uses a reduced vocabulary of dot-separated atoms which can be combined with semi-colons (e.g. pos.eq.ra;meta.main). UCD1+ usage is recommended, but applications using the older vocabulary are still acceptable in this version of VOTable.

A few typical examples of UCD1+ definitions are:

"”``

Blue magnitude

"”``

Orbital eccentricity

"”``

Median Value of the Period

"”``

Detector’s Quantum Efficiency
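Since a UCD1+ string is a semicolon-separated list of words, each made of dot-separated atoms, taking one apart is straightforward. The Python sketch below is illustrative only: split_ucd and looks_like_ucd1 are hypothetical helpers, atoms are not checked against the UCD1+ vocabulary, and the UCD1 test is merely a heuristic based on the underscore convention described above.

```python
from typing import List, Tuple


def split_ucd(ucd: str) -> List[Tuple[str, ...]]:
    """Split a UCD1+ string into words, each word a tuple of atoms."""
    words = [word.strip() for word in ucd.split(";")]
    if any(not word for word in words):
        raise ValueError(f"empty word in UCD {ucd!r}")
    return [tuple(word.split(".")) for word in words]


def looks_like_ucd1(ucd: str) -> bool:
    """Heuristic only: UCD1 words use underscores and no dots or semicolons."""
    return "_" in ucd and "." not in ucd and ";" not in ucd
```

For instance, split_ucd("pos.eq.ra;meta.main") returns [("pos", "eq", "ra"), ("meta", "main")].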

4.6 The utype Attribute

In many contexts, it is important to specify that FIELDs or PARAMeters convey the values defined in an external *data model*. For instance, it can be fundamental for an application to be aware that a given FIELD expresses the surface brightness measured with a specific filter and within a \(12\times6\,\textrm{arcsec}\) elliptical aperture. None of the other name, ID or ucd attributes can fill this role, and the utype (usage-specific or unique type) attribute was introduced in VOTable 1.1 to fill this gap. By extension, most elements may refer to some external data model, and the utype attribute is also legal in RESOURCE, TABLE and GROUP elements.

Note that the utype attribute is not an XML QName. This means that even when utypes are written with colons (e.g., adhoc:service), whatever is in front of the colon has no relationship to XML namespace URIs. In other words, utypes are opaque strings (except, where defined that way by standards using them, for case-folding).

4.7 VALUES Element

The VALUES element of the FIELD is designed to hold subsidiary information about the domain of the data. For instance, in the example (section VOTable:example1) we could rewrite the RA field definition as:

<FIELD name="RA" ID="col1" ucd="pos.eq.ra;meta.main"
       datatype="float" width="6" precision="2" unit="deg">
  <VALUES ID="RAdomain">
    <MIN value="0"/>
    <MAX value="360" inclusive="no"/>
  </VALUES>
</FIELD>

The domain expressed by the VALUES element (and by its MIN, MAX and OPTION sub-elements) can be qualified by type="actual" if it is valid only for the data enclosed in the parent TABLE; the default type="legal" qualification specifies the generic domain of valid values, as in the RAdomain in the example above, where the interval \([0,360[\) is specified.

The VALUES element may contain MIN and MAX elements, and it may contain OPTION elements; the latter may themselves contain more OPTION elements, so that a hierarchy of keyword-value pairs can be associated with each field. Note that only a single MIN / MAX pair is possible, whereas many OPTION elements may be used to qualify the domain described by the VALUES element. The domain may therefore be defined as a single interval, or as a set of individual values. Although the schema does not forbid using all three MIN, MAX and OPTION sub-elements simultaneously, such usage is considered bad practice and is discouraged.

All three MIN, MAX and OPTION sub-elements store their value corresponding to the minimum, maximum, or “special value” in a value attribute. MIN and MAX elements can have an inclusive attribute to specify whether the value quoted belongs to the domain or not, and the OPTION element can have a name attribute to describe the “special” quoted value.

Unless a FIELD has a nonempty xtype, the value of the value attribute is always a single TABLEDATA literal of the datatype and gives a global limit for all cells of the array; arrays are conceptually homogeneous in VOTable. When the parent of a VALUES element does have an xtype, special rules apply; clients should only try to parse limits of xtyped fields when they know the xtype. For instance, with:

<FIELD name="flux" datatype="float" unit="Jy">
  <VALUES><MIN value="0"/><MAX value="1e-4"/></VALUES>
</FIELD>

<FIELD name="fluxes" datatype="float" arraysize="30" unit="Jy">
  <VALUES><MIN value="0"/><MAX value="1e-4"/></VALUES>
</FIELD>

<PARAM name="CIRCLE" datatype="float" arraysize="3" xtype="circle">
  <VALUES><MAX value="312.5 -41 2"/></VALUES>
</PARAM>

all values of flux, and all elements of each fluxes array, are expected to lie between 0 and 1e-4 Jy, and clients could, for instance, raise warnings if they are not. In the last example (CIRCLE), clients not familiar with xtype="circle" would ignore the MAX declaration. Clients familiar with this xtype’s particular interpretation of MAX would learn about a spatial coverage of a spherical circle with radius two degrees around the point \((312.5^\circ,-41^\circ)\); see the SODA specification [SODA1.0] for the context of this particular example.
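A client that wants to raise such warnings might check cells against MIN and MAX as in the minimal Python sketch below. It assumes the FIELD was parsed with xml.etree.ElementTree without an XML namespace (real VOTable documents usually declare one), treats a missing inclusive attribute as "yes", skips any field carrying an xtype, and ignores the type and ref attributes and OPTION elements; out_of_range is a hypothetical name, and cells is a flat list of scalar values or array elements.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def _outside(value: float, bound: Optional[ET.Element], is_min: bool) -> bool:
    """True if value violates a single MIN or MAX bound."""
    if bound is None:
        return False
    limit = float(bound.get("value"))
    inclusive = bound.get("inclusive", "yes") == "yes"  # schema default: yes
    if value == limit:
        return not inclusive
    return value < limit if is_min else value > limit


def out_of_range(field: ET.Element, cells: List[float]) -> List[float]:
    """Return the values (or array elements) outside the MIN/MAX domain."""
    if field.get("xtype"):
        return []  # xtyped limits follow xtype-specific rules: skip them
    values = field.find("VALUES")
    if values is None:
        return []
    low, high = values.find("MIN"), values.find("MAX")
    return [v for v in cells if _outside(v, low, True) or _outside(v, high, False)]
```

Because the limits are global, the same check applies unchanged to the scalar flux column and to every element of the fluxes arrays.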

The VALUES element may also have a null attribute to define a non-standard value that is used to specify “non-existent data” – for example null="-32768". When this value is found in the corresponding data, it is assumed that no data exists for that table cell; the parser may also choose to use this when unparsable data is found, and the null value will be substituted instead. The value of the null attribute must follow the same rules as the TABLEDATA serialization for the appropriate datatype described in section 6 Definitions of Primitive Datatypes, and may never contain an array value. This mechanism is only intended for use with integer types; it should not be used for floating point types, which can use NaN instead.

This mechanism for representing null values is required for integer columns in the BINARY serialization. Since VOTable 1.3, however, other mechanisms are available for representing null values in the TABLEDATA and BINARY2 serializations. The representation of nulls, using the VALUES element or otherwise, is discussed further in section 5.5 Null values.

Finally, the ref attribute of a VALUES element can be used to avoid a repetition of the domain definition, by referring to a previously defined VALUES element having the referenced ID attribute. When specified, the ref attribute completely defines the domain without any other element or attribute, e.g. <VALUES ref="RAdomain"/>.
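Resolving such a reference amounts to an ID lookup. In this minimal Python sketch (hypothetical helper names, no XML namespace assumed), the referenced VALUES element is returned in place of the referring one; the sketch does not verify that the target was defined earlier in the document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def index_ids(root: ET.Element) -> Dict[str, ET.Element]:
    """Map every ID attribute in the document to the element carrying it."""
    return {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}


def effective_values(owner: ET.Element,
                     by_id: Dict[str, ET.Element]) -> Optional[ET.Element]:
    """Return the VALUES element that defines the domain of a FIELD or PARAM."""
    values = owner.find("VALUES")
    if values is None or values.get("ref") is None:
        return values
    # With ref, the referenced VALUES alone defines the domain
    target = by_id.get(values.get("ref"))
    if target is None or target.tag != "VALUES":
        raise KeyError(f"ref {values.get('ref')!r} does not name a VALUES element")
    return target
```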

4.8 INFO Element

The INFO element is a PARAM element restricted to be of type string (i.e. datatype="char" and arraysize="*" are implied). It must also have a name attribute, and may have the other attributes allowed in a PARAM: ID, ref, unit, ucd and utype. But unlike PARAM, INFO does not accept sub-elements: only text is acceptable in INFO’s body. This limitation ensures full compatibility with the previous versions of VOTable.

INFO is meant to convey informative details about the generation of the VOTable document. It may be present at the beginning or end of VOTABLE or RESOURCE elements, or at the end of a TABLE. Typical uses of INFO include error reports, or explanations about choices made by the data processing system which generates the VOTable document.

4.9 GROUPing FIELDs and PARAMeters

The GROUP element is used to group together a set of FIELDs and PARAMs which are logically connected, like a value and its error. The FIELDs are always defined *outside* any group, and the GROUP designates its member fields via FIELDref elements.

A simple example of a group made of the velocity and its error, based on the example of section VOTable:example1, can be the following:

<GROUP name="Velocity">
  <DESCRIPTION>Velocity and its error</DESCRIPTION>
  <FIELDref ref="col4"/>
  <FIELDref ref="col5"/>
</GROUP>

The GROUP element can have the name, ID, ucd, utype and ref attributes. It can include a DESCRIPTION, and any mixture of FIELDreferences, PARAMeters, PARAMreferences and other GROUPs. A PARAMref is a logical definition of a parameter that refers to a PARAM element defined elsewhere in the parent TABLE or RESOURCE; similarly, a FIELDref element is defined by referring to a FIELD element defined elsewhere in the parent TABLE. Because GROUP elements can be nested recursively, arbitrarily complex structures can be defined.

The possibility of including PARAMeters in groups also makes it possible to associate parameters that accurately describe the context of the data stored in the table. For instance, it is possible to associate the actual frequency of a radio survey with a table of flux measurements using the following declaration:

<FIELD name="Flux" ID="col4" ucd="phot.flux;em.radio.200-400MHz"
       datatype="float" width="6" precision="1" unit="mJy"/>
<FIELD name="e_Flux" ID="col5" datatype="float" width="4" precision="1"
       ucd="stat.error;phot.flux;em.radio.200-400MHz" unit="mJy"/>
<GROUP name="Flux" ucd="phot.flux;em.radio.200-400MHz">
  <DESCRIPTION>Flux measured at 352MHz</DESCRIPTION>
  <PARAM name="Freq" ucd="em.freq" unit="MHz" datatype="float"
         value="352"/>
  <FIELDref ref="col4"/>
  <FIELDref ref="col5"/>
</GROUP>

Similarly, GROUP can be used to associate several parameters with one or several FIELDs. For example, a filter may be characterized by the central wavelength and the FWHM of its transmission curve, or several parameters of an instrument setup may be described.
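Because a GROUP only points at FIELDs defined outside it, an application typically maps each FIELDref back to a column position. The Python sketch below does this for the top-level groups of one TABLE; it assumes no XML namespace, ignores nested GROUPs and PARAMrefs, and group_columns is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def group_columns(table: ET.Element) -> Dict[Optional[str], List[int]]:
    """Map each top-level GROUP name to the 0-based columns its FIELDrefs name."""
    # Column positions follow the order in which the FIELDs are declared
    position = {field.get("ID"): i
                for i, field in enumerate(table.findall("FIELD"))
                if field.get("ID") is not None}
    return {group.get("name"): [position[ref.get("ref")]
                                for ref in group.findall("FIELDref")]
            for group in table.findall("GROUP")}
```

Applied to the flux declaration above, preceded by one other column, the "Flux" group resolves to columns 1 and 2, and the group's Freq PARAM can then be read alongside them.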

4.10 The Relational Context

With a simple naming convention, the GROUP element may also specify some properties of the tables included in a VOTable document when a TABLE is viewed as a relation (part of a relational database):

  • A GROUP element having the name="primaryKey" attribute defines the primary key of the relation by enumerating the ordered list of FIELDrefs that make up the primary key of the table;

  • A GROUP element having the name="foreignKey" attribute, with a ref attribute referencing the table that has the associated primary key, similarly enumerates the FIELDrefs of the foreign key;

  • A GROUP element having the name="order" attribute may specify how the data are ordered.

Similar conventions could be added for the existence of indexes, unique values, etc.
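As an illustration of the primaryKey convention, the following Python sketch checks that the referenced columns really identify each row. It assumes the rows have already been parsed into Python values, that the key columns are scalar (so their values are hashable), and that there is no XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def primary_key_is_unique(table: ET.Element,
                          rows: Sequence[Sequence[object]]) -> bool:
    """True if the columns of GROUP name="primaryKey" identify every row."""
    position = {field.get("ID"): i
                for i, field in enumerate(table.findall("FIELD"))
                if field.get("ID") is not None}
    group = table.find("GROUP[@name='primaryKey']")
    if group is None:
        raise ValueError('no GROUP with name="primaryKey" in this TABLE')
    # The FIELDrefs are ordered, so the key tuple keeps the declared order
    key_columns = [position[ref.get("ref")] for ref in group.findall("FIELDref")]
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            return False
        seen.add(key)
    return True
```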

5 Data Content

While the bulk of the metadata of a VOTable document is in the FIELD elements, the data content of the table is in a single DATA element. The data is organized in “reading” order, so that the content of each row appears in the same order as the order of the FIELD definitions.

Each DATA part of the VOTable document can be viewed as a stream coming out of a pipeline. The abstract table is first serialized by one of several methods, then it may be encoded for compression or other reasons. The result may be embedded in the XML file (local data), or it may be remote data.

Figure VOTable:fig:serialization shows how the abstract table is rendered into the VOTable document. First the data is serialized, either as XML, a FITS binary table, or the VOTable Binary format. This data stream may then be encoded, perhaps for compression or to convert binary to text. Finally, the data stream may be put in a remote file with a URL-type pointer in the VOTable document; or the table data may be embedded in the VOTable.

The serialization elements and their attributes are described in the next sections.

5.1 TABLEDATA Serialization

The TABLEDATA element is a way to build the table in pure XML, and has the advantage that XML tools can manipulate and present the table data directly. The TABLEDATA element contains TR elements, which in turn contain TD elements, i.e. the same conventions as in HTML. The number of TD elements in each TR element must be equal to the number of FIELD elements declaring the table. An example is contained in section VOTable:example1, surrounded by the <TABLEDATA> and </TABLEDATA> delimiters.

Each item in the TD tag contains a value which must be compatible with the datatype attribute of the corresponding FIELD definition. If the value is the same as the null value for that field (see section 4.7 VALUES Element) then the item is assumed to contain no data. Valid representations of values in a cell, depending on their datatype, are detailed in section 6 Definitions of Primitive Datatypes. If the TD element is empty (<TD/> or <TD></TD>) the cell is considered to contain no data, i.e. to be null.

If a cell contains an array of numbers or a complex number, it should be encoded as multiple numbers separated by whitespace. However in the case of character and Unicode strings (declared in the corresponding FIELD as an array of char or unicodeChar datatype), no separator should exist. Here is an example of a two-row table that has arrays in the table cells:

<TABLE>
  <FIELD name="aString" datatype="char" arraysize="10"/>
  <FIELD name="aShort"  datatype="short"/>
  <FIELD name="varInts" datatype="int"  arraysize="*"/>
  <FIELD name="Floats"  datatype="float" arraysize="3"/>
  <DATA><TABLEDATA>
    <TR> <TD>Apple</TD>  <TD/>       <TD>1 2 4 8 16</TD> <TD>1.62 4.56 3.44</TD> </TR>
    <TR> <TD>Orange</TD> <TD>15</TD> <TD>23 -11 9</TD>   <TD>2.33 4.66 9.53</TD> </TR>
  </TABLEDATA></DATA>
</TABLE>

The first entry is a fixed-length array of 10 characters; since the value being presented (Apple) has 5 characters, this is padded with trailing blanks. The second cell is a short integer but has a null value, as indicated by the empty TD element. The third cell contains a variable-length array of integers. The last cell contains a fixed-length array of three floats.
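The rules above can be sketched as a small Python reader for this example table. It is a minimal sketch, assuming no XML namespace and handling only decimal integer and floating-point types plus char/unicodeChar; multidimensional arraysize values, booleans, complex numbers and hexadecimal integers are not covered, and the function names are hypothetical. Following the text, a fixed-length string is padded with trailing blanks, an empty TD becomes None, and a VALUES null magic value, if declared, is honoured per element.

```python
import xml.etree.ElementTree as ET
from typing import Any, List

INTEGER_TYPES = {"unsignedByte", "short", "int", "long"}
FLOAT_TYPES = {"float", "double"}


def convert_td(text: str, field: ET.Element) -> Any:
    """Convert the text of one TD according to its FIELD declaration."""
    datatype = field.get("datatype")
    arraysize = field.get("arraysize")
    if text == "":
        return None  # an empty TD is a null cell
    if datatype in ("char", "unicodeChar"):
        # White space is significant in strings, so the text is kept as-is;
        # a fixed-length string is padded with trailing blanks.
        return text.ljust(int(arraysize)) if arraysize and arraysize.isdigit() else text
    if datatype not in INTEGER_TYPES | FLOAT_TYPES:
        raise NotImplementedError(f"datatype {datatype!r} not handled in this sketch")
    convert = int if datatype in INTEGER_TYPES else float
    values = field.find("VALUES")
    magic = values.get("null") if values is not None else None
    null = convert(magic) if magic is not None else None
    # Numeric arrays are whitespace-separated; a magic value nulls one element
    items = [convert(token) for token in text.split()]
    items = [None if null is not None and v == null else v for v in items]
    if not items:
        return None  # blank numeric cell: treated as null here
    return items if arraysize else items[0]


def parse_tabledata(table: ET.Element) -> List[List[Any]]:
    """Return the TABLEDATA rows of a TABLE as lists of Python values."""
    fields = table.findall("FIELD")
    rows = []
    for tr in table.iter("TR"):
        cells = tr.findall("TD")
        if len(cells) != len(fields):
            raise ValueError("each TR must contain one TD per FIELD")
        rows.append([convert_td(td.text or "", f) for td, f in zip(cells, fields)])
    return rows
```

On the example, the first row comes back as "Apple" padded to ten characters, None, [1, 2, 4, 8, 16] and [1.62, 4.56, 3.44].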

A special note should be made about the significance of white space in a table cell (the term white space designates the characters space [x20], tab [x09], newline [x0a] and carriage-return [x0d]): while for numeric data types the amount of white space does not matter (the elements of an array of numbers may for instance be written on several lines), white space is significant for "char" or "unicodeChar" datatypes; for instance, <TD>Apple</TD> and <TD> Apple</TD> are not identical.

5.2 FITS Serialization

The FITS format for binary tables [] is in widespread use in astronomy, and its structure has had a major influence on the VOTable specification. Metadata is stored in a header section, followed by the data. The metadata is essentially equivalent to the metadata of the VOTable format. One important difference is that VOTable does not require specification of the number of rows in the table, an important advantage if the table is being created dynamically from a stream.

The VOTable specification does not define the behavior of parsers with respect to this doubling of the metadata. A parser may ignore the FITS metadata, or it may compare it with the VOTable metadata for consistency, or other possibilities.

The following code shows a fragment that might have been created by a FITS-to-VOTable converter. Each FITS keyword has been converted to a PARAM, and the data itself is remotely stored and gzipped at an FTP site:

<RESOURCE>
  ...
  <DESCRIPTION>Original Epoch of the coordinates</DESCRIPTION>
  ...
</RESOURCE>

The FITS file may contain many data objects (known as extensions, numbered from 1 up, the main header being numbered 0), and the extnum attribute allows the VOTable to point to one of these.

5.3 BINARY Serialization

The binary format is intended to be easy to read by parsers, so that additional libraries are not required. It is just a sequence of bytes with the length of each sequence corresponding to the datatype and arraysize attributes of the FIELD elements in the metadata. The binary format consists of a sequence of records with no header bytes, no alignment considerations, and no block sizes. The order of the bytes in multi-byte primitives (e.g. integers, floating-point numbers) is Most Significant Byte first, i.e., it follows the FITS convention.

Table cells may contain arrays of primitive types, each of which may be of fixed or variable length. In the former case, the number of bytes is the same for each instance of the item, as specified by the arraysize attribute of the FIELD. If all the fields have a fixed arraysize, then each record of the binary format has the same length (the sum of arraysize times the length in bytes of the corresponding datatype).

Variable-length arrays of primitives are preceded by a 4-byte integer containing the number of items of the array. The parser can then compute the number of bytes taken by the variable-length array by multiplying the size and number of the primitives.

The way the stream of bytes is arranged for the data of the example in section VOTable:example2 is illustrated in Figure VOTable:fig:bin. In this case the second column must be declared like this:

<FIELD name="aShort" datatype="short">
  <VALUES null="99"/>
</FIELD>

to indicate a magic value representing nulls, since no equivalent of the empty TD element is available for the BINARY serialization (see section 5.5 Null values).
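The byte layout described above can be decoded with Python's struct module. This minimal sketch handles scalars and one-dimensional fixed or variable-length arrays of the listed primitives only (char is decoded as ASCII; unicodeChar, bit, boolean and complex types are left out), reads the 4-byte length prefix as a signed big-endian integer, and maps the magic value of a VALUES null attribute to None. The field tuples and function names are hypothetical; EXAMPLE_FIELDS mirrors the four columns of the table used above, with the null="99" declaration.

```python
import io
import struct
from typing import Any, BinaryIO, List, Optional, Tuple

# Big-endian ("Most Significant Byte first") struct codes per primitive
CODES = {"char": "c", "unsignedByte": "B", "short": "h", "int": "i",
         "long": "q", "float": "f", "double": "d"}

# (datatype, arraysize or None, magic null value or None)
Field = Tuple[str, Optional[str], Optional[int]]

EXAMPLE_FIELDS: List[Field] = [("char", "10", None), ("short", None, 99),
                               ("int", "*", None), ("float", "3", None)]


def read_exact(stream: BinaryIO, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated BINARY stream")
    return data


def read_cell(stream: BinaryIO, datatype: str, arraysize: Optional[str],
              null: Optional[int]) -> Any:
    code = CODES[datatype]
    if arraysize is None:
        count = 1
    elif arraysize.endswith("*"):
        # Variable-length array: a 4-byte item count comes first
        (count,) = struct.unpack(">i", read_exact(stream, 4))
    else:
        count = int(arraysize)
    raw = read_exact(stream, struct.calcsize(f">{count}{code}"))
    if datatype == "char":
        return raw.decode("ascii")
    items = list(struct.unpack(f">{count}{code}", raw))
    if null is not None:
        items = [None if v == null else v for v in items]
    return items if arraysize is not None else items[0]


def read_rows(data: bytes, fields: List[Field]) -> List[List[Any]]:
    """Decode consecutive records; there are no headers or alignment gaps."""
    stream = io.BytesIO(data)
    rows = []
    while stream.tell() < len(data):
        rows.append([read_cell(stream, *field) for field in fields])
    return rows
```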

The BINARY serialization has been available in all versions of VOTable. From VOTable 1.3, however, the BINARY2 serialization is available as an alternative, providing more straightforward null-flagging capabilities. In VOTable 1.3 and above BINARY remains a legal serialization, but for most purposes VOTable producers are advised to use BINARY2 instead.

5.4 BINARY2 Serialization

The BINARY2 format, introduced at VOTable 1.3, is the same as BINARY, but with null entries flagged explicitly rather than identified by their values. The byte stream contains one additional bit for each table cell indicating whether that cell’s value is to be considered null or not.

The byte content for each row consists of zero or more bytes containing a null value flag for each cell in the row, followed by the bytes for the BINARY serialization as described in the previous subsection. The null flags are stored as exactly one bit per table column, and the number of flag bytes is the smallest required for this purpose; the number of flag bytes per row for an \(N\)-column table will therefore be the integer part of \((N+7)/8\). The most significant bit of the first flag byte corresponds to the first column, the second most significant bit of the first flag byte to the second column, the most significant bit of the second flag byte to the ninth column, and so on. A set (1) bit indicates that the corresponding cell is null, and an unset (0) bit indicates that its value is not null. Unused bits will be at the less-significant end of the final flag byte, and shall be unset (0).
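The flag layout can be expressed compactly in Python. In this minimal sketch (hypothetical function names), columns are counted from zero, so the ninth column, index 8, lands on the most significant bit of the second flag byte; decode_null_flags only reads the prefix, and the cell bytes that follow it, starting at offset (N+7)//8, are left to a BINARY-style decoder.

```python
from typing import List


def encode_null_flags(nulls: List[bool]) -> bytes:
    """Pack one null bit per column, most significant bit first."""
    flags = bytearray((len(nulls) + 7) // 8)
    for col, is_null in enumerate(nulls):
        if is_null:
            # column 0 -> 0x80 of byte 0, ..., column 8 -> 0x80 of byte 1
            flags[col // 8] |= 0x80 >> (col % 8)
    return bytes(flags)  # unused low-order bits of the last byte stay 0


def decode_null_flags(row: bytes, ncols: int) -> List[bool]:
    """Read the flag bytes that start a BINARY2 row of ncols columns."""
    nbytes = (ncols + 7) // 8
    if len(row) < nbytes:
        raise ValueError("row is shorter than its null-flag prefix")
    return [bool(row[col // 8] & (0x80 >> (col % 8))) for col in range(ncols)]
```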

It is recommended, but not required, that a cell value flagged as null is filled with the NaN value for floating point or complex datatypes, and zero-valued bytes for other datatypes. It is particularly recommended that a variable length array cell value flagged as null is represented as 4 zero-valued bytes, indicating a zero-length value.

The way the stream of bytes is arranged for the data of the example in section VOTable:example2 is illustrated in Figure VOTable:fig:bin2.

5.5 Null values

VOTable provides two approaches to representing null values in data.

The first approach makes use of the VALUES element’s null attribute to indicate that whenever a particular “magic” value is encountered in a column’s data, that entry should be considered as null. This magic value must represent a legal scalar value for the column’s datatype, for instance in the case of datatype="unsignedByte" it must be an integer in the range 0–255. This approach, inherited from FITS, works in the same way for all of the defined VOTable serializations. However it can present difficulties when generating VOTables, since the magic value must be distinct from all actual data values in the column, and must be chosen before the column data has been written, since the FIELD element precedes the DATA.
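The requirement that the magic value differ from every real value can be illustrated with a minimal Python sketch. Trying the ends of the datatype's range first (as in the null="-32768" example for short integers) is a convention assumed here, not a rule of the standard, and the function name is hypothetical. Note that it needs the whole column in advance, which is exactly what a streaming writer does not have when it emits the FIELD.

```python
from typing import Iterable, Optional


def choose_magic_null(values: Iterable[Optional[int]], lo: int, hi: int) -> int:
    """Return an integer in [lo, hi] that no non-null value in the column uses."""
    used = {v for v in values if v is not None}
    # Try the ends of the legal range first, then scan upwards; at most
    # len(used) + 1 candidates are examined before a free one is found.
    for candidate in (lo, hi):
        if candidate not in used:
            return candidate
    for candidate in range(lo, hi + 1):
        if candidate not in used:
            return candidate
    raise ValueError("every legal value occurs in the column; no magic value is free")
```

For a datatype="unsignedByte" column, for example, the call would be choose_magic_null(column, 0, 255).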

The second approach, introduced at VOTable 1.3, is to mark null values using some mechanism external to the data itself, and it works differently for the different serializations. In the TABLEDATA serialization an empty TD element signals a null value, and in the BINARY2 serialization a separate null-ness flag is provided for each cell. The BINARY and FITS serializations do not support this approach at all. It should be noted that when using this approach, unlike with magic values, the different serializations do not have identical capabilities for representing data, so that lossless round-tripping between serializations is not always possible.

Some other subtleties concerning null values should also be mentioned:

  • The only way to mark individual elements of an array-valued cell as null is the magic value mechanism, which operates on a per-element basis. However, although the magic value approach can mark individual elements of an array as null, it cannot mark a whole multi-element array as null.

  • In TABLEDATA array-valued columns, a null value and a zero-length array are not distinguished. Since strings are represented as arrays of characters, this also means that empty and null strings are not distinguished.

  • In either approach, floating point values not formally marked as nulls may take the value NaN (not-a-number), represented by the string “NaN” or by a suitable IEEE bit pattern as appropriate. This option is suitable for scalar, complex, and array-valued columns. For most purposes, the distinction between NaN and null is not significant, and VOTable implementations are not required to distinguish these cases. However, the BINARY2 encoding does provide the option to represent them differently for specialised applications where that is desirable.

  • The magic value mechanism, as in FITS, is only intended for integer values. Historically it has not been explicitly forbidden for floating point values, but such use is strongly deprecated in favour of the use of NaN.

  • Combining the two approaches is not encouraged, and use of the VALUES null attribute is deprecated where it can be avoided (marking null cells in TABLEDATA and BINARY2 serializations in VOTable 1.3 and above). However, if it is present, the VALUES null attribute must always be respected.

  • The boolean datatype has its own arrangements for representing null which do not require use of either of the special approaches above.

5.6 Data Encoding

As a result of the serialization, the table has been converted to a byte stream, either text or binary. If the TABLEDATA serialization is used, then the table is represented as XML tags directly embedded in the document, and conventional tools can be used to encode the entire XML document. However, VOTable also provides limited encoding of its own. A VOTable document may point to a remote data resource that is compressed; rather than decompressing before sending on the wire, it can be dynamically decoded by the VOTable reader. We might also use the encoding facilities to convert a binary file to text (through base64 encoding), so that binary data can be used in the XML document.

In this version of VOTable, it is not possible to encode individual columns of the table: the whole table must be encoded in the same way. However, the possibility of encoding selected table cells is being examined for future versions of VOTable (see appendix A.5 Encoding Individual Table Cells).

In order to use an encoding of the data, it must be enclosed in a STREAM element, whose attributes define the nature of the encoding. The encoding attribute is a string that should indicate to the parser how to undo the encoding that has been applied. Parsers should understand and interpret at least the following values:

  • encoding="gzip" [RFC1952] implies that the data following has been compressed with the gzip filter, so that gunzip or similar should be applied.

  • encoding="base64" [RFC2045] implies that the base64 filter has been applied, to convert binary to text.

  • encoding="dynamic" implies that the data is in a remote resource (see below), and the encoding will be delivered with the header of the data. This occurs with the http protocol, where the MIME header indicates the type of encoding that has been used.

The default value of the encoding attribute is the null string, meaning that no encoding has been applied. In future releases, we might allow more complex strings in the encoding attribute, allowing combinations of encoding filters and a way for the parser to find the software needed for the decoding.

Note that for inline streamed data (a STREAM with no href attribute) it is effectively required to use encoding="base64", since of the available options only base64 will ensure that binary data is encoded as legal XML content.
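For illustration, undoing the encoding attribute with Python's standard library might look like the minimal sketch below (decode_stream is a hypothetical name). It handles one encoding at a time, as the attribute currently allows, and leaves "dynamic" to the code performing the transfer, since in that case the encoding is announced by the protocol headers.

```python
import base64
import gzip


def decode_stream(payload: bytes, encoding: str = "") -> bytes:
    """Undo the encoding declared on a STREAM element."""
    if encoding == "":
        return payload  # default: no encoding was applied
    if encoding == "gzip":
        return gzip.decompress(payload)
    if encoding == "base64":
        # Inline text may be wrapped over many lines; with the default
        # validate=False, b64decode discards non-alphabet characters
        return base64.b64decode(payload)
    if encoding == "dynamic":
        raise ValueError("dynamic: take the encoding from the transfer headers")
    raise ValueError(f"unsupported STREAM encoding {encoding!r}")
```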

5.7 Remote Data

If the encoding of the data produces text, or if the serialization is naturally text-based, then it can be directly embedded into the XML document:


\(\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\)

However, if the data stream is very large, it may be preferable to keep the data separate from the metadata. The href attribute of the STREAM element, if present, provides the location of the data in a URL-type syntax, for example:

<STREAM href="ftp://server.com/mydata.dat"/>
<STREAM href="ftp://server.com/mydata.dat" expires="2004-02-29T23:59:59"/>
<STREAM href="httpg://server.com/mydata.dat" actuate="onLoad"/>
<STREAM href="file:///usr/home/me/mydata.dat"/>

The examples are the well-known anonymous FTP and HTTP protocols. "httpg:" is an example of Grid-based access to data through HTTPG; finally, "file:" is a reference to a local file. VOTable parsers are not required to understand arbitrary protocols, but are required to understand the three common protocols "http:", "ftp:" and "file:".

There are further attributes of the STREAM element that may be useful. The expires attribute indicates the expiration time of the data; this is useful when data are dynamically created and stored on some staging disk where files only persist for a specified lifetime and are then automatically deleted. The expires attribute expresses when a remote resource ceases to be valid, and is expressed in Universal Time in the same way as in the FITS specification, itself conforming to the ISO 8601 standard.

The rights attribute expresses authentication information that may be necessary to access the remote resource. If the VOTable document is suitably encrypted, this attribute could be used to store a password.

The actuate attribute is borrowed from the XML XLink specification and expresses when the remote link should be actuated. The default is "onRequest", meaning that the data is only fetched when explicitly requested (like a link on an HTML page), and the "onLoad" value means that data should be fetched as soon as possible (like an embedded image on an HTML page).
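The sketch below shows how a reader might collect the remote-data attributes of a STREAM, applying the "onRequest" default for actuate and checking expires against the current time. It uses only `xml.etree.ElementTree` and `datetime`; the helpers `stream_info` and `is_expired` are hypothetical, the elements are parsed without the VOTable namespace for brevity, and an expires value without a zone designator is assumed to be Universal Time (Python versions before 3.11 also reject a trailing "Z" in `fromisoformat`).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def stream_info(xml_text: str) -> dict:
    """Read the remote-data attributes of one STREAM element, with defaults."""
    elem = ET.fromstring(xml_text)
    expires = elem.get("expires")
    expiry = None
    if expires is not None:
        expiry = datetime.fromisoformat(expires)
        if expiry.tzinfo is None:
            # Assumption: a value without zone designator is Universal Time.
            expiry = expiry.replace(tzinfo=timezone.utc)
    return {
        "href": elem.get("href"),
        "actuate": elem.get("actuate", "onRequest"),  # default: fetch on request
        "expires": expiry,
    }


def is_expired(info: dict, now: datetime) -> bool:
    """True once the remote resource has ceased to be valid."""
    return info["expires"] is not None and now >= info["expires"]


a = stream_info('<STREAM href="ftp://server.com/mydata.dat" '
                'expires="2004-02-29T23:59:59"/>')
b = stream_info('<STREAM href="httpg://server.com/mydata.dat" actuate="onLoad"/>')
print(a["actuate"], is_expired(a, datetime.now(timezone.utc)))  # onRequest True
print(b["actuate"], b["expires"])                                 # onLoad None
```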

6 Definitions of Primitive Datatypes

This section describes the primitives summarized in Table 2.1 Primitives and their representations in the BINARY/BINARY2 and TABLEDATA serializations (see section 5 Data Content). In the following, the term “hexadigit” designates the ASCII digits "0" to "9" or the ASCII letters "a" to "f" or "A" to "F" (i.e. a digit in the hexadecimal representation of a number).

The representation of null values is discussed in section 5.5 Null values.

  • Logical: If the datatype attribute specifies data type "boolean", the contents of the field in the BINARY/BINARY2 serialization shall consist of ASCII "T", "t", or "1" indicating true, and ASCII "F", "f", or "0" indicating false. The null value is indicated by an ASCII NULL [0x00], a space [0x20] or a question mark "?" [0x3f]. The acceptable representations in the TABLEDATA serialization also include any capitalisation variation of the strings "true" and "false" (e.g. "True" or "FALSE").

  • Bit Array: If the datatype attribute specifies data type "bit", the contents of the field in the BINARY/BINARY2 serialization shall consist of a sequence of bits starting with the most significant bit; the following bits shall be in order of decreasing significance, ending with the least significant bit. A bit field shall be composed of the smallest number of bytes that can accommodate the number of elements in the field. Padding bits shall be 0. A bit array is represented in the TABLEDATA serialization by a sequence of ASCII "0" and "1" characters.

  • Byte: If the datatype attribute specifies data type "unsignedByte", the field shall contain in the BINARY/BINARY2 serialization a byte (8 bits) representing a number in the range 0 to 255. In the case of an array of bytes (arraysize="*"), also known as a “blob”, the bytes are stored consecutively. A byte is represented in the TABLEDATA serialization either by its decimal representation (a number between 0 and 255) or by its hexadecimal representation, starting with 0x followed by one or two hexadigits (e.g. 0xff); in an array of bytes, consecutive values are separated by at least one space.

  • Character: If the datatype attribute specifies data type "char", the field shall contain in the BINARY/BINARY2 serialization an ASCII (7-bit) character. The arraysize attribute indicates a character string composed of ASCII text. The BINARY/BINARY2 serialization follows the FITS rules for character strings, so a character string may be terminated by an ASCII NULL [0x00] before the length specified in the arraysize attribute. In this case characters after the first ASCII NULL are not defined, and a string having the number of characters identical to the arraysize value is not NULL terminated. Characters should be represented in the TABLEDATA serialization using the normal rules for encoding XML text: the ampersand (&) can be written &amp; (symbolic representation), &#38; (decimal representation) or &#x26; (hexadecimal representation); the less-than (<) and greater-than (>) symbols should be coded &lt; and &gt; or &#x3C; and &#x3E;. Also note the significance of the white space characters in the TABLEDATA serialization (section VOTable:elem:TD).

  • Unicode Character: If the datatype attribute specifies data type "unicodeChar", the field shall contain a Unicode character. The arraysize attribute indicates a string composed of Unicode text, which enables representation of text in many non-Latin alphabets. Each Unicode character is represented in the BINARY/BINARY2 serialization by two bytes, using the big-endian UCS-2 encoding (ISO-10646-UCS-2). The representation of a Unicode character in the TABLEDATA serialization follows the XML specifications; e.g. the Cyrillic uppercase “Ya” can be written as the character reference &#x042F;. Also note the significance of the white space characters in the TABLEDATA serialization (section VOTable:elem:TD).

  • 16-Bit Integer: If the datatype attribute specifies data type "short", the data in the BINARY/BINARY2 serialization shall consist of big-endian two's-complement signed 16-bit integers (most significant byte first). A Short Integer is represented in the TABLEDATA serialization either in decimal, as an optional - or + sign followed by digits with a value between -32768 and 32767, or in hexadecimal, as 0x followed by 1 to 4 hexadigits.

  • 32-Bit Integer: If the datatype attribute specifies data type "int", the data in the BINARY/BINARY2 serialization shall consist of big-endian two's-complement signed 32-bit integers contained in four bytes, with the most significant byte first and subsequent bytes in order of decreasing significance. An Integer is represented in the TABLEDATA serialization either in decimal, as an optional - or + sign followed by digits with a value between -2147483648 and 2147483647, or in hexadecimal, as 0x followed by 1 to 8 hexadigits.

  • 64-Bit Integer: If the datatype attribute specifies data type "long", the data in the BINARY/BINARY2 serialization shall consist of big-endian two's-complement signed 64-bit integers contained in eight bytes, with the most significant byte first and subsequent bytes in order of decreasing significance. A Long Integer is represented in the TABLEDATA serialization either in decimal, as an optional - or + sign followed by digits with a value between -9223372036854775808 and 9223372036854775807, or in hexadecimal, as 0x followed by 1 to 16 hexadigits.

  • Single Precision Floating Point: If the datatype attribute specifies data type "float", the data in the BINARY/BINARY2 serialization shall consist of ANSI/IEEE-754 32-bit floating point numbers in big-endian order. All IEEE special values including NaN are recognized. A Floating Point number is represented in the TABLEDATA serialization by an optional - or + sign, followed by the ASCII representation of a positive decimal number, optionally followed by the ASCII letter "E" or "e" introducing the base-10 exponent, which is made of an optional - or + sign followed by 1 or 2 digits. The number must be within the limits of the IEEE floating-point definition (around \(\pm3.4\cdot10^{38}\); numbers with absolute value less than about \(1.4\cdot10^{-45}\) are equated to zero). The special values "+Inf", "-Inf", and "NaN" are accepted.

  • Double Precision Floating Point: If the datatype attribute specifies data type "double", the data in the BINARY/BINARY2 serialization shall consist of ANSI/IEEE-754 64-bit double precision floating point numbers in big-endian order. All IEEE special values including NaN are recognized. A Double number is represented in the TABLEDATA serialization by an optional - or + sign, followed by the ASCII representation of a positive decimal number, optionally followed by the ASCII letter "E" or "e" introducing the base-10 exponent, which is made of an optional - or + sign followed by 1 to 3 digits. The number must be within the limits of the IEEE floating-point definition (around \(\pm1.7\cdot10^{308}\); numbers with absolute value less than about \(5\cdot10^{-324}\) are equated to zero). The special values "+Inf", "-Inf", and "NaN" are accepted.

  • Single Precision Complex: If the datatype attribute specifies data type "floatComplex", the data in the BINARY/BINARY2 serialization shall consist of a sequence of pairs of 32-bit single precision floating point numbers in big-endian order. The first member of each pair shall represent the real part of a complex number and the second member shall represent the imaginary part of that complex number. A Floating Complex number is represented in the TABLEDATA serialization by two Single Precision Floating Point representations separated by whitespace, giving the real and imaginary parts respectively.

  • Double Precision Complex: If the datatype attribute specifies data type "doubleComplex", the data in the BINARY/BINARY2 serialization shall consist of a sequence of pairs of 64-bit double precision floating point numbers in big-endian order. The first member of each pair shall represent the real part of a complex number and the second member of the pair shall represent the imaginary part of that complex number. A Double Complex number is represented in the TABLEDATA serialization by two Double Precision Floating Point representations separated by whitespace, giving the real and imaginary parts respectively.
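The fixed-size binary layouts above map directly onto Python `struct` format codes with the ">" prefix (big-endian, no padding). The sketch below decodes single BINARY elements, unpacks a bit array, and parses two TABLEDATA forms. The helper names are hypothetical; treating a blank or "?" TABLEDATA boolean as null is an assumption mirroring the null markers listed above, and hexadecimal integers are returned as their unsigned value, leaving range checks to the caller.

```python
import struct

# Big-endian struct formats for one element of each fixed-size primitive.
BINARY_FORMATS = {
    "boolean": ">c",        # one ASCII byte: T/t/1, F/f/0, or a null marker
    "unsignedByte": ">B",
    "char": ">c",           # 7-bit ASCII
    "unicodeChar": ">2s",   # two bytes, big-endian UCS-2
    "short": ">h",
    "int": ">i",
    "long": ">q",
    "float": ">f",
    "double": ">d",
    "floatComplex": ">2f",  # real part, then imaginary part
    "doubleComplex": ">2d",
}


def decode_scalar(datatype: str, raw: bytes):
    """Decode one BINARY element of the given datatype."""
    values = struct.unpack(BINARY_FORMATS[datatype], raw)
    if datatype == "unicodeChar":
        return values[0].decode("utf-16-be")  # identical to UCS-2 for the BMP
    if datatype.endswith("Complex"):
        return complex(*values)
    return values[0]


def decode_bits(raw: bytes, nbits: int) -> list:
    """Unpack a bit array, most significant bit of the first byte first."""
    return [(raw[i // 8] >> (7 - i % 8)) & 1 for i in range(nbits)]


def parse_tabledata_boolean(text: str):
    """TABLEDATA logical value; any capitalisation of true/false is accepted."""
    s = text.strip().lower()
    if s in ("t", "1", "true"):
        return True
    if s in ("f", "0", "false"):
        return False
    if s in ("", "?"):  # assumption: blank or '?' read as null
        return None
    raise ValueError(f"not a VOTable boolean: {text!r}")


def parse_tabledata_int(text: str) -> int:
    """Decimal with optional sign, or 0x followed by hexadigits."""
    s = text.strip()
    return int(s, 16) if s[:2].lower() == "0x" else int(s, 10)
```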

7 A Simplified View of the VOTable Schema

The XML Schema defining a VOTable document is available from http://www.ivoa.net/xml/VOTable/votable-1.5.xsd as well as in appendix B The VOTable version XML Schema of this document. In this section we illustrate this XML Schema by a set of boxes describing the structure of a VOTable, and the list of attributes of each VOTable element.

Note that in case of discrepancies between the XML Schema and these diagrams, the schema is definitive.

7.1 Element Hierarchy

The hierarchy of the elements existing in VOTable is illustrated below; it uses the following conventions:

  • italicized text represents optional elements;

  • \(\oplus\) indicates that the order of the elements is mandatory, while

  • \(\circ\) (open bullet) indicates that the corresponding elements may occur in any order;

  • \(\mapsto\) marks a choice between alternatives.

  • \(\dagger\) marks a deprecated element.

  • \(\cdots\) (dots) indicate that an element may be repeated.

  • underlined elements may contain sub-elements, and are therefore explained in a dedicated box of the figure.

7.2 Attribute Summary

The list of the attributes is summarized in the table below, with the following conventions:

  • Attributes written in bold are required attributes.

  • Attributes written in a fixed font are optional.

  • Attributes written in italics are not part of VOTable, but are reserved for possible extensions (mentioned in an Appendix).

8 MIME Type

A VOTable document should be introduced by a Multipurpose Internet Mail Extensions media type, or MIME type. MIME type syntax is described in RFC 2045 section 5.1, and its semantics in RFC 2046. Associating a MIME type with a document enables the data consumer (an application or a web browser) to launch the desired application (e.g. a visualisation tool).

In the HTTP protocol (RFC 2616), the MIME type is the value specified by the Content-Type: header. The recommended MIME type describing a VOTable document is "application/x-votable+xml": the x- prefix indicates an experimental type, and is required for non-registered media types; and the +xml suffix (defined by RFC 3023 section 7) indicates that the type describes a specialization of XML. This type may be accompanied by an optional parameter "serialization" with a value specifying the serialization type used for table data within the document, one of TABLEDATA, FITS, BINARY or BINARY2, interpreted case-insensitively. In the absence of this parameter, any of the serializations may be encountered. If multiple different serializations are used in the same document, this parameter must not be used.

Alternatively the "text/xml" MIME type is acceptable for services delivering data which are expected to be visualized by humans in a browser; this MIME type should preferably be associated with an XSL style sheet, for a presentation of well-formatted tables. It is expected that a few typical XSL style sheets will be accessible from the IVOA site. Note that use of the text top-level media type means that line breaks must be represented as a CRLF sequence (RFC 2046, section 4.1.1).

For both of these MIME types, RFC 3023 also defines the optional parameter "charset". If this parameter is not supplied, US-ASCII is assumed.

Any of the following Content-Type header values may therefore be used by a service producing VOTables with the TABLEDATA serialization:

  • text/xml

  • text/xml; charset="iso-8859-1"

  • application/x-votable+xml

  • application/x-votable+xml; serialization=tabledata

  • application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1
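A client receiving one of the headers listed above could extract the media type, charset and serialization as in the sketch below. The function `parse_votable_content_type` is hypothetical; it splits parameters naively on ";" (quoted semicolons are not handled), applies the US-ASCII default stated above, and normalizes the serialization value case-insensitively.

```python
# Serialization values allowed by the text above (matched case-insensitively).
VALID_SERIALIZATIONS = {"TABLEDATA", "FITS", "BINARY", "BINARY2"}


def parse_votable_content_type(header: str) -> dict:
    """Split a Content-Type value into media type, charset and serialization.

    Simplification: parameters are split on ';' without handling quoted ';'.
    """
    media_type, *params = [part.strip() for part in header.split(";")]
    result = {"media_type": media_type.lower(), "charset": "us-ascii",
              "serialization": None}  # None: any serialization may occur
    for param in params:
        name, _, value = param.partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "charset":
            result["charset"] = value.lower()
        elif name == "serialization":
            if value.upper() not in VALID_SERIALIZATIONS:
                raise ValueError(f"unknown serialization: {value!r}")
            result["serialization"] = value.upper()
    return result


print(parse_votable_content_type(
    "application/x-votable+xml; serialization=TABLEDATA; charset=iso-8859-1"))
```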

9 Version History

9.1 Differences Between Versions 1.1 and 1.2

The differences between version 1.2 of VOTable and the preceding version 1.1 are:

  • COOSYS is deprecated, in favor of a reference to the Space-Time Coordinate (STC) data model (see section 4.6 The utype Attribute and the IVOA note Referencing STC in VOTable)

  • GROUP may appear as a direct child of VOTABLE and RESOURCE (where COOSYS was acceptable)

  • The usage of UCD1+ is recommended (section 4.5 Unified Content Descriptors)

  • The xtype attribute was added (see section 4.3 Extended Datatype xtype)

  • The INFO element (section 4.8 INFO Element) is made more similar to the PARAM element, but with datatype="char" and arraysize="*" (i.e. is a String); it may have attributes utype, ucd, ref, unit

  • The INFO element may occur before the closing tags /TABLE and /RESOURCE and /VOTABLE (enables post-operational diagnostics)

  • The FIELDref and PARAMref elements may have a utype and ucd attribute.

  • Naming conventions of GROUP elements which specify some properties of a relational schema (see section 4.10 The Relational Context).

  • The recommended and acceptable mime types have been made explicit (section 8 MIME Type)

  • The representation of arrays in cells has been made explicit (section VOTable:sec:dim)

  • Detailed and clarified the conventions and recommendations concerning name, ID and ref attributes

  • Appendix A.7 formerly contained a proposal for additional utype attributes in groups and tables; this is now included in VOTable 1.2, and appendix A.7 Additional Descriptions and Titles now contains a new proposal (May/June 2009) for multiple descriptions and titles.

9.2 Differences Between Versions 1.2 and 1.3

The differences between version 1.3 of VOTable and the preceding version 1.2 are:

  • The BINARY2 serialization has been introduced (section 5.4 BINARY2 Serialization). BINARY is mildly deprecated.

  • The usage and semantics of an empty TD element in the TABLEDATA serialization have changed. In VOTable 1.3, an empty TD element is legal for any datatype (previously it was illegal for integer types) and it always denotes a null value (previously it indicated NaN for floating point types). This change means that the different serializations no longer have exactly the same capabilities for data representation.

  • A new section 5.5 Null values has been added to clarify usage and encoding for null values.

  • In view of the new options for flagging null values introduced by the BINARY2 and TABLEDATA changes above, use of the VALUES null attribute is now deprecated in most cases. It is in any case explicitly deprecated for floating point values, in favour of NaN.

  • The description of the LINK element (section 3.7 LINK Element) has been clarified, and a new content-role="type" example added, with discussion of its application to SKOS concepts.

  • The schema datatype declaration of the LINK content-type attribute has been changed from NMTOKEN to token. The NMTOKEN datatype in VOTable 1.2 was a mistake, since it would not have permitted the MIME type example in the text.

  • The MIME type section (now section 8 MIME Type) now describes the new serialization parameter that can be used to specify serialization type.

  • The representation of STC information in section VOTable:example1 and section A.2 VOTable Query Extension has been modified to reflect the recommended usage from the STC in VOTable Note. This usage is recommended even for VOTable 1.2, so this change to the VOTable document represents an update of advice rather than a change to the normative part of the VOTable standard. Additionally, text has been added encouraging declaration of the STC metadata where possible.

  • A new section 1.4 VOTable in the VO Architecture has been added explaining the place of VOTable in the IVOA Architecture.

9.3 Differences Between Versions 1.3 and 1.4

The differences between version 1.4 of VOTable and the preceding version 1.3 are:

  • Applying erratum VOTable 1.3-1, un-deprecating COOSYS.

  • Applying erratum VOTable 1.3-2, permitting F0 in precision.

  • Applying erratum VOTable 1.3-3, clarifying the meaning of arraysize="1".

  • Add a new TIMESYS element as a simple means for supplying metadata for time values in the VOTable.

9.4 Differences Between Versions 1.4 and 1.5

The differences between version 1.5 of VOTable and the preceding version 1.4 are:

  • COOSYS now has a refposition attribute analogous to TIMESYS.

  • The frame identifiers (system attribute) in COOSYS are now taken from the refframe IVOA vocabulary.

  • Clarifications and rewording on:

    • the meaning of MIN and MAX value attributes for array types.

    • removing the recommendation to use xmlns to do utype prefix binding.

    • timescales for calendar epochs.

    • positioning advice for ID and corresponding references.

    • noting that RESOURCE elements can contain MIVOT blocks.

    • unit attribute SHOULD conform to VOUnits, and correct examples accordingly.

Appendices

A Possible VOTable extensions

The definitions enclosed in this appendix are not part of the VOTable standard, but are considered as candidates for VOTable improvements.

A.2 VOTable Query Extension

The metadata part included in a RESOURCE contains all the details necessary to create a form for querying the resource. The addition of a link having the action attribute can turn VOTable into a powerful query interface.

The details on the input parameters available in queries are described by the PARAM and FIELD elements, and the syntax used to generate the actual query is described in the ASU protocol []: the FIELD or PARAM elements are paired in the form name=value, where name is the contents of the name attribute of a FIELD or PARAM, and value represents a constraint written with the ASU conventions (e.g. "10..100", which denotes a range of values). Such pairs are appended to the action specified in the LINK element contained in the RESOURCE, separated by the ampersand (&) symbol, in a way quite similar to the HTML syntax used to describe a FORM.

A special type="no_query" attribute of the PARAM or FIELD elements marks the fields which are not part of the form, i.e. are ignored in the collection of name=value pairs.

The following is an example of a transformation of the VOTable in section VOTable:example1 into a form interface:

The RESOURCE displaying the parameters accessible for a query has the type="meta" attribute; it is also assumed that only one LINK having the content-role="query" attribute together with an action attribute exists within the current RESOURCE. The PARAM with name="-out.max" has been added in this example to control the size of the result.

A valid query generated by this VOTable could be:

myQuery?-source=myGalaxies&-out.max=50&R=10..100
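The pairing rule can be sketched in a few lines of Python. Here the PARAM/FIELD elements are replaced by hypothetical (name, constraint, type) tuples, `build_query` is a hypothetical helper, and we assume that the "-source=myGalaxies" part is already carried by the LINK action. Constraints are appended verbatim; a real client might need to percent-encode reserved characters.

```python
def build_query(action: str, params: list) -> str:
    """Append ASU name=value pairs to a LINK action URL.

    `params` holds hypothetical (name, constraint, type) tuples standing in
    for parsed PARAM/FIELD elements; entries with type "no_query" or with
    no constraint are skipped.
    """
    pairs = [f"{name}={value}" for name, value, ptype in params
             if value is not None and ptype != "no_query"]
    if not pairs:
        return action
    if action.endswith(("?", "&")):
        sep = ""   # action already ends with a separator
    elif "?" in action:
        sep = "&"  # action already carries a query part
    else:
        sep = "?"
    return action + sep + "&".join(pairs)


print(build_query("myQuery?-source=myGalaxies",
                  [("-out.max", "50", None),
                   ("R", "10..100", None),
                   ("Name", None, None),
                   ("Remark", "x", "no_query")]))
# myQuery?-source=myGalaxies&-out.max=50&R=10..100
```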

A.3 Arrays of Variable-Length Strings

Following the FITS conventions, strings are defined as arrays of characters. This definition raises problems for arrays of strings, which then have to be defined as 2D arrays of characters, but in this case only the slowest-varying dimension (i.e. the number of strings) can be variable. This limitation becomes severe when a table column contains a set of remarks, each made of a variable number of characters, as often occurs in practice.

FITS invented the Substring Array convention (defined in an appendix, i.e. not officially approved) which defines a separator character used to denote the end of a string and the beginning of the next one. In this convention (\(r\)A:SSTR\(w\)/\(ccc\)) the total size of the character array is specified by \(r\), \(w\) defines the maximum length of one string, and \(ccc\) defines the separator character as its ASCII equivalent value. The possible values for the separator include the space and any printable character, but exclude the control characters.

Such arrays of variable-length strings are frequently useful, e.g. to enumerate a list of properties of an observed source, each property being represented by a variable-length string. A convention similar to the FITS one could be introduced in VOTable in the arraysize attribute, using the letter s followed by the separator character; for example, arraysize="100s," would indicate a string made of up to 100 characters, where the comma is used to separate the elements of the array.
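Since this convention is only a proposal, the reader-side logic below is a hypothetical sketch: `split_string_array` parses an arraysize of the form "<n>s<separator>", checks the total length against n, and splits the cell on the separator. Treating an empty cell as an empty list is an assumption.

```python
import re


def split_string_array(arraysize: str, cell: str) -> list:
    """Split a cell according to the proposed arraysize="<n>s<sep>" convention."""
    m = re.fullmatch(r"(\d+)s(.)", arraysize)
    if m is None:
        raise ValueError(f"not a separated-string arraysize: {arraysize!r}")
    max_chars, sep = int(m.group(1)), m.group(2)
    if len(cell) > max_chars:
        raise ValueError("cell exceeds the declared total number of characters")
    return cell.split(sep) if cell else []  # assumption: empty cell -> no strings


print(split_string_array("100s,", "variable,binary,X-ray source"))
```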

A.4 FIELDs as Data Pointers

Rather than requiring that all data described in the set of FIELDs are contained in a single stream which follows the metadata part, it would be possible to let the FIELD act as a pointer to the actual data, either in the form of a URI or of a reference to a component of a multipart document.

Each component of the data described by a FIELD may have different requirements: while text data or small lists of numbers are quite efficiently represented in pure XML, long lists such as spectra or images perform poorly when converted to XML. The available method for gaining efficiency is to use a binary representation of the whole data stream by means of the STREAM element, at the price of delivering data in a format that is not human-readable at all.

The following options would allow more flexibility in the way the various FIELDs can be accessed:

  • a FIELD can be declared as being a pointer with the addition of a type="location" value, meaning that the field contains a way to access the data, and not the actual data;

  • a FIELD can contain a LINK element marked content-role="location" which contains in its href attribute the partial URI to which the contents of the column cell are appended in order to generate a fully qualified URI.

Note that the LINK is not required – a FIELD declared with type="location" and containing no LINK element is assumed to contain URIs.

An example of a table describing a set of spectra could look like the following:

<TABLE name="SpectroLog">
  <FIELD name="Target" ucd="meta.id" datatype="char" arraysize="30*"/>
  <FIELD name="Instr" ucd="instr.setup" datatype="char" arraysize="5*"/>
  <FIELD name="Dur" ucd="time.expo" datatype="int" width="5" unit="s"/>
  <FIELD name="Spectrum" ucd="meta.ref.url" datatype="float" arraysize="*"
         unit="mW/m2/nm" type="location">
    <DESCRIPTION>Spectrum absolutely calibrated</DESCRIPTION>
    <LINK  content-role="location"
        href="http://ivoa.spectr/server?obsno="/>
  </FIELD>
  <DATA><TABLEDATA>
    <TR><TD>NGC6543</TD><TD>SWS06</TD><TD>2028</TD><TD>01301903</TD></TR>
    <TR><TD>NGC6543</TD><TD>SWS07</TD><TD>2544</TD><TD>01302004</TD></TR>
  </TABLEDATA></DATA>
</TABLE>

The reading program therefore has to retrieve the data for the first row by resolving the URI

http://ivoa.spectr/server?obsno=01301903
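The resolution step amounts to concatenating the LINK href with the cell contents. The sketch below applies it to the Spectrum FIELD of the example, assuming (as in that example) that the LINK is identified by content-role="location"; `resolve_location` is a hypothetical helper and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The Spectrum FIELD from the example above (DESCRIPTION omitted).
FIELD_XML = """<FIELD name="Spectrum" datatype="float" arraysize="*"
       unit="mW/m2/nm" type="location">
  <LINK content-role="location" href="http://ivoa.spectr/server?obsno="/>
</FIELD>"""


def resolve_location(field: ET.Element, cell: str) -> str:
    """Turn the cell of a type="location" FIELD into a fully qualified URI."""
    if field.get("type") != "location":
        raise ValueError("FIELD is not declared as a pointer")
    for link in field.findall("LINK"):
        if link.get("content-role") == "location":
            return link.get("href") + cell  # partial URI + cell contents
    return cell  # no LINK: the cell is assumed to hold a complete URI


field = ET.fromstring(FIELD_XML)
print(resolve_location(field, "01301903"))
# http://ivoa.spectr/server?obsno=01301903
```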

The same method could also be immediately applicable to Content-IDs which designate elements of a multipart message, using the protocol prefix cid: [RFC2111].

Note that the VOTable LINK substitution proposed in section A.1 VOTable LINK substitutions serves a similar purpose: generating a pointer whose address can incorporate components from the DATA part of the VOTable.

A.5 Encoding Individual Table Cells

Accessing binary data significantly improves efficiency in both storage and CPU usage, especially compared with an XML-encoded data stream. But binary data cannot be included in the same stream as the metadata description unless a dedicated coding filter is applied to convert the binary data into an ASCII representation. Base64 is the most commonly used filter for this conversion: 3 bytes of data are coded as 4 ASCII characters, which implies an overhead of 33% in storage, plus some (small) computing time for the reverse transformation.

In order to keep the full VOTable document in a unique stream, VOTable 1.0 introduced the encoding attribute in the STREAM element, meaning that the data, stored as binary records, are converted into some ASCII representation compatible with the XML definitions. One drawback of this method is that the entire data contents become unreadable by humans.

The addition of the encoding attribute in the TD element allows the data server to decide, at the cell level, whether it is more efficient to distribute the data as binary-encoded or as edited values. The result may look like the following:

<TABLE name="SpectroLog">
  <FIELD name="Target" ucd="meta.id" datatype="char" arraysize="30*"/>
  <FIELD name="Instr" ucd="instr.setup" datatype="char" arraysize="5*"/>
  <FIELD name="Dur" ucd="time.expo" datatype="int" width="5" unit="s"/>
  <FIELD name="Spectrum" ucd="phot.flux;em.opt" datatype="float" arraysize="*"
         unit="mW/m2/nm" precision="E3"/>
  <DATA><TABLEDATA>
    <TR><TD>NGC6543</TD><TD>SWS06</TD><TD>2028</TD><TD encoding="base64">
    QJKPXECHvndAgMScQHul40CSLQ5ArocrQLxiTkC3XClAq0OWQKQIMUCblYFAh753QGij10BT
    Em9ARKwIQExqf0BqbphAieuFQJS0OUCJWBBAhcrBQJMzM0CmRaJAuRaHQLWZmkCyhytAunbJ
    QLN87kC26XlA1KwIQOu+d0DsWh1A5an8QN0m6UDOVgRAxO2RQM9Lx0Din75A3o9cQMPfO0C/
    dLxAvUeuQKN87kCXQ5ZAjFodQH0vG0B/jVBAgaHLQI7Ag0CiyLRAqBBiQLaXjUDYcrBA8p++
    QPcKPUDg7ZFAwcKPQLafvkDDlYFA1T99QM2BBkCs3S9AjLxqQISDEkCO6XlAmlYEQKibpkC5
    wo9AvKPXQLGBBkCs9cNAuGp/QL0euEC4crBAuR64QL6PXEDOTdNA2987QN9T+EDoMSdA8mZm
    QOZumEDDZFpAmmZmQGlYEEBa4UhAivGqQLel40Dgan9A4WBCQLNcKUCIKPZAk1P4QNWRaEEP
    kWhBKaHLQTkOVkFEan9BUWBCQVyfvg==
    </TD></TR>
  </TABLEDATA></DATA>
</TABLE>

When decoded, the contents of the last column are the binary representation of the spectrum, as defined in section 5.3 BINARY Serialization; no length prefix is required here, the total length of the array being implicitly defined by the length of the encoded text.
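A reader supporting this proposed per-cell encoding could decode such a TD as sketched below, assuming a float column (as for Spectrum above): whitespace from the XML layout is dropped, the text is base64-decoded, and the element count is derived from the byte length, since there is no prefix. `decode_float_cell` is a hypothetical helper.

```python
import base64
import struct


def decode_float_cell(text: str) -> list:
    """Decode a base64-encoded TD holding a big-endian float32 array.

    There is no length prefix: the element count follows from the byte length.
    """
    raw = base64.b64decode("".join(text.split()))  # drop XML whitespace/newlines
    if len(raw) % 4:
        raise ValueError("decoded length is not a multiple of 4 bytes")
    return list(struct.unpack(f">{len(raw) // 4}f", raw))


# Round trip with values that are exact in single precision.
encoded = base64.b64encode(struct.pack(">3f", 1.0, 2.5, -4.0)).decode("ascii")
print(decode_float_cell(encoded))
```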

A.6 Very Large Arrays

The BINARY and BINARY2 serializations of variable-length arrays (section 5.3 BINARY Serialization, section 5.4 BINARY2 Serialization) use a 4-byte prefix containing the number of items in the array. This convention imposes an absolute maximum of \(2^{31}-1\) elements. This limit could be lifted with a new arrayprefix attribute.
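The limit follows from the prefix being a signed 32-bit big-endian integer, as the sketch below shows; `length_prefix` is a hypothetical writer-side helper that refuses counts the prefix cannot hold.

```python
import struct

MAX_ITEMS = 2**31 - 1  # largest count a signed 32-bit prefix can hold


def length_prefix(n_items: int) -> bytes:
    """Encode the 4-byte big-endian element count of a variable-length array."""
    if not 0 <= n_items <= MAX_ITEMS:
        raise OverflowError(f"{n_items} items cannot be described by a 4-byte prefix")
    return struct.pack(">i", n_items)


print(length_prefix(3).hex())  # prints 00000003
```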

A.7 Additional Descriptions and Titles

The same table may be used in several contexts; for instance, a wish was expressed to include in TABLE and FIELD elements descriptions and titles (captions) in a form suitable for publication (LaTeX), in addition to the ASCII-only descriptions currently acceptable. The following example illustrates this extension:

<TABLE name="Model_A">
  <DESCRIPTION>Star luminosities in Model A</DESCRIPTION>
  <DESCRIPTION context="latex">$L(T_{eff})$ in Model {\bf A}</DESCRIPTION>
  <FIELD name="Teff" datatype="float" unit="K" ucd="phys.temperature.effective">
     <DESCRIPTION>Effective temperature</DESCRIPTION>
     <TITLE context="latex">$T_{eff}$</TITLE>
  </FIELD>
  <FIELD name="Lum" datatype="float" unit="Lsun" ucd="phys.luminosity">
     <DESCRIPTION>Corresponding luminosity in Model A</DESCRIPTION>
     <DESCRIPTION context="latex">$L(T_{eff})$</DESCRIPTION>
     <TITLE context="latex">$L/L_\odot$</TITLE>
  </FIELD>
</TABLE>

In practice this extension would mean that, wherever a DESCRIPTION element is currently acceptable, a set of DESCRIPTION and TITLE elements would become acceptable, each with an optional additional context attribute. The new TITLE element would make explicit the column header of a field or parameter, or supply a caption for a table or a set of tables (resource) in addition to its description.

Providing descriptions in several languages would be another obvious advantage of this extension.

A.8 A New XMLDATA Serialization

In order to facilitate the use of standard XML query tools which usually require each parameter to have its own individual tag, the XMLDATA serialization introduces the designation of each FIELD by a dedicated tag. An example could look like the following:

<TABLE name="Messier">
  <FIELD name="Number" ID="M" ucd="meta.record" datatype="int" >
    <DESCRIPTION>Messier Number</DESCRIPTION>
  </FIELD>
  <FIELD name="R.A.2000" ID="RA" ucd="pos.eq.ra;meta.main"
         unit="deg" datatype="float" width="5" precision="1" />
  <FIELD name="Dec.2000" ID="DE" ucd="pos.eq.dec;meta.main"
         unit="deg" datatype="float" width="5" precision="1" />
  <FIELD name="Name" ID="N" ucd="meta.id" datatype="char" arraysize="*">
    <DESCRIPTION>Common name used to designate the Messier object</DESCRIPTION>
  </FIELD>
  <FIELD ID="T" name="Classification" datatype="char" arraysize="10*"
         ucd="src.class">
    <DESCRIPTION>Classification (galaxy, globular cluster, etc.)</DESCRIPTION>
  </FIELD>
  <DATA><XMLDATA>
    <TR>
      <M>3</M>
      <RA>205.5</RA>
      <DE>+28.4</DE>
      <N/>
      <T>Globular Cluster</T>
    </TR>
    <TR>
      <M>31</M>
      <RA>010.7</RA>
      <DE>+41.3</DE>
      <N>Andromeda Galaxy</N>
      <T>Galaxy</T>
    </TR>
  </XMLDATA></DATA>
</TABLE>

The XMLDATA serialization in this example uses the tags M, RA, DE, N and T; since these are derived directly from the ID attributes of the FIELD elements, their definitions can be generated automatically from the set of FIELD definitions.
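Because the tags come straight from the FIELD IDs, a writer could generate XMLDATA rows mechanically. The sketch below uses `xml.etree.ElementTree` and a hypothetical helper `to_xmldata_row`; it assumes every ID is a valid XML element name and renders a missing value as an empty element, like <N/> in the example.

```python
import xml.etree.ElementTree as ET


def to_xmldata_row(field_ids: list, cells: list) -> ET.Element:
    """Build one XMLDATA <TR>, tagging each cell with its FIELD ID."""
    tr = ET.Element("TR")
    for field_id, value in zip(field_ids, cells):
        cell = ET.SubElement(tr, field_id)  # assumes the ID is a valid XML name
        if value is not None:               # None becomes an empty element
            cell.text = value
    return tr


row = to_xmldata_row(["M", "RA", "DE", "N", "T"],
                     ["3", "205.5", "+28.4", None, "Globular Cluster"])
print(ET.tostring(row, encoding="unicode"))
```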

B The VOTable version XML Schema

The XML Schema of VOTable is included here as a reference. This schema includes a couple of extra optional attributes which are not part of VOTable (ID in TR and encoding in TD), but which have proved useful in fixing some problems encountered with some code generators.