ObsCore Metadata Extension for Time Properties#
- Status:
ObscoreTimeExtension 1.0 WD 2024-07-17
Acknowledgments#
This work has been supported by various national projects related to the development of the Virtual Observatory. We acknowledge support of the ESCAPE project (the European Science Cluster of Astronomy and Particle Physics ESFRI Research Infrastructures) funded by the EU Horizon 2020 research and innovation program under the Grant Agreement n.824064. We thank the people involved in the VESPA project and the EPNCore specification for fruitful discussions. Additional funding was provided by the INSU (Action Spécifique Observatoire Virtuel, ASOV), the Action Fédératrice CTA at the Observatoire de Paris and the Paris Astronomical Data Centre (PADC).
1 Introduction#
Time domain astronomy studies astrophysical phenomena that vary over time. To study the underlying physical mechanisms, a user might need to collect and analyse data from different missions and of different nature, and therefore needs to search across various archives based on time-related criteria. ObsCore and ObsTAP [ObsCore] have proven their efficiency for the discovery of astronomical data sets in the IVOA. In this specification we consider how the ObsCore metadata profile can be extended to include time-related properties of the data that are specific to time series and not yet covered.
In this specification we examine how to enhance data discovery and data selection of time sampled data sets in the context of the ObsCore data model and its TAP implementations. The ObsCore specification [ObsCore] proposes a set of features to describe the data present in a data set as well as metadata about its acquisition, creation and publication (curation). The physical properties in terms of spatial, spectral, temporal, polarimetry, and observable measure are also described by a group of features dedicated to each axis, considered independent from the others. The idea is to provide a physical feature profile for each axis with coverage, sampling, resolution, etc. Search criteria in ObsTAP are based on these features.
We describe in section 3.3 Time parameters defined in ObsCore v1.1 how the set of time parameters already present in ObsCore v1.1 can be used for time series discovery. In section 4 Time parameters proposed for ObsCore Extension we consider specific time-related use cases and propose new parameters to be included in the extension tables for ObsCore. The extension mechanism in TAP is discussed in section 5 Extension mechanism in ObsTAP, with examples of user queries.
1.1 Role within the VO Architecture#
Fig. ObscoreTimeExtension:fig:archdiag shows the role this document plays within the IVOA architecture [IVOAArchitecture2.0]. This specification builds up a metadata profile that must be used in a TAP service based on the ObsTAP TAP schema. It relies on fundamental standards like TAP, VOTable, UCD, VOUnits and Vocabularies defined in the IVOA for data product type and TimeReference systems for instance.
2 Time Series#
In this section we describe what Time Series data is in a wide context, describing the most relevant parameters that define it. We describe the common requirements of the different science use cases collected by the Science Priority Committee [SPC_UC]. A common frame for time is defined with the minimum set of parameters taken from and compatible with the definition of SpaceTime coordinates and Coords DM. We then compare the defined fields describing time with the fields content of ObsCore and EPNcore.
2.1 Definition#
Time Series can be defined in a very broad sense as a collection of any kind of data over time for a particular source (e.g. star, binary, QSO) or part of a source (e.g. sun spots), independent of the type of data (images, light-curves, radial velocities, polarisation states or degrees, positions, number of sunspots, densities,…), the duration of the signal integration or the cadence. To clarify the vocabulary, here we consider a time series as a sequence of signal integrations, or snapshots, observing an object or phenomenon over time, that is, different observations over time.
Considering how observations in general can be spanned along the time axis, we can sketch Time Series data as shown in Fig. ObscoreTimeExtension:fig:time-series. Time Series data is composed of a set of observations (n_observations = 3 in this example), each with a different exposure or integration time (t_exp).
Although in some cases the cadence or time span between each signal integration (delta_t) is fixed, in the general case it can vary and we can therefore define a minimum and a maximum value (delta_t_min, delta_t_max). Each observation has its own time stamp (t_i) with a given precision or resolution (t_resolution).
As can be seen from this figure, the duration of the observation can be defined in different ways: a) as the total integration or exposure time, i.e. the sum of all the exposure times, t_exp_total = \(\sum\) t_exp, which represents the support along the time axis; b) as the elapsed time, t_elapsed = t_max - t_min, which is in general different. Note that if the exposure time is constant for all the observations, then t_exp_total = n_observations \(\times\) t_exp.
The situation can be more complicated. For instance, clouds may force us to pause the exposure for a while and resume once they have passed, or we might want to remove parts of the observation due to artefacts in the data. In any case these values can be taken as approximations of the minimum and the maximum value this specific field can have.
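The two notions of duration above can be illustrated with a minimal sketch. It assumes each signal integration is described by a start time and an exposure time in the same units; the `Exposure` class, the `duration_summary` helper and the numerical values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    """One signal integration: start time and duration, in the same units."""
    t_start: float
    t_exp: float


def duration_summary(exposures):
    """Return (t_min, t_max, t_exp_total, t_elapsed) for a list of exposures."""
    if not exposures:
        raise ValueError("a time series needs at least one exposure")
    t_min = min(e.t_start for e in exposures)
    t_max = max(e.t_start + e.t_exp for e in exposures)
    # Support along the time axis: sum of the individual exposure times.
    t_exp_total = sum(e.t_exp for e in exposures)
    # Elapsed time: from the start of the first sample to the end of the last.
    t_elapsed = t_max - t_min
    return t_min, t_max, t_exp_total, t_elapsed


# Three exposures with made-up values (n_observations = 3).
series = [Exposure(0.0, 10.0), Exposure(30.0, 20.0), Exposure(100.0, 10.0)]
print(duration_summary(series))  # (0.0, 110.0, 40.0, 110.0)
```

Here the total exposure time (40) is much shorter than the elapsed time (110) because of the gaps between integrations.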
The most relevant fields of Time Series metadata are summarized in Table ObscoreTimeExtension:tab:fields.
Field | Explanation |
(RA,Dec) | Coordinates\(^1\) |
target_name | Target name\(^1\) |
t_min | Date of the beginning of the time series |
t_max | Date of the end of the time series |
t_exp_min | Minimum exposure time |
t_exp_max | Maximum exposure time |
t_exp_total | Total exposure time |
delta_t_min | Minimum time sampling period / cadence |
delta_t_max | Maximum time sampling period / cadence |
t_resolution | Time resolution/precision |
n_observations | Number of time integrations in time series |
type_of_data | Type of data (fluxes, radial velocities, images,…) |
Note: \(^1\)For SSO or moving objects coordinates might not be enough or relevant.
In many cases time series data is composed of only three columns: Time, Magnitude, Magnitude Error. This is the simplest kind of data set, which is identified in the data product type vocabulary as ’light-curve’. See the IVOA product-type vocabulary at https://www.ivoa.net/rdf/product-type/2024-03-22/product-type.html.
For this data to be fully exploitable and reusable (interoperable) it has to be properly documented. In this specific case the minimum information that needs to be provided is: the object coordinates (or name), the filter in which the observations have been carried out, and the time frame and offset (if applicable). However, what is observed at each time stamp of the sequence may also be a 1D or 2D observation, such as a spectrum or an image. This is why the data product type defined in ObsCore 1.1 should be made more precise and ultimately rely on the IVOA product-type vocabulary.
In addition, a mechanism should be defined to clarify what part of the data is varying with time, as described further in section 3.2 Clarifying the physical content, dimensionality and time dependency of the data set.
2.2 Science use cases#
Different science use cases for Time Series have been collected and described by E. Solano at http://wiki.ivoa.net/twiki/bin/view/IVOA/CSPTimeSeries. They highlight the case of optical light curves but can be generalized to all spectral regimes (X-ray, gamma-ray, radio, multi-messenger) where time dependent measurements have been taken. Science cases are grouped according to their common requirements, summarized as:
Group A Combine photometry and light curves of a given object/list of objects in the same photometric band
Group B Combine photometry and light curves of a given object/list of objects in different photometric bands
Group C Time series other than light curves
Looking at the different science cases we simplify the questions to two:
Have these two missions observed this object within these two dates?
Is it possible to discover long/short term variability within the data?
To answer the first question a user needs to be sure that dates are comparable, which means time has to be brought into a common time frame. To answer the second question we need to keep track of the minimum and maximum time span.
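The first question reduces to an interval intersection test once all dates are expressed in the same time frame. The sketch below assumes that the coverages are already given as (t_min, t_max) pairs in a common representation (MJD here); the function names, mission names and values are hypothetical.

```python
def intersects(t_min, t_max, start, stop):
    """True if the coverage [t_min, t_max] overlaps the window [start, stop]."""
    return t_min <= stop and t_max >= start


def observed_by_all(coverages, start, stop):
    """True if every mission has some coverage inside [start, stop]."""
    return all(intersects(lo, hi, start, stop) for lo, hi in coverages.values())


# Made-up coverages in MJD for two hypothetical missions.
coverages = {
    "mission_a": (55190.0, 55200.0),
    "mission_b": (55203.5, 55210.0),
}
print(observed_by_all(coverages, 55197.0, 55204.0))  # True
print(observed_by_all(coverages, 55201.0, 55203.0))  # False
```

The test is only meaningful if both coverages refer to the same time scale and reference position, which is the motivation for the common time frame discussed next.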
2.3 Using a common time frame#
To compare datasets from different missions or archives a common representation of time is needed. In order to do so we propose to map time into a pivot format. Following [2015A+A...574A..36R] and [STC1.3] we propose a set of minimum metadata to be added for serializations of Time Series (see Table ObscoreTimeExtension:tab:metadata).
Parameter proposal | Explanation |
t_scale | The time frame scale is the scale used to measure time. IAU definition: “A time scale is simply a well defined way of measuring time based on a specific periodic natural phenomenon.” See http://aa.usno.navy.mil/publications/docs/Circular_179.pdf. Recognized time scale values and their meaning are listed in Table ObscoreTimeExtension:tab:scales. If it is not known, use UNKNOWN. |
t_ref_position | The time frame position is the place where the time is measured. Standard values are listed in Table ObscoreTimeExtension:tab:positions. If it is not known, use UNKNOWN. |
t_uncertainty | Resolution or uncertainty of the time stamps. |
t_sys_error | Time systematic error accounting for our knowledge of the time frame (scale and position). If t_scale is not known, use 100 s as the default value; if t_scale and t_ref_position are both unknown, use 1000 s as the default value. Approximately 100 s is adequate for the time scale since it is related to changes in the clock in space/on Earth; 1000 s is adequate if we do not know whether times are corrected for the position of the Earth/satellite on its orbit around the Sun, since that is approximately twice the time it takes light to travel the Sun-Earth/satellite distance. |
t_format | Time representation, such as JD, MJD or ISO-8601. |
t_offset | Offset that has been subtracted from the time. Time can be relative to a certain moment, e.g. time after the GRB that happened on date YYYYMMHHMMSS.SS, or a random number the authors have subtracted from the data to allow higher precision in the time stamps. Its default value is 0.0. |
t_description | A text briefly describing what is varying with time, e.g. “Photometric variability in filter V” or “Radial velocity curve in HJD”. This field is intended to help the reader. |
Common practice is to be specific about the time frame, and we suggest using:
JD(TT;BARYCENTER)
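As an illustration of how t_format and t_offset could be used to bring time stamps to the JD representation, here is a minimal sketch. It assumes that t_offset is expressed in the same representation as the time stamps and handles only the numeric JD and MJD formats; `to_jd` is a hypothetical helper, and it deliberately does not convert the time scale or the reference position.

```python
MJD_ZERO_JD = 2400000.5  # JD = MJD + 2400000.5


def to_jd(value, t_format, t_offset=0.0):
    """Restore the subtracted offset and express the time stamp in JD.

    Only 'JD' and 'MJD' are supported in this sketch; ISO-8601 strings
    would need a proper date parser.
    """
    absolute = value + t_offset
    if t_format == "JD":
        return absolute
    if t_format == "MJD":
        return absolute + MJD_ZERO_JD
    raise ValueError(f"unsupported t_format: {t_format!r}")


# A time stamp stored as MJD - 55000 (made-up values).
print(to_jd(197.0, "MJD", t_offset=55000.0))  # 2455197.5
```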
We also give some values that can be used as default in the case that some information is not known and impossible to recover. We minimize the impact of doing this by adding a systematic error to time when those values are unknown.
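The default rule for t_sys_error can be sketched as follows. The text states the defaults for an unknown t_scale (100 s) and for both t_scale and t_ref_position unknown (1000 s); applying 1000 s as soon as the reference position is unknown, and 0 s when both are known, are assumptions of this sketch. The function name is hypothetical. The last lines check the order of magnitude quoted for the 1000 s value.

```python
UNKNOWN = "UNKNOWN"

AU_M = 149_597_870_700.0   # astronomical unit in metres
C_M_S = 299_792_458.0      # speed of light in metres per second


def default_t_sys_error(t_scale, t_ref_position):
    """Default systematic error on time, in seconds."""
    if t_ref_position == UNKNOWN:
        # Assumption: an unknown reference position dominates the error
        # budget whether or not the time scale is known.
        return 1000.0
    if t_scale == UNKNOWN:
        return 100.0
    # Assumption: no default systematic error when both are known.
    return 0.0


print(default_t_sys_error("TT", "BARYCENTER"))   # 0.0
print(default_t_sys_error(UNKNOWN, "TOPOCENTER"))  # 100.0
print(default_t_sys_error(UNKNOWN, UNKNOWN))     # 1000.0

# Twice the light travel time over one astronomical unit, in seconds.
print(round(2 * AU_M / C_M_S))  # 998
```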
3 Extension of ObsCore#
ObsCore has a normalized description of the data content along the various physical axes where the data are projected. The spatial properties are described in the s_* group, the spectral ones in em_* group, the temporal ones in t_*, etc. For each data set there is a minimal set of metadata to describe its sky position, spectral band, time interval, etc. which are independent from each other.
New parameters can therefore be added to the time group to enhance the description of time sampling while preserving backward compatibility with ObsCore 1.1.
3.1 Extension of ObsCore based on EPNCore#
Astronomy and space science both deal with time series data and have proposed metadata descriptions for it. Some metadata have already been defined and used in the context of data discovery using ObsCore [ObsCore], and the remaining ones have been defined in the context of planetary data in the EPNcore specification [EPNTAP]. In Table ObscoreTimeExtension:tab:obs_epn we show the equivalence between the fields we require here and those existing in the ObsCore and EPNcore specifications.
Note: t_resolution in ObsCore needs some clarification, and the dataproduct_type labels defined in ObsCore and EPNCore currently differ. That is why dataproduct_type should be enriched in ObsCore and harmonized with the IVOA product-type vocabulary maintained at ivoa.net/rdf/.
3.2 Clarifying the physical content, dimensionality and time dependency of the data set#
ObsCore 1.1 uses the attribute o_ucd to describe the quantity observed as a function of the various physical axes of the data product. The UCD string corresponding to the observable in a one-dimensional data set is easy to choose from the UCD list. We propose to extend this definition to time series of multi-dimensional data sets and to add a time_variant attribute in ObsCore. In a time series, the principal axis considered is the Time axis. The time variant component can be either one-dimensional, as for a light curve or velocity curve, or multi-dimensional. The time series is viewed as a time dependent sequence of components, which can be characterized by a data product type, such as an image, a spectrum, a spectral cube, etc., also defined in the product-type vocabulary. Table ObscoreTimeExtension:tab:timevar summarizes the use of time_variant in various cases. This parameter is worth including in the Time ObsCore extension table. From this metadata, based on the dimensionality and nature of the observed signal, a user application can select the VO application to which the data can be forwarded for visualization.
3.3 Time parameters defined in ObsCore v1.1#
We have seen the data product type helps to search for time sampled data sets. In order to describe properties of the data set along the time axis, we can reuse the axis properties defined in the Characterization data model [CharacterisationDM1.1a]. The idea is to describe how the time stamps are spanned along the time axis, with time duration and cadence.
3.3.1 t_min, t_max#
These parameters provide the bounds of the time coverage for this data set. For a light-curve it is the beginning of the first time sample and the end of the last sample.
3.3.2 t_exptime#
This parameter represents the duration or live time of the observation. For a light-curve it is the sum of all valid time samples. For a time-cube, for instance, it is the total exposure time obtained by summing all the exposures.
3.3.3 t_resolution#
t_resolution can be defined as the time limit under which two observable quantities cannot be distinguished from each other. This works for event-list, light-curve, time-cube data sets, etc.
3.3.4 t_xel, number of time stamps#
This parameter gives the number of observations in the time series. It is useful in queries to estimate how rich the data set is, especially when looking for variability.
3.4 Time series use cases already covered by ObsCore1.1#
Several use cases for time series discovery were considered in the ObsCore 1.1 specification, built on its short list of time-related features. They are available in its Appendix A, section A.4 Discovering time series. There the dataproduct_type value is “timeseries”, which is very general, but the same use cases can be applied to more specific time sampled data sets like “time-cube” or “light-curve”, now available in the product-type vocabulary. ObsCore use cases are also provided on a web page available at: http://saada.unistra.fr/voexamples/show/ObsCore/.
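A discovery query restricted to the ObsCore 1.1 time parameters described in section 3.3 might look like the one built by the sketch below. The helper `timeseries_query` and the numerical values are hypothetical; the column names are those of ObsCore 1.1, and the set of accepted product types is limited to the three values mentioned in this section.

```python
ALLOWED_PRODUCT_TYPES = {"timeseries", "light-curve", "time-cube"}


def timeseries_query(t_start, t_stop, min_samples, product_type="timeseries"):
    """Build an ADQL query on ivoa.obscore using only ObsCore 1.1 columns.

    t_start and t_stop are MJD bounds of the window of interest and
    min_samples is the minimum number of time stamps (t_xel) requested.
    """
    if product_type not in ALLOWED_PRODUCT_TYPES:
        raise ValueError(f"unexpected product type: {product_type!r}")
    return (
        "SELECT obs_id, obs_publisher_did, t_min, t_max, t_exptime, "
        "t_resolution, t_xel, access_url FROM ivoa.obscore "
        f"WHERE dataproduct_type = '{product_type}' "
        f"AND t_max >= {float(t_start)} AND t_min <= {float(t_stop)} "
        f"AND t_xel >= {int(min_samples)}"
    )


print(timeseries_query(55197, 55204, 100, product_type="light-curve"))
```

The resulting string can be submitted to any ObsTAP service with a TAP client; whether t_xel is filled depends on the service.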
4 Time parameters proposed for ObsCore Extension#
4.1 Time Frame description#
As mentioned in section 2.3 Using a common time frame, the Time Frame description used for the data is essential for comparing various time series data sets. This metadata was described first in the STC data model [STC1.3], then in the Coords DM [Coords1.0], and serialized in the VOTable format in the TIMESYS element. Up to now, this metadata has not been defined in ObsCore 1.1. It is coded into the VOTable metadata of the data set. Having it as part of the query response to a search for time series would help the user application to interpret time stamps precisely.
We propose to add the time frame parameters in the Time ObsCore extension. These various definitions are harmonized in the proposal given in table ObscoreTimeExtension:tab:timereff. We list the corresponding terms used in the Coords Data model and in the UCD vocabulary, as well as the attribute of the TIMESYS param defined for VOTable serialization. All terms are proposed as mandatory, but can be set to UNKNOWN if not available. With the expansion of massive time series datasets, where efficient data discovery will serve the selection of big training sets for analysis workflows, such parameters are highly recommended especially for new data collections.
Values for these terms should rely on the terms defined in IVOA vocabularies, namely for time scales and time reference positions. As an example, Appendix A summarizes the definitions listed in previous models like STC.
4.2 Time axis sampling description#
t_delta_min and t_delta_max represent the minimal (resp. maximal) time interval between two time samples.
This concept is covered in the Characterization data model [CharacterisationDM1.1a] and designated as the sampling period along the Time axis. The cadence of the observations in the time series can be inferred from these parameters.
The TimeAxis ’Sampling Extent’ defined in Characterization DM is the duration of each sample and may vary along the time sequence. During the observation process, it corresponds to an exposure time. If the sampling is not regular, the minimal and maximal values given in t_exp_min and t_exp_max bound the sampling extent. When the sampling extent is even, all samples have the same duration and t_exp_min and t_exp_max have the same value. When the sampling period, or cadence, is even, t_delta_min and t_delta_max have the same value.
In general t_resolution, the minimal distinguishable time interval between two time stamps, is much finer than the cadence chosen for the instrument.
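The sampling parameters of this section could be derived from the individual samples as in the following sketch. It assumes that the sampling period is measured between the start times of consecutive samples; `sampling_summary`, `is_even` and the values are hypothetical.

```python
def sampling_summary(t_starts, t_exps):
    """Return the sampling period and sampling extent bounds of a time series."""
    if len(t_starts) != len(t_exps) or len(t_starts) < 2:
        raise ValueError("need matching lists with at least two samples")
    ordered = sorted(t_starts)
    # Sampling period: interval between consecutive sample start times.
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return {
        "t_delta_min": min(deltas),
        "t_delta_max": max(deltas),
        "t_exp_min": min(t_exps),
        "t_exp_max": max(t_exps),
    }


def is_even(lower, upper, tolerance=0.0):
    """True if the two bounds coincide within the given tolerance."""
    return abs(upper - lower) <= tolerance


summary = sampling_summary([0, 30, 100], [10, 20, 10])
print(summary)
print(is_even(summary["t_delta_min"], summary["t_delta_max"]))  # False
```

In this example neither the cadence (30 to 70) nor the sampling extent (10 to 20) is even, so the minimum and maximum values differ for both pairs.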
4.3 Time axis mode, folding period and phase reference#
Time series may be distributed in two modes, “search mode” or “folded”. Folding improves the SNR and allows further analysis of the periodicity of the observed phenomenon. For data discovery purposes one parameter may be introduced: t_fold_period, the time duration of the folding. A t_fold_period parameter set to zero means that the time axis is not folded, which indicates that the data belongs to “search mode”.
4.3.1 t_fold_period, t_fold_phaseReference#
This metadata gives the length of the folding interval. It is given in the same time units as the time stamps along the sequence. The time origin at which the folding starts is another important piece of metadata and is stored in t_fold_phaseReference. It is given as a time stamp within the t_min, t_max interval of the time series before folding. This value is usually chosen according to the transient phenomenon under study (on a peak, a gap, etc.) and cannot be standardized, which is why a time reference in the original curve is more convenient. Together, both attributes make it possible to study the periodicity of the signal and to compare various light curves.
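The role of the two folding parameters can be illustrated by computing the phase of a time stamp. This is a minimal sketch, assuming the usual definition of the phase as the fractional part of the elapsed time since the phase reference divided by the folding period; the function name and the values are hypothetical.

```python
def fold_phase(t, t_fold_period, t_fold_phase_reference):
    """Return the phase in [0, 1) of time stamp t for a folded time series."""
    if t_fold_period <= 0:
        # A period of zero marks an unfolded ("search mode") time axis.
        raise ValueError("t_fold_period must be positive for folded data")
    return ((t - t_fold_phase_reference) / t_fold_period) % 1.0


# Folding period of 5 time units, phase reference at t = 10 (made-up values).
print(fold_phase(12.5, 5.0, 10.0))  # 0.5
print(fold_phase(7.5, 5.0, 10.0))   # 0.5
print(fold_phase(20.0, 5.0, 10.0))  # 0.0
```

Time stamps separated by a whole number of periods map to the same phase, which is what allows samples from different cycles to be combined.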
5 Extension mechanism in ObsTAP#
ivo://ivoa.net/std/obscore#table-1.1. This table can also hold more columns corresponding to optional attributes, as summarized in Table 7 - Optional Parameters of the ObsCore specification. There is no guarantee that an optional parameter will be filled in an ObsTAP service; this must be checked first by the user.
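One possible way for a user to perform this check is to ask the service which columns it declares and how many rows actually carry a value. The sketch below only builds the ADQL strings and compares column lists; it relies on the standard TAP_SCHEMA.columns table, while the helper names are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def declared_columns_query(table_name):
    """ADQL listing the columns a TAP service declares for table_name."""
    if "'" in table_name:
        raise ValueError("unexpected quote in table name")
    return (
        "SELECT column_name FROM TAP_SCHEMA.columns "
        f"WHERE table_name = '{table_name}'"
    )


def filled_count_query(column_name, table_name="ivoa.obscore"):
    """ADQL counting the rows where column_name is not NULL."""
    if not _IDENTIFIER.match(column_name):
        raise ValueError(f"not a plain column name: {column_name!r}")
    return f"SELECT COUNT({column_name}) AS n_filled FROM {table_name}"


def missing_columns(declared, wanted):
    """Return the wanted column names that the service does not declare."""
    declared_lower = {name.lower() for name in declared}
    return [name for name in wanted if name.lower() not in declared_lower]


print(declared_columns_query("ivoa.obscore"))
print(filled_count_query("t_xel"))
print(missing_columns(["obs_id", "t_min", "T_XEL"], ["t_xel", "t_delta_min"]))
```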
Therefore the Time extension for ObsCore should rely on mandatory parameters. If they can be neither retrieved nor calculated from the data, they may be set to UNKNOWN. In order to warn users that extra time parameters have been included in ObsTAP, we propose to gather them in another table named ivoa.time-obscore for services that distribute time sampled data sets. The utype column for ivoa.time-obscore should be the standard identifier of this specification, so here ivo://ivoa.net/std/obscore#time-obs-1.0.
If this table contains an identifier for the corresponding data set described in the main ivoa.obscore table, then it is easy to join general ObsCore properties to the time specific ones in an ADQL query. Here is a query example (to be checked):
Query example with a JOIN between the main ObsCore table and the Time extension table:
-- Extension columns t_delta_min and t_fold_period follow section 4; the value 10 is assumed to be in seconds.
SELECT o.obs_id, o.t_min, o.t_max, o.obs_publisher_did, o.obs_collection, o.access_url
FROM ivoa.obscore AS o
JOIN ivoa."time-obscore" AS tt
ON o.obs_publisher_did = tt.obs_publisher_did
WHERE o.dataproduct_type = 'light-curve'
AND o.t_min > 55197
AND o.t_max < 55204
AND tt.t_delta_min < 10
AND tt.t_fold_period = 0
Other examples of queries using these extra parameters are proposed in Appendix B Query examples for join tables.
ivo://ivoa.net/std/obscore1.1, for version 1.1 and containing the different tables: ivoa.obscore, ivoa.time-obscore, ivoa.radio-obscore, ivoa.heig-obscore, etc., when needed.
Appendices#
A Appendices#
B Query examples for join tables#
C Previous work on the Time series characterization and description#
Very initial draft initiated by D. Tody. https://wiki.ivoa.net/internal/IVOA/LightCurves/STSP.pdf
Table reference for time position : Table 1, p15 in Space-Time Coordinate Metadata for the Virtual Observatory Version 1.33, https://www.ivoa.net/documents/REC/DM/STC-20071030.pdf
Table reference for time scales : Table 2, p17 in Space-Time Coordinate Metadata for the Virtual Observatory Version 1.33
D Vocabulary enhancement#
E Changes from Previous Versions#
First version of this WD. No previous versions yet.
