Data Containers: Product
Product
Product is what links all fdi components together.
Data and Meta Data
- A product has
  - zero or more datasets: well-described data entities (for example images, tables, spectra, etc.);
  - accompanying meta data: required information such as
    - the classification of this product,
    - the creator of this product,
    - when the product was created,
    - what the data reflects (its intended scope of use),
    - and so on;
  - possible additional meta data specific to that particular product type;
  - the history of this product: how this data was created;
  - references to relevant products that form a context of this product.
History
Product History records how each step of data processing has manipulated the data. Every pipeline adds information about input data, auxiliary data, calibration data, command line, and environment variables to a :class:`fdi.dataset.history.History` object attached to the product. History can walk up the processing-input chain and visualize the history of an example product named “root” with a Directed Acyclic Graph like this:
"p1-1" [ref="urn:pools0:fdi.dataset.baseproduct.BaseProduct:1"];
root;
"p1-2" [ref="urn:pools0:fdi.dataset.product.Product:1"];
"p1-2-1" [ref="urn:pools0:fdi.pal.context.Context:0"];
"p1-2-2" [ref="urn:pools0:fdi.pal.context.MapContext:0"];
"p1-2-2-1" [ref="urn:pools0:fdi.dataset.testproducts.TP:0"];
"p1-2-2-1-1" [ref="urn:pools0:fdi.dataset.testproducts.SP:0"];
"p1-1" -> root;
"p1-2" -> root;
"p1-2-1" -> "p1-2";
"p1-2-2" -> "p1-2";
"p1-2-2-1" -> "p1-2-2";
"p1-2-2-1-1" -> "p1-2-2-1";
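The edges in the graph above are enough to walk the processing-input chain by hand. The following is a minimal sketch using plain Python containers, not the History API. The function name `inputs_of` is hypothetical, and each edge is read as "input -> product made from it", which is an assumption about the graph's direction.

```python
from collections import defaultdict

# Edges copied from the graph above: (input, product made from it).
EDGES = [
    ("p1-1", "root"),
    ("p1-2", "root"),
    ("p1-2-1", "p1-2"),
    ("p1-2-2", "p1-2"),
    ("p1-2-2-1", "p1-2-2"),
    ("p1-2-2-1-1", "p1-2-2-1"),
]


def inputs_of(node, edges):
    """Return every direct and indirect input of `node`, depth first."""
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)
    found = []
    stack = list(reversed(parents[node]))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            # Visit the inputs of this input next.
            stack.extend(reversed(parents[current]))
    return found


chain = inputs_of("root", EDGES)
print(chain)
```

Calling `inputs_of("p1-2-2", EDGES)` restricts the walk to the sub-chain below that product.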
Serializable
To transfer data across the network between heterogeneous nodes, the data must be serializable. JSON is used to transfer serialized data because of its wide adoption, availability of tools, ease of use with Python, and simplicity.
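As a rough illustration of why JSON suits this role, the sketch below round-trips a small metadata-like dictionary through the standard `json` module. The dictionary layout is invented for the example and is not the serialized form that fdi produces.

```python
import json

# Hypothetical, simplified stand-in for product content.
record = {
    "description": "product example",
    "creator": "UNKNOWN",
    "version": "0.8",
    "data": [[1, 2, 3], [4, 5, 6]],
}

# Serialize to text that any node can receive, then rebuild the object.
wire_text = json.dumps(record)
restored = json.loads(wire_text)
```

The restored object compares equal to the original, which is the property a transfer format needs.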
Product Definition Methodology
Data Products are almost always classified in hierarchical order, reflecting the underlying relations of the data model. Many Products are found to have inheritance relations when their metadata and datasets are compared. Therefore an object-oriented approach is chosen here to analyze and define the structure, function, and interface of Products.
First, specify the built-in Parameters in YAML format, which is suitable for reading by both humans and machines. A helper utility, yaml2python, then generates test-ready Python code for the product class module containing the built-ins.
The YAML schema allows a child Product to inherit metadata definition from one or multiple parent Products. Overriding is also allowed.
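One way to picture inheritance with overriding is as a merge of metadata definitions, parents first and the child last. The sketch below is an assumption about the behaviour described above, not the yaml2python implementation. `merge_metadata` and the `Child` definition are hypothetical, and the rule that later parents override earlier ones is also assumed.

```python
def merge_metadata(child, parents):
    """Combine metadata definitions; later sources override earlier ones."""
    merged = {}
    for parent in parents:
        merged.update(parent["metadata"])
    merged.update(child["metadata"])  # the child's entries win
    return merged


base = {
    "name": "BaseProduct",
    "metadata": {
        "type": {"data_type": "string", "default": "BaseProduct"},
        "version": {"data_type": "string", "default": "0.8"},
    },
}
child = {
    "name": "Child",  # hypothetical product
    "metadata": {
        "type": {"data_type": "string", "default": "Child"},  # override
        "instrument": {"data_type": "string", "default": "UNKNOWN"},
    },
}
merged = merge_metadata(child, [base])
```

Here `merged` keeps the inherited `version` entry, takes the child's `type`, and gains the child's own `instrument` entry.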
BaseProduct
This is the definition file BaseProduct.yml
name: BaseProduct
description: FDI base class data model
parents:
    -
schema: '1.6'
metadata:
    description:
        id_zh_cn: 描述
        data_type: string
        description: Description of this product
        description_zh_cn: 对本产品的描述。
        default: UNKNOWN
        valid: ''
        typecode: B
    type:
        id_zh_cn: 产品类型
        data_type: string
        description: Product Type identification. Name of class or CARD.
        description_zh_cn: 产品类型。完整Python类名或卡片名。
        default: BaseProduct
        valid: ''
        typecode: B
    level:
        id_zh_cn: 产品xx
        data_type: string
        description: Product level.
        description_zh_cn: 产品xx
        default: ALL
        valid: ''
        typecode: B
    creator:
        id_zh_cn: 本产品生成者
        data_type: string
        description: Generator of this product.
        description_zh_cn: 本产品生成方的标识,例如可以是单位、组织、姓名、软件、或特别算法等。
        default: UNKNOWN
        valid: ''
        typecode: B
    creationDate:
        id_zh_cn: 产品生成时间
        fits_keyword: DATE
        data_type: finetime
        description: Creation date of this product
        description_zh_cn: 本产品生成时间
        default: 0
        valid: ''
        typecode:
    rootCause:
        id_zh_cn: 数据来源
        data_type: string
        description: Reason of this run of pipeline.
        description_zh_cn: 数据来源(此例来自鉴定件热真空罐)
        default: UNKNOWN
        valid: ''
        typecode: B
    version:
        id_zh_cn: 版本
        data_type: string
        description: Version of product
        description_zh_cn: 产品版本
        default: '0.8'
        valid: ''
        typecode: B
    FORMATV:
        id_zh_cn: 格式版本
        data_type: string
        description: Version of product schema and revision
        description_zh_cn: 产品格式版本
        default: '1.6.0.11'
        valid: ''
        typecode: B
datasets: {}
The preamble key-value pairs give information about this definition:
- name: Name of this product
- description: Information about this product
- parents: Child products inherit their parents' metadata
- level: Applicable level
- schema: Version of the format of this YAML document
The creation process requires every product to carry the following metadata entries about itself:
- description: (Also in the native language if it is not English.)
- type: In the software or business domain
- version: Products of the same format must be versioned, configuration controlled, and ready to deal with version differences between inputs, algorithms, software and pipelines.
- FORMATV: Version of this document with Schema information, e.g. 1.4.1.2
- creator, rootCause, creationDate: Who, why, when, where
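A pipeline could verify these entries before accepting a product. The check below is a hypothetical sketch over a plain dictionary; `missing_entries` is not an fdi function, and the `REQUIRED` tuple simply restates the entries named above.

```python
# Names restated from the list above.
REQUIRED = ("description", "type", "version", "FORMATV",
            "creator", "rootCause", "creationDate")


def missing_entries(meta):
    """Return the required metadata names absent from `meta`."""
    return [name for name in REQUIRED if name not in meta]


# Hypothetical, incomplete metadata for illustration.
meta = {"description": "UNKNOWN", "type": "BaseProduct",
        "version": "0.8", "FORMATV": "1.6.0.11"}
absent = missing_entries(meta)
```

An empty result means every required entry is present.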
The parameters are tabulated below.
name | value | unit | type | valid | default | code | description |
---|---|---|---|---|---|---|---|
description | UNKNOWN | | string | None | UNKNOWN | B | Description of this product |
type | BaseProduct | | string | None | BaseProduct | B | Product Type identification. Name of class or CARD. |
level | ALL | | string | None | ALL | B | Product level. |
creator | UNKNOWN | | string | None | UNKNOWN | B | Generator of this product. |
creationDate | 1958-01-01T00:00:00.000000 0 | | finetime | None | 1958-01-01T00:00:00.000000 0 | %Y-%m-%dT%H:%M:%S.%f | Creation date of this product |
rootCause | UNKNOWN | | string | None | UNKNOWN | B | Reason of this run of pipeline. |
version | 0.8 | | string | None | 0.8 | B | Version of product |
FORMATV | 1.6.0.11 | | string | None | 1.6.0.11 | B | Version of product schema and revision |
listeners | <No listener> |
Product Hierarchy
Product Levels
Product generation is usually a process with a number of stages. Each stage often produces consumable products of similar processing “levels”. The processing levels adopted here are:
Level | Description |
---|---|
L0 | Data organized in streams of homogeneous packets or transmission frames. |
L1A | Structured key-value pairs and arrays; the input is a packet or frame stream. |
L1B | Formatted according to application domain convention. Translates instrument-specific formats to physics standards (e.g. hardware time to Fine Time). No information from L1A is lost. This is the starting point for most domain users, who demand unspoiled but civilized data. |
L1C | Binary switch, binary masked, and enumerated metadata quantities are translated to text/numerical mnemonics according to metadata value; domain-specific customary coordinates, representations or formats are used (e.g. R.A. Dec. instead of quaternion; TT instead of TAI). Some information from L1B is lost. |
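The L1C translation of binary-masked quantities to mnemonics can be sketched as follows. This is an illustration only, not fdi's validation code: `decode_masked` is a hypothetical helper, and the rule table is copied from the `ddetector` parameter in the example output at the end of this page.

```python
def decode_masked(value, rules):
    """Translate each masked bit field of `value` to a mnemonic."""
    result = []
    for mask, names in rules.items():
        # Shift the masked bits down so the field starts at bit 0.
        shift = (mask & -mask).bit_length() - 1
        field = (value & mask) >> shift
        result.append(names.get(field, "Invalid"))
    return result


# mask -> {field value: mnemonic}
RULES = {
    0b11000000: {0b01: "port_1", 0b10: "port_2", 0b11: "port closed"},
    0b00100000: {0b0: "stand_by", 0b1: "main"},
    0b00010000: {0b0: "error", 0b1: "normal"},
    0b00001111: {0b0000: "reserved"},
}

mnemonics = decode_masked(0b01010001, RULES)
```

For the input `0b01010001` the four fields decode to `port_1`, `stand_by`, `normal` and `Invalid`; the last field has no rule for the value `0b0001`.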
Quick-look, or “browse”, products can be generated for data products at any level. Their generation is not mandatory.
Although product levels defined this way are useful for grouping, they are too coarse and general to specify precisely how specific products or product groups relate to one another. To do that, a rigorous hierarchical approach is needed.
Examples (from Quick Start page):
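>>> # Imports. The Parameter module path is an assumption; the other
... # paths match the class names printed elsewhere on this page.
... from fdi.dataset.product import Product
... from fdi.dataset.arraydataset import ArrayDataset
... from fdi.dataset.tabledataset import TableDataset
... from fdi.dataset.metadata import Parameter
>>> # Creation: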
... x = Product(description="product example with several datasets",
... instrument="Crystal-Ball", modelName="Mk II")
... x.meta['description'].value # == "product example with several datasets"
'product example with several datasets'
>>> # The 'instrument' and 'modelName' built-in properties show the
... # origin of FDI -- processing data from scientific instruments.
... x.instrument # == "Crystal-Ball"
'Crystal-Ball'
>>> # ways to add datasets
... i0 = 6
... i1 = [[1, 2, 3], [4, 5, i0], [7, 8, 9]]
... i2 = 'ev' # unit
... i3 = 'image1' # description
... image = ArrayDataset(data=i1, unit=i2, description=i3)
... # put the dataset into the product
... x["RawImage"] = image
... # take the data out of the product
... x["RawImage"].data # == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> # Another syntax to put dataset into a product: set(name, dataset)
... # Different but same function as above.
... # Here no unit or description is given when making ArrayDataset
... x.set('QualityImage', ArrayDataset(
... [[0.1, 0.5, 0.7], [4e3, 6e7, 8], [-2, 0, 3.1]]))
... x["QualityImage"].unit # is None
>>> # add another tabledataset
... s1 = [('col1', [1, 4.4, 5.4E3], 'eV'),
... ('col2', [0, 43.2, 2E3], 'cnt')]
... x["Spectrum"] = TableDataset(data=s1)
... # See the number and types of existing datasets in the product
... [type(d) for d in x.values()]
[fdi.dataset.arraydataset.ArrayDataset,
fdi.dataset.arraydataset.ArrayDataset,
fdi.dataset.tabledataset.TableDataset]
>>> # mandatory properties are also in metadata
... # test mandatory BaseProduct properties that are also metadata
... a0 = "Me, myself and I"
... x.creator = a0
... x.creator # == a0
'Me, myself and I'
>>> # metadata by the same name is also set
... x.meta["creator"].value # == a0
'Me, myself and I'
>>> # change the metadata
... a1 = "or else"
... x.meta["creator"] = Parameter(a1)
... # metadata changed
... x.meta["creator"].value # == a1
'or else'
>>> # so was the property
... x.creator # == a1
'or else'
>>> # load some metadata
... m = x.meta
... # NOTE: v is not defined in this excerpt
... m['ddetector'] = v['d']
>>> print(x.toString())
=== Product (product example with several datasets) ===
meta= {
============ ==================== ====== ======== ==================== ================= ====== =====================
name value unit type valid default code description
============ ==================== ====== ======== ==================== ================= ====== =====================
description product example with string None UNKNOWN B Description of this p
several datasets roduct
type Product string None Product B Product Type identifi
cation. Name of class
or CARD.
level ALL string None ALL B Product level.
creator or else string None None UNKNOWN
creationDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Creation date of this
000000 00.000000 product
0 0
rootCause UNKNOWN string None UNKNOWN B Reason of this run of
pipeline.
version 0.8 string None 0.8 B Version of product
FORMATV 1.6.0.10 string None 1.6.0.10 B Version of product sc
hema and revision
startDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Nominal start time o
000000 00.000000 f this product.
0 0
endDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Nominal end time of
000000 00.000000 this product.
0 0
instrument Crystal-Ball string None UNKNOWN B Instrument that gener
ated data of this pro
duct
modelName Mk II string None UNKNOWN B Model name of the ins
trument of this produ
ct
mission _AGS string None _AGS B Name of the mission.
ddetector port_1 (0b01) None integer 11000000 0b01: port_ None None valid rules described
stand_by (0b0) 1 with binary masks
normal (0b1) 11000000 0b10: port_
Invalid 2
11000000 0b11: port
closed
00100000 0b0: stand_
by
00100000 0b1: main
00010000 0b0: error
00010000 0b1: normal
00001111 0b0000: res
erved
============ ==================== ====== ======== ==================== ================= ====== =====================
MetaData-listeners = ListnerSet{}},
history= {},
listeners= {ListnerSet{}}
=== History (UNKNOWN) ===
PARAM_HISTORY= {''},
TASK_HISTORY= {''},
meta= {(No Parameter.) MetaData-listeners = ListnerSet{}}
History-datasets =
<ODict >
Product-datasets =
<ODict "RawImage":
=== ArrayDataset (image1) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
shape (3, 3) tuple None () Number of elements in
each dimension. Quic
k changers to the rig
ht.
description image1 string None UNKNOWN B Description of this d
ataset
unit ev string None None B Unit of every element
.
typecode UNKNOWN string None UNKNOWN B Python internal stora
ge code.
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
1 2 3
4 5 6
7 8 9
"QualityImage":
=== ArrayDataset (UNKNOWN) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
shape (3, 3) tuple None () Number of elements in
each dimension. Quic
k changers to the rig
ht.
description UNKNOWN string None UNKNOWN B Description of this d
ataset
unit None string None None B Unit of every element
.
typecode UNKNOWN string None UNKNOWN B Python internal stora
ge code.
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
0.1 0.5 0.7
4000 6e+07 8
-2 0 3.1
"Spectrum":
=== TableDataset (UNKNOWN) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
description UNKNOWN string None UNKNOWN B Description of this d
ataset
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
TableDataset-dataset =
col1 col2
(eV) (cnt)
------ -------
1 0
4.4 43.2
5400 2000