Data Containers: Product

Product

Product is what links all fdi components together.

Data and Meta Data

../_images/product.png
A product has
  • zero or more datasets: well-described data entities (images, tables, spectra, etc.).

  • accompanying metadata: required information such as

    • the classification of this product,

    • the creator of this product,

    • when the product was created,

    • what the data reflects (its intended scope of use),

    • and so on;

    • possible additional metadata specific to that particular product type.

  • the history of this product: how this data was created.

  • references to relevant products that form the context of this product.

History

Product History records how each step of data processing has manipulated the data. Every pipeline adds information about input data, auxiliary data, calibration data, the command line, and environment variables to a :class:`fdi.dataset.history.History` object attached to the product. History can walk up the processing-input chain and visualize the history of an example product named “root”

"p1-1" [ref="urn:pools0:fdi.dataset.baseproduct.BaseProduct:1"];
root;
"p1-2" [ref="urn:pools0:fdi.dataset.product.Product:1"];
"p1-2-1" [ref="urn:pools0:fdi.pal.context.Context:0"];
"p1-2-2" [ref="urn:pools0:fdi.pal.context.MapContext:0"];
"p1-2-2-1" [ref="urn:pools0:fdi.dataset.testproducts.TP:0"];
"p1-2-2-1-1" [ref="urn:pools0:fdi.dataset.testproducts.SP:0"];
"p1-1" -> root;
"p1-2" -> root;
"p1-2-1" -> "p1-2";
"p1-2-2" -> "p1-2";
"p1-2-2-1" -> "p1-2-2";
"p1-2-2-1-1" -> "p1-2-2-1";

with a Directed Acyclic Graph like this:

../_images/history.svg
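Walking up the processing-input chain can be illustrated without fdi itself. The following is a minimal sketch in plain Python, not the History implementation: it copies the edges from the graph listing above and assumes that an arrow points from an input product to the product derived from it. The helper names `build_inputs` and `walk_up` are hypothetical.

```python
from collections import defaultdict

# Edges copied from the graph listing: (input product, product derived from it).
EDGES = [
    ("p1-1", "root"),
    ("p1-2", "root"),
    ("p1-2-1", "p1-2"),
    ("p1-2-2", "p1-2"),
    ("p1-2-2-1", "p1-2-2"),
    ("p1-2-2-1-1", "p1-2-2-1"),
]


def build_inputs(edges):
    """Map each product to the list of products it was made from."""
    inputs = defaultdict(list)
    for source, target in edges:
        inputs[target].append(source)
    return inputs


def walk_up(product, inputs, depth=0):
    """Yield (depth, name) for a product and all of its inputs, depth first."""
    yield depth, product
    for parent in inputs.get(product, []):
        yield from walk_up(parent, inputs, depth + 1)


inputs = build_inputs(EDGES)
chain = list(walk_up("root", inputs))
for depth, name in chain:
    # Indent each product by its distance from "root".
    print("  " * depth + name)
```

Because the graph is acyclic, the recursion always terminates; a graph with cycles would need a set of visited nodes.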

Serializable

To transfer data across the network between heterogeneous nodes, the data needs to be serializable. JSON is used to transfer serialized data because of its wide adoption, availability of tools, ease of use with Python, and simplicity.
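The round trip that serialization must support can be sketched with the standard `json` module. This is only an illustration: the dictionary below is a hypothetical, simplified stand-in for a product, and fdi's own serializer is not shown here.

```python
import json

# Hypothetical stand-in for a product, built from plain dicts and lists only.
product = {
    "meta": {
        "description": "product example",
        "creator": "UNKNOWN",
        "version": "0.8",
    },
    "datasets": {
        "RawImage": {"unit": "ev", "data": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]},
    },
}

text = json.dumps(product, sort_keys=True)  # serialize to a JSON string
restored = json.loads(text)                 # deserialize on the receiving node

# The restored structure compares equal to the original one.
print(restored == product)
```

Plain dictionaries, lists, strings, and numbers survive this round trip unchanged; class instances need a custom encoding step, which is where a dedicated serializer comes in.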

Product Definition Methodology

Data Products are almost always classified in hierarchical order, reflecting the underlying relations of the data model. Many Products are found to have inheritance relations when their metadata and datasets are compared. Therefore an object-oriented approach is chosen here to analyze and define the structure, function, and interface of Products.

First, specify built-in Parameters in YAML format, which is suitable for reading by both humans and machines. A helper utility, yaml2python, then generates test-ready Python code for the product class module containing the built-ins.

The YAML schema allows a child Product to inherit metadata definition from one or multiple parent Products. Overriding is also allowed.
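The effect of inheritance with overriding can be sketched with plain dictionaries standing in for parsed YAML definitions. This is a sketch under assumptions, not the yaml2python algorithm: the resolution order (parents in listed order, then the product's own entries) is assumed, and the `ImageProduct` child and the `resolve_metadata` helper are hypothetical.

```python
def resolve_metadata(name, definitions):
    """Return the effective metadata of product `name`.

    Parents are applied in the order listed, then the product's own
    entries, so later definitions override earlier ones.
    """
    definition = definitions[name]
    merged = {}
    for parent in definition.get("parents", []):
        if parent:  # an empty entry means "no parent"
            merged.update(resolve_metadata(parent, definitions))
    merged.update(definition.get("metadata", {}))
    return merged


definitions = {
    "BaseProduct": {
        "parents": [None],
        "metadata": {
            "type": {"default": "BaseProduct"},
            "version": {"default": "0.8"},
        },
    },
    # Hypothetical child product, for illustration only.
    "ImageProduct": {
        "parents": ["BaseProduct"],
        "metadata": {
            "type": {"default": "ImageProduct"},  # overrides the parent's entry
            "exposure": {"default": 0},           # new entry added by the child
        },
    },
}

effective = resolve_metadata("ImageProduct", definitions)
print(sorted(effective))
```

The child ends up with the inherited `version` entry, its own `exposure` entry, and its own `type` entry in place of the parent's.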

BaseProduct

This is the definition file BaseProduct.yml

name: BaseProduct
description: FDI base class data model
parents:
  -
schema: '1.6'
metadata:
    description:
        id_zh_cn: 描述
        data_type: string
        description: Description of this product
        description_zh_cn: 对本产品的描述。
        default: UNKNOWN
        valid: ''
        typecode: B
    type:
        id_zh_cn: 产品类型
        data_type: string
        description: Product Type identification. Name of class or CARD.
        description_zh_cn: 产品类型。完整Python类名或卡片名。
        default: BaseProduct
        valid: ''
        typecode: B
    level:
        id_zh_cn: 产品xx
        data_type: string
        description: Product level.
        description_zh_cn: 产品xx
        default: ALL
        valid: ''
        typecode: B
    creator:
        id_zh_cn: 本产品生成者
        data_type: string
        description: Generator of this product.
        description_zh_cn: 本产品生成方的标识,例如可以是单位、组织、姓名、软件、或特别算法等。
        default: UNKNOWN
        valid: ''
        typecode: B
    creationDate:
        id_zh_cn: 产品生成时间
        fits_keyword: DATE
        data_type: finetime
        description: Creation date of this product
        description_zh_cn: 本产品生成时间
        default: 0
        valid: ''
        typecode:
    rootCause:
        id_zh_cn: 数据来源
        data_type: string
        description: Reason of this run of pipeline.
        description_zh_cn: 数据来源(此例来自鉴定件热真空罐)
        default: UNKNOWN
        valid: ''
        typecode: B
    version:
        id_zh_cn: 版本
        data_type: string
        description: Version of product
        description_zh_cn: 产品版本
        default: '0.8'
        valid: ''
        typecode: B
    FORMATV:
        id_zh_cn: 格式版本
        data_type: string
        description: Version of product schema and revision
        description_zh_cn: 产品格式版本
        default: '1.6.0.11'
        valid: ''
        typecode: B
datasets: {}

The preamble key-value pairs give information about this definition:

name:
    Name of this product.

description:
    Information about this product.

parents:
    Child products inherit their parents' metadata.

level:
    Applicable level.

schema:
    Version of the format of this YAML document.

The creation process requires every product to carry the following metadata entries about itself:

description:
    Description of this product (also in the native language if that is not English).

type:
    Product type, in the software or business domain.

version:
    Products of the same format must be versioned and configuration-controlled, and must be ready to deal with version differences between inputs, algorithms, software, and pipelines.

FORMATV:
    Version of this document together with schema information, e.g. 1.4.1.2.

creator, rootCause, creationDate:
    Who, why, when, where.
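A check for these required entries could look like the sketch below. It uses a plain dictionary in place of a real metadata object, and the `missing_entries` helper is hypothetical rather than part of fdi.

```python
# Metadata entries that every product is required to carry, as listed above.
MANDATORY = (
    "description", "type", "version", "FORMATV",
    "creator", "rootCause", "creationDate",
)


def missing_entries(meta):
    """Return the mandatory entry names that are absent from `meta`."""
    return [key for key in MANDATORY if key not in meta]


# An incomplete, hypothetical set of metadata.
meta = {"description": "demo", "type": "BaseProduct", "version": "0.8"}
print(missing_entries(meta))
```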

The parameters are tabulated below.

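============  ============================  ====  ========  =====  ============================  ====================  ===================================================
name          value                         unit  type      valid  default                       code                  description
============  ============================  ====  ========  =====  ============================  ====================  ===================================================
description   UNKNOWN                             string    None   UNKNOWN                       B                     Description of this product
type          BaseProduct                         string    None   BaseProduct                   B                     Product Type identification. Name of class or CARD.
level         ALL                                 string    None   ALL                           B                     Product level.
creator       UNKNOWN                             string    None   UNKNOWN                       B                     Generator of this product.
creationDate  1958-01-01T00:00:00.000000 0        finetime  None   1958-01-01T00:00:00.000000 0  %Y-%m-%dT%H:%M:%S.%f  Creation date of this product
rootCause     UNKNOWN                             string    None   UNKNOWN                       B                     Reason of this run of pipeline.
version       0.8                                 string    None   0.8                           B                     Version of product
FORMATV       1.6.0.11                            string    None   1.6.0.11                      B                     Version of product schema and revision
============  ============================  ====  ========  =====  ============================  ====================  ===================================================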

listeners

<No listener>
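The default creationDate value 0 is displayed as 1958-01-01T00:00:00.000000. How such a value could be rendered is sketched below with the standard `datetime` module and the format code listed for creationDate. The sketch assumes that a finetime value counts microseconds from the 1958-01-01 epoch and ignores time-scale details such as leap seconds; `format_finetime` is a hypothetical helper, not the fdi implementation.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1958, 1, 1)      # the instant displayed for the default value 0
FORMAT = "%Y-%m-%dT%H:%M:%S.%f"   # format code listed for creationDate


def format_finetime(microseconds):
    """Render a count of microseconds since EPOCH (assumed unit) as text."""
    return (EPOCH + timedelta(microseconds=microseconds)).strftime(FORMAT)


print(format_finetime(0))          # 1958-01-01T00:00:00.000000
print(format_finetime(1_500_000))  # 1958-01-01T00:00:01.500000
```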

Product Hierarchy

Product Levels

Product generation is usually a process with a number of stages. Each stage often produces consumable products of similar processing “levels”. The processing levels adopted here are:

=====  ===========================================================================
Level  Description
=====  ===========================================================================
L0     Data organized in a stream of homogeneous packets or transmission frames.
L1A    Structured key-value pairs and arrays; the input is a packet or frame
       stream.
L1B    Formatted according to application-domain conventions. Instrument-specific
       formats are translated to physics standards (e.g. hardware time to Fine
       Time). No information from L1A is lost. This is the starting point for
       most domain users, who demand unspoiled but civilized data.
L1C    Binary-switch, binary-masked, and enumerated metadata quantities are
       translated to text or numerical mnemonics according to the metadata value;
       domain-specific customary coordinates, representations, or formats are used
       (e.g. R.A. and Dec. instead of quaternions; TT instead of TAI). Some
       information from L1B is lost.
=====  ===========================================================================

Quick-look, or “browse”, products can be generated for data products at any level. Their generation is not mandatory.

Although product levels defined this way are useful for grouping, they are too coarse and general to specify precisely how specific products or product groups relate to one another. To do that, a rigorous hierarchical approach is needed.
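The coarse grouping that levels do provide can be sketched as an ordered enumeration. This is only an illustration: the `Level` class and the product names are hypothetical, and fdi itself stores the level as a string metadata entry.

```python
from enum import IntEnum


class Level(IntEnum):
    """Processing levels in the order in which they are produced."""
    L0 = 0
    L1A = 1
    L1B = 2
    L1C = 3


# Hypothetical products tagged with their processing level.
products = [("spectrum", Level.L1C), ("packets", Level.L0), ("table", Level.L1B)]

# Sort the products by processing stage.
by_stage = sorted(products, key=lambda item: item[1])

# Products at L1B or below; L1C is the first level that loses L1B information.
up_to_l1b = [name for name, level in products if level <= Level.L1B]

print([name for name, _ in by_stage])
print(up_to_l1b)
```

An ordering like this answers "which stage came first" but says nothing about which specific product was derived from which, and that is the gap the hierarchical approach fills.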

Examples (from Quick Start page):


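>>> from fdi.dataset.product import Product
>>> from fdi.dataset.arraydataset import ArrayDataset
>>> from fdi.dataset.tabledataset import TableDataset
>>> from fdi.dataset.metadata import Parameter
>>> # Creation: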
... x = Product(description="product example with several datasets",
...             instrument="Crystal-Ball", modelName="Mk II")
... x.meta['description'].value  # == "product example with several datasets"
'product example with several datasets'
>>> # The 'instrument' and 'modelName' built-in properties show the
... # origin of FDI -- processing data from scientific instruments.
... x.instrument  # == "Crystal-Ball"
'Crystal-Ball'
>>> # ways to add datasets
... i0 = 6
... i1 = [[1, 2, 3], [4, 5, i0], [7, 8, 9]]
... i2 = 'ev'                 # unit
... i3 = 'image1'     # description
... image = ArrayDataset(data=i1, unit=i2, description=i3)
... # put the dataset into the product
... x["RawImage"] = image
... # take the data out of the product
... x["RawImage"].data  # == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> # Another syntax to put a dataset into a product: set(name, dataset)
... # Different syntax, same function as above.
... # Here no unit or description is given when making ArrayDataset
... x.set('QualityImage', ArrayDataset(
...     [[0.1, 0.5, 0.7], [4e3, 6e7, 8], [-2, 0, 3.1]]))
... x["QualityImage"].unit  # is None
>>> # add another tabledataset
... s1 = [('col1', [1, 4.4, 5.4E3], 'eV'),
...       ('col2', [0, 43.2, 2E3], 'cnt')]
... x["Spectrum"] = TableDataset(data=s1)
... # See the number and types of existing datasets in the product
... [type(d) for d in x.values()]
[fdi.dataset.arraydataset.ArrayDataset,
 fdi.dataset.arraydataset.ArrayDataset,
 fdi.dataset.tabledataset.TableDataset]
>>> # mandatory properties are also in metadata
... # test mandatory BaseProduct properties that are also metadata
... a0 = "Me, myself and I"
... x.creator = a0
... x.creator   # == a0
'Me, myself and I'
>>> # metadata by the same name is also set
... x.meta["creator"].value   # == a0
'Me, myself and I'
>>> # change the metadata
... a1 = "or else"
... x.meta["creator"] = Parameter(a1)
... # metadata changed
... x.meta["creator"].value   # == a1
'or else'
>>> # so was the property
... x.creator   # == a1
'or else'
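>>> # load some metadata; 'v' is not defined in this excerpt and is assumed
... # to be the MetaData object built earlier on the Quick Start page
... m = x.meta
... m['ddetector'] = v['d']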
>>> print(x.toString())
=== Product (product example with several datasets) ===
meta= {
============  ====================  ======  ========  ====================  =================  ======  =====================
name          value                 unit    type      valid                 default            code    description
============  ====================  ======  ========  ====================  =================  ======  =====================
description   product example with          string    None                  UNKNOWN            B       Description of this p
               several datasets                                                                        roduct
type          Product                       string    None                  Product            B       Product Type identifi
                                                                                                       cation. Name of class
                                                                                                        or CARD.
level         ALL                           string    None                  ALL                B       Product level.
creator       or else                       string    None                  None                       UNKNOWN
creationDate  1958-01-01T00:00:00.          finetime  None                  1958-01-01T00:00:  Q       Creation date of this
              000000                                                        00.000000                   product
              0                                                             0
rootCause     UNKNOWN                       string    None                  UNKNOWN            B       Reason of this run of
                                                                                                        pipeline.
version       0.8                           string    None                  0.8                B       Version of product
FORMATV       1.6.0.10                      string    None                  1.6.0.10           B       Version of product sc
                                                                                                       hema and revision
startDate     1958-01-01T00:00:00.          finetime  None                  1958-01-01T00:00:  Q       Nominal start time  o
              000000                                                        00.000000                  f this product.
              0                                                             0
endDate       1958-01-01T00:00:00.          finetime  None                  1958-01-01T00:00:  Q       Nominal end time  of
              000000                                                        00.000000                  this product.
              0                                                             0
instrument    Crystal-Ball                  string    None                  UNKNOWN            B       Instrument that gener
                                                                                                       ated data of this pro
                                                                                                       duct
modelName     Mk II                         string    None                  UNKNOWN            B       Model name of the ins
                                                                                                       trument of this produ
                                                                                                       ct
mission       _AGS                          string    None                  _AGS               B       Name of the mission.
ddetector     port_1 (0b01)         None    integer   11000000 0b01: port_  None               None    valid rules described
              stand_by (0b0)                          1                                                 with binary masks
              normal (0b1)                            11000000 0b10: port_
              Invalid                                 2
                                                      11000000 0b11: port
                                                      closed
                                                      00100000 0b0: stand_
                                                      by
                                                      00100000 0b1: main
                                                      00010000 0b0: error
                                                      00010000 0b1: normal
                                                      00001111 0b0000: res
                                                      erved
============  ====================  ======  ========  ====================  =================  ======  =====================
MetaData-listeners = ListnerSet{}},
history= {},
listeners= {ListnerSet{}}

=== History (UNKNOWN) ===
PARAM_HISTORY= {''},
TASK_HISTORY= {''},
meta= {(No Parameter.) MetaData-listeners = ListnerSet{}}

History-datasets =
<ODict >

Product-datasets =
<ODict "RawImage":
=== ArrayDataset (image1) ===
meta= {
===========  =======  ======  ======  =======  =========  ======  =====================
name         value    unit    type    valid    default    code    description
===========  =======  ======  ======  =======  =========  ======  =====================
shape        (3, 3)           tuple   None     ()                 Number of elements in
                                                                   each dimension. Quic
                                                                  k changers to the rig
                                                                  ht.
description  image1           string  None     UNKNOWN    B       Description of this d
                                                                  ataset
unit         ev               string  None     None       B       Unit of every element
                                                                  .
typecode     UNKNOWN          string  None     UNKNOWN    B       Python internal stora
                                                                  ge code.
version      0.1              string  None     0.1        B       Version of dataset
FORMATV      1.6.0.1          string  None     1.6.0.1    B       Version of dataset sc
                                                                  hema and revision
===========  =======  ======  ======  =======  =========  ======  =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
1  2  3
4  5  6
7  8  9


"QualityImage":
=== ArrayDataset (UNKNOWN) ===
meta= {
===========  =======  ======  ======  =======  =========  ======  =====================
name         value    unit    type    valid    default    code    description
===========  =======  ======  ======  =======  =========  ======  =====================
shape        (3, 3)           tuple   None     ()                 Number of elements in
                                                                   each dimension. Quic
                                                                  k changers to the rig
                                                                  ht.
description  UNKNOWN          string  None     UNKNOWN    B       Description of this d
                                                                  ataset
unit         None             string  None     None       B       Unit of every element
                                                                  .
typecode     UNKNOWN          string  None     UNKNOWN    B       Python internal stora
                                                                  ge code.
version      0.1              string  None     0.1        B       Version of dataset
FORMATV      1.6.0.1          string  None     1.6.0.1    B       Version of dataset sc
                                                                  hema and revision
===========  =======  ======  ======  =======  =========  ======  =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
   0.1  0.5    0.7
4000    6e+07  8
  -2    0      3.1


"Spectrum":
=== TableDataset (UNKNOWN) ===
meta= {
===========  =======  ======  ======  =======  =========  ======  =====================
name         value    unit    type    valid    default    code    description
===========  =======  ======  ======  =======  =========  ======  =====================
description  UNKNOWN          string  None     UNKNOWN    B       Description of this d
                                                                  ataset
version      0.1              string  None     0.1        B       Version of dataset
FORMATV      1.6.0.1          string  None     1.6.0.1    B       Version of dataset sc
                                                                  hema and revision
===========  =======  ======  ======  =======  =========  ======  =====================
MetaData-listeners = ListnerSet{}}
TableDataset-dataset =
  col1     col2
  (eV)    (cnt)
------  -------
   1        0
   4.4     43.2
5400     2000
>>>