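Data Containers: Product
Product
A product is what ties all FDI components together.
Data and Metadata
A product has:
- zero or more datasets: well-described data entities (for example images, tables, spectra);
- accompanying metadata, i.e. required information such as
  - the classification of this product,
  - the creator of this product,
  - when the product was created,
  - what the data reflects (its intended scope of use),
  - and so on;
- possible additional metadata specific to that particular product type;
- the history of this product: how the data was created;
- references to related products that form the context of this product.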
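The composition listed above can be pictured with a small stand-alone sketch. It does not use the fdi API: the class `SketchProduct` and its field names are hypothetical and only mirror the list (named datasets, a metadata mapping, a history, and references to related products).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchProduct:
    """Hypothetical stand-in for a product; NOT the fdi Product class."""
    # Metadata: required entries such as type, creator, creation time.
    meta: Dict[str, Any] = field(default_factory=dict)
    # Zero or more named datasets (images, tables, spectra, ...).
    datasets: Dict[str, Any] = field(default_factory=dict)
    # Notes on how the data was created.
    history: List[str] = field(default_factory=list)
    # Names of related products that form the context.
    refs: List[str] = field(default_factory=list)


p = SketchProduct(meta={"type": "SketchProduct", "creator": "UNKNOWN"})
p.datasets["RawImage"] = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
p.history.append("created from a hand-typed 3x3 array")
print(len(p.datasets), p.meta["creator"])
```

A product with no datasets is still valid in this sketch, matching the "zero or more datasets" wording.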
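History
History is a lightweight mechanism for recording the origin of a product or the changes made to it. Lightweight means that the product data itself does not record changes; instead, external parties can attach additional information to the product that reflects those changes.
The sole purpose of the product's history interface is to let pipeline tasks (as defined by the pipeline framework) record what they have done to generate and/or modify a product.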
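As an illustration of this idea, the sketch below lets a toy pipeline task attach a record of what it did, while the data itself carries no change tracking. Everything here (`record_task`, `scale`, the dictionary layout) is hypothetical and is not the fdi history interface.

```python
from datetime import datetime, timezone

# Hypothetical product: plain data plus an initially empty history list.
product = {"data": [1, 2, 3], "history": []}


def record_task(prod, task_name, **params):
    """Append one history entry describing what a task did."""
    prod["history"].append({
        "task": task_name,
        "params": params,
        "when": datetime.now(timezone.utc).isoformat(),
    })


def scale(prod, factor):
    """Toy pipeline task: modify the data, then record the step."""
    prod["data"] = [v * factor for v in prod["data"]]
    record_task(prod, "scale", factor=factor)


scale(product, 10)
print(product["data"])
print([entry["task"] for entry in product["history"]])
```

The data list stays a plain list; only the task knows about the history, which is what keeps the mechanism lightweight.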
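Serializable
To transfer data across a network between heterogeneous nodes, the data must be serializable. JSON is used as the transfer format for serialized data because it is widely adopted, well supported by tools, and easy to use from Python.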
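A minimal round trip with Python's standard `json` module shows what serializability buys. The dictionary layout below is an assumption made for illustration; fdi's own serialization routines are not shown here.

```python
import json

# Hypothetical, simplified product content (not the fdi wire format).
product = {
    "meta": {"type": "BaseProduct", "creator": "UNKNOWN", "version": "0.8"},
    "datasets": {"RawImage": {"unit": "ev", "data": [[1, 2, 3], [4, 5, 6]]}},
}

wire = json.dumps(product)    # text that can be sent to another node
restored = json.loads(wire)   # rebuilt on the receiving side

print(type(wire).__name__)
print(restored == product)
```

Only JSON-native types (strings, numbers, lists, dictionaries, booleans, null) survive this round trip unchanged, so richer objects need an explicit encoding step.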
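Product Definition Methodology
Data products are almost always classified in an inheritance hierarchy that reflects the underlying relations of the data model. Comparing the metadata and datasets of many products reveals inheritance relations among them. An object-oriented approach is therefore used here to analyze and define the structure, functionality, and interfaces of products.
The built-in parameters are first specified in YAML, a format readable by both humans and machines. A helper utility, yaml2python, then generates test-ready Python code for the product class module that contains these built-ins.
The YAML schema allows a child product to inherit metadata definitions from one or more parent products. Overriding is also allowed.
BaseProduct
Definition document BaseProduct.yml: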
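The inheritance and overriding rule can be sketched with plain dictionaries standing in for parsed YAML documents. `ChildProduct`, `resolve_metadata`, and the precedence among several parents (later parents win) are assumptions for illustration; this is not how yaml2python is implemented.

```python
def resolve_metadata(name, definitions):
    """Merge metadata of all parents (in order), then the product's own."""
    spec = definitions[name]
    merged = {}
    for parent in spec.get("parents", []):
        merged.update(resolve_metadata(parent, definitions))
    # Entries defined by the child override inherited ones.
    merged.update(spec.get("metadata", {}))
    return merged


definitions = {
    "BaseProduct": {
        "parents": [],
        "metadata": {
            "type": {"default": "BaseProduct"},
            "version": {"default": "0.8"},
        },
    },
    "ChildProduct": {  # hypothetical child definition
        "parents": ["BaseProduct"],
        "metadata": {
            "type": {"default": "ChildProduct"},     # overrides the parent
            "instrument": {"default": "UNKNOWN"},    # new entry
        },
    },
}

merged = resolve_metadata("ChildProduct", definitions)
print(sorted(merged))
print(merged["type"]["default"])
```

The child ends up with the inherited `version`, its own `instrument`, and its overriding `type`.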
name: BaseProduct
description: FDI base class data model
parents:
    -
schema: '1.6'
metadata:
    description:
        id_zh_cn: 描述
        data_type: string
        description: Description of this product
        description_zh_cn: 对本产品的描述。
        default: UNKNOWN
        valid: ''
        typecode: B
    type:
        id_zh_cn: 产品类型
        data_type: string
        description: Product Type identification. Name of class or CARD.
        description_zh_cn: 产品类型。完整Python类名或卡片名。
        default: BaseProduct
        valid: ''
        typecode: B
    level:
        id_zh_cn: 产品xx
        data_type: string
        description: Product level.
        description_zh_cn: 产品xx
        default: ALL
        valid: ''
        typecode: B
    creator:
        id_zh_cn: 本产品生成者
        data_type: string
        description: Generator of this product.
        description_zh_cn: 本产品生成方的标识,例如可以是单位、组织、姓名、软件、或特别算法等。
        default: UNKNOWN
        valid: ''
        typecode: B
    creationDate:
        id_zh_cn: 产品生成时间
        fits_keyword: DATE
        data_type: finetime
        description: Creation date of this product
        description_zh_cn: 本产品生成时间
        default: 0
        valid: ''
        typecode:
    rootCause:
        id_zh_cn: 数据来源
        data_type: string
        description: Reason of this run of pipeline.
        description_zh_cn: 数据来源(此例来自鉴定件热真空罐)
        default: UNKNOWN
        valid: ''
        typecode: B
    version:
        id_zh_cn: 版本
        data_type: string
        description: Version of product
        description_zh_cn: 产品版本
        default: '0.8'
        valid: ''
        typecode: B
    FORMATV:
        id_zh_cn: 格式版本
        data_type: string
        description: Version of product schema and revision
        description_zh_cn: 产品格式版本
        default: '1.6.0.10'
        valid: ''
        typecode: B
datasets:
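The preamble key-value pairs give information about this definition:
- name: the name of this product.
- description: information about this product.
- parents: the parent products from which this product inherits metadata.
- level: the applicable levels.
- schema: the format version of this YAML document.
From the moment it is created, every product must carry the following metadata entries about itself:
- description: a description of the product (may be in the native language if not English).
- type: the product type, in the software or business domain.
- version: products of the same format must be versioned, configuration controlled, and ready to handle version differences between inputs, algorithms, software, and pipelines.
- FORMATV: the version of this document together with schema information, e.g. 1.4.1.2.
- creator, rootCause, creationDate: who, why, when, and where.
The parameters are listed in the table below.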
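A check for these mandatory entries could look like the sketch below. The helper `missing_entries` and the plain-dictionary metadata are hypothetical; fdi may enforce the requirement differently, for example through defaults in the generated class.

```python
# Entry names taken from the list above.
REQUIRED = (
    "description", "type", "version", "FORMATV",
    "creator", "rootCause", "creationDate",
)


def missing_entries(meta):
    """Return the required metadata names that are absent from meta."""
    return [name for name in REQUIRED if name not in meta]


meta = {
    "description": "UNKNOWN",
    "type": "BaseProduct",
    "version": "0.8",
    "FORMATV": "1.6.0.10",
    "creator": "UNKNOWN",
}
print(missing_entries(meta))  # ['rootCause', 'creationDate']
```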
╒══════════════╤════════════════════╤══════╤══════════╤═══════╤════════════════════╤══════╤═══════════════════════════╕
│ name │ value │ unit │ type │ valid │ default │ code │ description │
╞══════════════╪════════════════════╪══════╪══════════╪═══════╪════════════════════╪══════╪═══════════════════════════╡
│ description │ UNKNOWN │ │ string │ None │ UNKNOWN │ B │ Description of this produ │
│ │ │ │ │ │ │ │ ct │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ type │ BaseProduct │ │ string │ None │ BaseProduct │ B │ Product Type identificati │
│ │ │ │ │ │ │ │ on. Name of class or CARD │
│ │ │ │ │ │ │ │ . │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ level │ ALL │ │ string │ None │ ALL │ B │ Product level. │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ creator │ UNKNOWN │ │ string │ None │ UNKNOWN │ B │ Generator of this product │
│ │ │ │ │ │ │ │ . │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ creationDate │ 1958-01-01T00:00:0 │ │ finetime │ None │ 1958-01-01T00:00:0 │ Q │ Creation date of this pro │
│ │ 0.000000 │ │ │ │ 0.000000 │ │ duct │
│ │ 0 │ │ │ │ 0 │ │ │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ rootCause │ UNKNOWN │ │ string │ None │ UNKNOWN │ B │ Reason of this run of pip │
│ │ │ │ │ │ │ │ eline. │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ version │ 0.8 │ │ string │ None │ 0.8 │ B │ Version of product │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ FORMATV │ 1.6.0.10 │ │ string │ None │ 1.6.0.10 │ B │ Version of product schema │
│ │ │ │ │ │ │ │ and revision │
├──────────────┼────────────────────┼──────┼──────────┼───────┼────────────────────┼──────┼───────────────────────────┤
│ listeners │ <No listener> │ │ │ │ │ │ │
╘══════════════╧════════════════════╧══════╧══════════╧═══════╧════════════════════╧══════╧═══════════════════════════╛
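In the table, the creationDate default of 0 is displayed as 1958-01-01T00:00:00.000000, so a finetime value appears to be a count from an epoch of 1 January 1958. The sketch below converts such a count to a calendar date. The tick size of one microsecond is an assumption, leap seconds are ignored, and `finetime_to_datetime` is a hypothetical helper, not an fdi function.

```python
from datetime import datetime, timedelta

# Epoch inferred from the table: a count of 0 shows as 1958-01-01T00:00:00.
EPOCH = datetime(1958, 1, 1)


def finetime_to_datetime(count):
    """Convert a tick count to a naive datetime.

    ASSUMPTION: one tick is one microsecond; leap seconds are ignored.
    """
    return EPOCH + timedelta(microseconds=count)


print(finetime_to_datetime(0).isoformat())
print(finetime_to_datetime(1_500_000).isoformat())
```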
Example (from the Quick Start page):
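>>> # Imports needed by this example. The module paths of Product and
... # Parameter are assumed from the fdi package layout.
... from fdi.dataset.product import Product
... from fdi.dataset.arraydataset import ArrayDataset
... from fdi.dataset.tabledataset import TableDataset
... from fdi.dataset.metadata import Parameter
>>> # Creation: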
... x = Product(description="product example with several datasets",
... instrument="Crystal-Ball", modelName="Mk II")
... x.meta['description'].value # == "product example with several datasets"
'product example with several datasets'
>>> # The 'instrument' and 'modelName' built-in properties show the
... # origin of FDI -- processing data from scientific instruments.
... x.instrument # == "Crystal-Ball"
'Crystal-Ball'
>>> # ways to add datasets
... i0 = 6
... i1 = [[1, 2, 3], [4, 5, i0], [7, 8, 9]]
... i2 = 'ev' # unit
... i3 = 'image1' # description
... image = ArrayDataset(data=i1, unit=i2, description=i3)
... # put the dataset into the product
... x["RawImage"] = image
... # take the data out of the product
... x["RawImage"].data # == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> # Another syntax to put dataset into a product: set(name, dataset)
... # Different syntax, same effect as above.
... # Here no unit or description is given when making ArrayDataset
... x.set('QualityImage', ArrayDataset(
... [[0.1, 0.5, 0.7], [4e3, 6e7, 8], [-2, 0, 3.1]]))
... x["QualityImage"].unit # is None
>>> # add another tabledataset
... s1 = [('col1', [1, 4.4, 5.4E3], 'eV'),
... ('col2', [0, 43.2, 2E3], 'cnt')]
... x["Spectrum"] = TableDataset(data=s1)
... # See the number and types of existing datasets in the product
... [type(d) for d in x.values()]
[fdi.dataset.arraydataset.ArrayDataset,
fdi.dataset.arraydataset.ArrayDataset,
fdi.dataset.tabledataset.TableDataset]
>>> # mandatory properties are also in metadata
... # test mandatory BaseProduct properties that are also metadata
... a0 = "Me, myself and I"
... x.creator = a0
... x.creator # == a0
'Me, myself and I'
>>> # metadata by the same name is also set
... x.meta["creator"].value # == a0
'Me, myself and I'
>>> # change the metadata
... a1 = "or else"
... x.meta["creator"] = Parameter(a1)
... # metadata changed
... x.meta["creator"].value # == a1
'or else'
>>> # so was the property
... x.creator # == a1
'or else'
>>> # load some metadata
... m = x.meta
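... # NOTE: `v` is not defined in this excerpt; it is presumably set up
... # earlier on the Quick Start page.
... m['ddetector'] = v['d']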
>>> print(x.toString())
=== Product (product example with several datasets) ===
meta= {
============ ==================== ====== ======== ==================== ================= ====== =====================
name value unit type valid default code description
============ ==================== ====== ======== ==================== ================= ====== =====================
description product example with string None UNKNOWN B Description of this p
several datasets roduct
type Product string None Product B Product Type identifi
cation. Name of class
or CARD.
level ALL string None ALL B Product level.
creator or else string None None UNKNOWN
creationDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Creation date of this
000000 00.000000 product
0 0
rootCause UNKNOWN string None UNKNOWN B Reason of this run of
pipeline.
version 0.8 string None 0.8 B Version of product
FORMATV 1.6.0.10 string None 1.6.0.10 B Version of product sc
hema and revision
startDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Nominal start time o
000000 00.000000 f this product.
0 0
endDate 1958-01-01T00:00:00. finetime None 1958-01-01T00:00: Q Nominal end time of
000000 00.000000 this product.
0 0
instrument Crystal-Ball string None UNKNOWN B Instrument that gener
ated data of this pro
duct
modelName Mk II string None UNKNOWN B Model name of the ins
trument of this produ
ct
mission _AGS string None _AGS B Name of the mission.
ddetector port_1 (0b01) None integer 11000000 0b01: port_ None None valid rules described
stand_by (0b0) 1 with binary masks
normal (0b1) 11000000 0b10: port_
Invalid 2
11000000 0b11: port
closed
00100000 0b0: stand_
by
00100000 0b1: main
00010000 0b0: error
00010000 0b1: normal
00001111 0b0000: res
erved
============ ==================== ====== ======== ==================== ================= ====== =====================
MetaData-listeners = ListnerSet{}},
history= {},
listeners= {ListnerSet{}}
=== History (UNKNOWN) ===
PARAM_HISTORY= {''},
TASK_HISTORY= {''},
meta= {(No Parameter.) MetaData-listeners = ListnerSet{}}
History-datasets =
<ODict >
Product-datasets =
<ODict "RawImage":
=== ArrayDataset (image1) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
shape (3, 3) tuple None () Number of elements in
each dimension. Quic
k changers to the rig
ht.
description image1 string None UNKNOWN B Description of this d
ataset
unit ev string None None B Unit of every element
.
typecode UNKNOWN string None UNKNOWN B Python internal stora
ge code.
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
1 2 3
4 5 6
7 8 9
"QualityImage":
=== ArrayDataset (UNKNOWN) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
shape (3, 3) tuple None () Number of elements in
each dimension. Quic
k changers to the rig
ht.
description UNKNOWN string None UNKNOWN B Description of this d
ataset
unit None string None None B Unit of every element
.
typecode UNKNOWN string None UNKNOWN B Python internal stora
ge code.
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
ArrayDataset-dataset =
0.1 0.5 0.7
4000 6e+07 8
-2 0 3.1
"Spectrum":
=== TableDataset (UNKNOWN) ===
meta= {
=========== ======= ====== ====== ======= ========= ====== =====================
name value unit type valid default code description
=========== ======= ====== ====== ======= ========= ====== =====================
description UNKNOWN string None UNKNOWN B Description of this d
ataset
version 0.1 string None 0.1 B Version of dataset
FORMATV 1.6.0.1 string None 1.6.0.1 B Version of dataset sc
hema and revision
=========== ======= ====== ====== ======= ========= ====== =====================
MetaData-listeners = ListnerSet{}}
TableDataset-dataset =
col1 col2
(eV) (cnt)
------ -------
1 0
4.4 43.2
5400 2000
>>>
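The ddetector row in the output above lists validity rules as binary masks. The sketch below shows one way such rules could be evaluated: each mask selects a bit field, and the field value is looked up in a table of labels. The rule table is copied from the wrapped 'valid' column (labels re-joined by hand), while `decode` and the sample value 0b01010001 are assumptions chosen so that the result matches the labels shown; the actual value and the fdi implementation are not given in this section.

```python
# Rules read from the 'valid' column: mask -> {field value: label}.
RULES = {
    0b11000000: {0b01: "port_1", 0b10: "port_2", 0b11: "port closed"},
    0b00100000: {0b0: "stand_by", 0b1: "main"},
    0b00010000: {0b0: "error", 0b1: "normal"},
    0b00001111: {0b0000: "reserved"},
}


def decode(value, rules=RULES):
    """Return one label per mask; 'Invalid' when no rule matches."""
    labels = []
    for mask, table in rules.items():
        # Shift so that the masked field starts at bit 0.
        shift = (mask & -mask).bit_length() - 1
        field = (value & mask) >> shift
        labels.append(table.get(field, "Invalid"))
    return labels


# Hypothetical sample value, not taken from the source.
print(decode(0b01010001))
# ['port_1', 'stand_by', 'normal', 'Invalid']
```

The low four bits of the sample are 0001, which no rule covers, hence the final "Invalid" label.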