PAL: Product Access Layer

Product Access Layer allows data stored in logical “pools” to be accessed with lightweight product references by data processors, data storage, and data consumers. A data product can include a context built with references to relevant data. A ProductStorage interface is provided to handle saving/retrieving/querying data in registered pools.

Rationale

In a data processing pipeline or network of processing nodes, data products are generated within a context which may include input data, reference data, and auxiliary data of many kinds. It is often necessary to record the relevant context with a product. However, the context can be large, so including its actual data as metadata of the product is often impractical.

Once FDI data are generated they can have a reference through which they can be accessed. The size of such a reference is typically less than a few hundred bytes, like a URL. In the product context only data references are recorded.

This package provides MapContext, ProductRef, Urn, ProductStorage, ProductPool, and Query classes (simplified but mostly API-compatible with Herschel Common Science System v15.0) for storing, retrieving, tagging, and creating contexts for data products modeled in the dataset package.

Definitions

URN

Note

The following is from Urn

The Uniform Resource Name (URN, https://datatracker.ietf.org/doc/html/rfc2141 ) string has this format:

urn:<poolname>:<resourcetype>:<serialnumber>

with modified rules described below.

<poolname>:

Also called poolID. It consists of 1-32 characters and is case-sensitive, which deviates from rfc2141. The characters allowed are alpha, digit, and safe, as defined in rfc1630 (https://datatracker.ietf.org/doc/html/rfc1630). These are excluded: space, %, ?, !, *, ', ", (, ), =, /, and the names listed in Invalid_Pool_Names of the poolmanager module, e.g. pools, urn, URN, api.

<resourcetype>:

type name of the data item (usually class name of data products inheriting BaseProduct)

<serialnumber>:

internal index for a certain <resourcetype>.

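The poolname in a URN is a label. Some examples, taken from the sessions later on this page:

urn:demopool_mh:fdi.dataset.product.Product:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:1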
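As an illustration of the URN format above, here is a minimal sketch that splits a URN string into its three parts and applies the poolname rules. It is not the fdi Urn class: split_urn, UrnParts, EXCLUDED_CHARS, and RESERVED_NAMES are hypothetical names, the space character is assumed to be among the excluded ones, and only the reserved names given as examples above are checked.

```python
from typing import NamedTuple

# Characters excluded from a poolname, per the text above
# (the space is an assumption of this sketch).
EXCLUDED_CHARS = set(' %?!*\'"()=/')
# Only the reserved names quoted as examples above; the real list may be longer.
RESERVED_NAMES = {'pools', 'urn', 'URN', 'api'}


class UrnParts(NamedTuple):
    poolname: str
    resourcetype: str
    serialnumber: int


def split_urn(urn: str) -> UrnParts:
    """Split 'urn:<poolname>:<resourcetype>:<serialnumber>' into its parts."""
    fields = urn.split(':')
    if len(fields) != 4 or fields[0] != 'urn':
        raise ValueError('not a 4-field URN: %r' % urn)
    _, poolname, resourcetype, serial = fields
    if not 1 <= len(poolname) <= 32:
        raise ValueError('poolname must have 1-32 characters')
    if EXCLUDED_CHARS & set(poolname) or poolname in RESERVED_NAMES:
        raise ValueError('invalid poolname: %r' % poolname)
    # int() raises ValueError if the serial number is not numeric.
    return UrnParts(poolname, resourcetype, int(serial))


parts = split_urn('urn:demopool_mh:fdi.dataset.product.Product:0')
print(parts.poolname, parts.resourcetype, parts.serialnumber)
```

Because the poolname is only a label, nothing in the parsed result says where the pool actually lives; that information belongs to the PoolURL.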

URNs are used to identify data because URNs are location agnostic. Storage pools (subclasses of ProductPool) are where data items reside. The PoolURL gives practical information about a pool, such as its poolname, its location, and its access scheme. PoolURL is designed to be a local set-up detail that is hidden from pool users. Data processing software uses URNs to refer to products without specifying the pool location. The pool behind the poolID in a URN could be a LocalPool on the development laptop and an HTTPClientPool on the production cloud.

Note

The following is from parse_poolurl()

The PoolURL has the form of a URL that precedes its poolname part:

<scheme>://<place><poolpath>/<poolname>

<scheme>:

Implementation protocol: file for LocalPool, mem for MemPool, http and https for HttpclientPool.

<place>:

IP:port such as 192.168.5.6:8080 for http and https schemes, or an empty string for file and mem schemes.

<poolname>:

same as in URN.

<poolpath>:

The part between place and an optional poolhint:

<username>:

<password>:

  • For file or server schemes, e.g. poolpath is /c:/tmp in http://localhost:9000/c:/tmp/mypool/ with the poolhint keyword argument of parse_poolurl() not given, or given as mypool (or myp or my …).

  • For http and https schemes, it is e.g. /v0.6/tmp in https://10.0.0.114:5000/v0.6/tmp/mypool with the poolhint keyword argument not given, or given as mypool (or myp or my …). The meaning of poolpath is subject to interpretation by the server. In the preceding example the poolpath contains an API version. ProductPool.transformpath() is used to map it further. Note that trailing blanks and / are ignored, and stripped in the output.

Examples:
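The decomposition described above can be approximated with the standard library. The sketch below is not parse_poolurl(): split_poolurl is a hypothetical helper that assumes the poolname is the last path segment, and it does not handle poolhint, username, or password.

```python
from urllib.parse import urlsplit


def split_poolurl(url):
    """Return (scheme, place, poolpath, poolname) for a PoolURL string.

    Assumes the poolname is the last path segment (no poolhint handling).
    """
    # Trailing blanks and '/' are ignored, as described above.
    parts = urlsplit(url.strip().rstrip('/'))
    poolpath, _, poolname = parts.path.rpartition('/')
    return parts.scheme, parts.netloc, poolpath, poolname


for u in ('file:///tmp/demopool_mh',
          'http://localhost:9000/c:/tmp/mypool/',
          'https://10.0.0.114:5000/v0.6/tmp/mypool'):
    print(split_poolurl(u))
```

For the file scheme the place comes back as an empty string, which matches the description of <place> above.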

ProductRef

This class not only holds the URN of the product it references, but also records who (the parents) is keeping this reference.

ProductStorage

A centralized access place for saving/loading/querying/deleting data organized in conceptual pools. One gets a ProductRef when saving data.

ProductPool

A place where products can be saved, with a reference for the saved product generated. The product can be retrieved with the reference. Pools based on different media or networking mechanisms can be implemented. Multiple pools can be registered in a ProductStorage front-end where users can do the saving, loading, querying etc., so that the pools collectively form a larger logical storage.
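To make the idea of several pools behind one front-end concrete, here is a toy model in plain Python. ToyPool and ToyStorage are hypothetical classes written for this illustration and are not the fdi ProductPool or ProductStorage; in particular, saving to the first registered pool by default is an assumption of the sketch.

```python
class ToyPool:
    """Hypothetical in-memory pool; not fdi's ProductPool."""

    def __init__(self, poolname):
        self.poolname = poolname
        self._items = {}      # urn -> saved object
        self._next_sn = {}    # type name -> next serial number

    def save(self, obj):
        typename = type(obj).__name__
        sn = self._next_sn.get(typename, 0)
        self._next_sn[typename] = sn + 1
        urn = 'urn:%s:%s:%d' % (self.poolname, typename, sn)
        self._items[urn] = obj
        return urn

    def load(self, urn):
        return self._items[urn]


class ToyStorage:
    """Hypothetical front-end that presents several pools as one store."""

    def __init__(self):
        self._pools = {}      # poolname -> pool, in registration order

    def register(self, pool):
        self._pools[pool.poolname] = pool

    def save(self, obj, poolname=None):
        # Assumption: save to the first registered pool unless one is named.
        if poolname is None:
            poolname = next(iter(self._pools))
        return self._pools[poolname].save(obj)

    def load(self, urn):
        # The poolname embedded in the URN selects the pool.
        return self._pools[urn.split(':')[1]].load(urn)


store = ToyStorage()
store.register(ToyPool('pool_a'))
store.register(ToyPool('pool_b'))
urn1 = store.save('first')
urn2 = store.save('second', poolname='pool_b')
print(urn1, urn2)
print(store.load(urn2))
```

The caller only ever handles URNs; which pool holds the data is resolved by the front-end.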

The reference LocalPool is shown in the following YAML-like schematic:

Pool:!!dict
       _classes:!!odict
           product0_class_name:!!dict
                   currentSN:!!int #the serial number of the latest added prod to the pool
                          sn:!!list
                              - serial number of a prod
                              - serial number of a prod
                              - ...
           product1_class_name:
           ...
       _urns:!!odict
           urn0:!!odict
                   meta:!!MetaData #prod.meta
                   tags:!!list
                         - $tag
                         - $tag
                         - ...
           urn1:!!odict
           ...
       _tags:!!odict
           tag0:!!odict
                   urns:!!list
                        - $urn
                        - $urn
                        - ...
           tag1:!!odict
           ...

       urn0:!!serialized product
       urn1:!!serialized product
       ...
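The bookkeeping implied by the schematic can be sketched with plain dictionaries. This is only an illustration of how the three tables relate to each other, not the LocalPool implementation: new_index and record are hypothetical helpers, and starting serial numbers at 0 (with currentSN initialised to -1) is an assumption.

```python
def new_index():
    """Empty index with the three tables of the schematic above."""
    return {'_classes': {}, '_urns': {}, '_tags': {}}


def record(index, poolname, classname, meta, tags=()):
    """Record one saved product in the index and return its URN."""
    cls = index['_classes'].setdefault(classname, {'currentSN': -1, 'sn': []})
    sn = cls['currentSN'] + 1          # assumption: serial numbers start at 0
    cls['currentSN'] = sn
    cls['sn'].append(sn)
    urn = 'urn:%s:%s:%d' % (poolname, classname, sn)
    # Each URN keeps its metadata and the tags attached to it ...
    index['_urns'][urn] = {'meta': meta, 'tags': list(tags)}
    # ... and each tag keeps the reverse mapping to its URNs.
    for tag in tags:
        index['_tags'].setdefault(tag, {'urns': []})['urns'].append(urn)
    return urn


idx = new_index()
u0 = record(idx, 'demopool', 'Product', {'description': 'p0'}, tags=['raw'])
u1 = record(idx, 'demopool', 'Product', {'description': 'p1'}, tags=['raw', 'good'])
print(idx['_classes']['Product'])
print(idx['_tags']['raw']['urns'])
```

Keeping both _urns and _tags means a lookup by tag or by URN needs no scan of the serialized products.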

Examples (from Quick Start page):


This section shows how to make/get hold of a pool.

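>>> # Standard-library modules used below; the fdi names (Product, PoolManager,
... # DEFAULT_MEM_POOL, etc.) are assumed to be imported as on the Quick Start page.
... import getpass, logging, os
... # Create a product and a productStorage with a pool registered
... # First disable debugging messages
... logger = logging.getLogger('')
... logger.setLevel(logging.WARNING)
... # a pool (LocalPool) for demonstration will be created here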
... demopoolname = 'demopool_' + getpass.getuser()
... demopoolpath = '/tmp/' + demopoolname
... demopoolurl = 'file://' + demopoolpath
... # clean possible data left from previous runs
... os.system('rm -rf ' + demopoolpath)
... if PoolManager.isLoaded(DEFAULT_MEM_POOL):
...     PoolManager.getPool(DEFAULT_MEM_POOL).removeAll()
... PoolManager.getPool(demopoolname, demopoolurl).removeAll()
0

Saving a Product

This section shows how to store a product in a “pool” and get a reference back.

>>> # create a product and save it to a pool
... x = Product(description='save me in store')
... # add a tabledataset
... s1 = [('energy', [1, 4.4, 5.6], 'eV'), ('freq', [0, 43.2, 2E3], 'Hz')]
... x["Spectrum"] = TableDataset(data=s1)
... # create a product store
... pstore = ProductStorage(poolurl=demopoolurl)
... # see what is in it.
... pstore
ProductStorage( pool= {'demopool_mh': <LocalPool poolname=demopool_mh, poolurl=file:///tmp/demopool_mh, _classes={}, _urns={}, _tags={}>} )
>>> # save the product and get a reference back.
... prodref = pstore.save(x)
... # This gives detailed information of the product being referenced
... print(prodref)
ProductRef {urn:demopool_mh:fdi.dataset.product.Product:0
# Parents=[]
# meta=
============  ====================  ======  ========  =======  =================  ======  =====================
name          value                 unit    type      valid    default            code    description
============  ====================  ======  ========  =======  =================  ======  =====================
description   save me in store              string    None     UNKNOWN            B       Description of this p
                                                                                          roduct
type          Product                       string    None     Product            B       Product Type identifi
                                                                                          cation. Name of class
                                                                                           or CARD.
level         ALL                           string    None     ALL                B       Product level.
creator       UNKNOWN                       string    None     UNKNOWN            B       Generator of this pro
                                                                                          duct.
creationDate  1958-01-01T00:00:00.          finetime  None     1958-01-01T00:00:  Q       Creation date of this
              000000                                           00.000000                   product
              0                                                0
rootCause     UNKNOWN                       string    None     UNKNOWN            B       Reason of this run of
                                                                                           pipeline.
version       0.8                           string    None     0.8                B       Version of product
FORMATV       1.6.0.10                      string    None     1.6.0.10           B       Version of product sc
                                                                                          hema and revision
startDate     1958-01-01T00:00:00.          finetime  None     1958-01-01T00:00:  Q       Nominal start time  o
              000000                                           00.000000                  f this product.
              0                                                0
endDate       1958-01-01T00:00:00.          finetime  None     1958-01-01T00:00:  Q       Nominal end time  of
              000000                                           00.000000                  this product.
              0                                                0
instrument    UNKNOWN                       string    None     UNKNOWN            B       Instrument that gener
                                                                                          ated data of this pro
                                                                                          duct
modelName     UNKNOWN                       string    None     UNKNOWN            B       Model name of the ins
                                                                                          trument of this produ
                                                                                          ct
mission       _AGS                          string    None     _AGS               B       Name of the mission.
============  ====================  ======  ========  =======  =================  ======  =====================
MetaData-listeners = ListnerSet{}}
>>> # get the URN string
... urn = prodref.urn
... print(urn)    # urn:demopool_mh:fdi.dataset.product.Product:0
urn:demopool_mh:fdi.dataset.product.Product:0
>>> # re-create a product only using the urn
... newp = ProductRef(urn).product
... # the new and the old one are equal
... print(newp == x)   # == True
True

Context and MapContext

Context is a Product that holds a set of ProductRefs that are accessible by keys. The keys are strings for MapContext, which usually maps names to product references.

Examples (from Quick Start page):


This section shows the essential steps for storing product references in a context.

>>> p1 = Product(description='p1')
... p2 = Product(description='p2')
... # create an empty mapcontext that can carry references with name labels
... map1 = MapContext(description='product with refs 1')
... # A ProductRef created with the syntax of a lone product argument will use a MemPool
... pref1 = ProductRef(p1)
... pref1
ProductRef(urnobj=Urn(urn="urn:defaultmem:fdi.dataset.product.Product:0", _STID="Urn"), _STID="ProductRef")
>>> # A productStorage with a LocalPool -- a pool on the disk.
... pref2 = pstore.save(p2)
... pref2.urn
'urn:demopool_mh:fdi.dataset.product.Product:1'
>>> # how many prodrefs do we have?
... map1['refs'].size()   # == 0
0
>>> # how many 'parents' do these prodrefs have before being added to a context?
... len(pref1.parents)   # == 0
0
>>> len(pref2.parents)   # == 0
0
>>> # add a ref to the context. Every productref has a name in a MapContext
... map1['refs']['spam'] = pref1
... # add the second one
... map1['refs']['egg'] = pref2
... # how many prodrefs do we have?
... map1['refs'].size()   # == 2
2
>>> # parent list of the productref object now has an entry
... len(pref2.parents)   # == 1
1
>>> pref2.parents[0] == map1
True
>>> pref1.parents[0] == map1
True
>>> # remove a ref
... del map1['refs']['spam']
... map1.refs.size()   # == 1
1
>>> # how many parents does the removed prodref have now?
... len(pref1.parents)   # == 0
0
>>> # add ref2 to another map
... map2 = MapContext(description='product with refs 2')
... map2.refs['also2'] = pref2
... map2['refs'].size()   # == 1
1
>>> # two parents
... len(pref2.parents)   # == 2
2
>>> pref2.parents[1] == map2
True

Query

One can make queries to a ProductStorage and get back a list of references to products that satisfy the search criteria. Queries can be constructed using Python predicate expressions about a product and its metadata, or a function that returns True or False.
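The two kinds of predicate can be illustrated without fdi. In the sketch below, select is a hypothetical function operating on plain dictionaries that stand in for product metadata; how fdi itself evaluates query expressions is not shown here, and eval is used purely for illustration.

```python
def select(metas, where):
    """Return the metadata dicts for which the predicate holds.

    `where` is either a Python expression string that refers to the
    metadata as `m`, or a function taking `m` and returning True/False.
    """
    if callable(where):
        predicate = where
    else:
        code = compile(where, '<query>', 'eval')

        def predicate(m):
            # eval is for illustration only; never use it on untrusted input.
            return eval(code, {'__builtins__': {}}, {'m': m})

    return [m for m in metas if predicate(m)]


metas = [{'description': 'desc %d' % i, 'extra': 5000 + i} for i in range(3)]

# Predicate given as an expression string about the metadata `m`.
hits = select(metas, 'm["extra"] > 5000 and m["extra"] <= 5005')
print([m['description'] for m in hits])

# Predicate given as a function.
hits_fn = select(metas, lambda m: m['description'].endswith('0'))
print([m['description'] for m in hits_fn])
```

An expression string is compact for simple comparisons, while a function allows arbitrary logic such as regular-expression matching.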

Examples (from Quick Start page):


A ProductStorage with pools attached can be queried with tags, properties stored in metadata, or even data in the stored products, using Python syntax.

>>> # clean possible data left from previous runs
... poolname = 'fdi_pool_' + getpass.getuser()
... poolpath = '/tmp/' + poolname
... newpoolname = 'fdi_newpool_' + getpass.getuser()
... newpoolpath = '/tmp/' + newpoolname
... os.system('rm -rf ' + poolpath)
... os.system('rm -rf ' + newpoolpath)
... poolurl = 'file://' + poolpath
... newpoolurl = 'file://' + newpoolpath
... if PoolManager.isLoaded(DEFAULT_MEM_POOL):
...     PoolManager.getPool(DEFAULT_MEM_POOL).removeAll()
... PoolManager.getPool(poolname, poolurl).removeAll()
... PoolManager.getPool(newpoolname, newpoolurl).removeAll()
... # make a productStorage
... pstore = ProductStorage(poolurl=poolurl)
... # make another
... pstore2 = ProductStorage(poolurl=newpoolurl)
>>> # add some products to both storages. The product properties are different.
... n = 7
... for i in range(n):
...     # three counters for properties to be queried.
...     a0, a1, a2 = 'desc %d' % i, 'fatman %d' % (i*4), 5000+i
...     if i < 3:
...         # Product type
...         x = Product(description=a0, creator=a1)
...         x.meta['extra'] = Parameter(value=a2)
...     elif i < 5:
... ...
...         x.meta['time'] = Parameter(value=FineTime1(a2))
...     if i < 4:
...         # some are stored in one pool
...         r = pstore.save(x)
...     else:
...         # some the other
...         r = pstore2.save(x)
...     print(r.urn)
... # Two pools, 7 products in 3 types
... # [P P P C] [C M M]
urn:fdi_pool_mh:fdi.dataset.product.Product:0
urn:fdi_pool_mh:fdi.dataset.product.Product:1
urn:fdi_pool_mh:fdi.dataset.product.Product:2
urn:fdi_pool_mh:fdi.pal.context.Context:0
urn:fdi_newpool_mh:fdi.pal.context.Context:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:1
>>> # register the new pool above to the  1st productStorage
... pstore.register(newpoolname)
... len(pstore.getPools())   # == 2
2
>>> # make a query on product metadata, which is the variable 'm'
... # in the query expression, i.e. ``m = product.meta; ...``
... # But '5000 < m["extra"]' does not work. see tests/test.py.
... q = MetaQuery(Product, 'm["extra"] > 5000 and m["extra"] <= 5005')
... # search all pools registered on pstore
... res = pstore.select(q)
... # we expect [#2, #3]. Context is not a subclass of Product, which is being searched
... len(res)   # == 2
2
>>> # see
... [r.product.description for r in res]
['desc 1', 'desc 2']
>>> def t(m):
...     # query is a function
...     import re
...     # 'creator' matches the regex pattern: 'n', any one character, then '1'
...     return re.match('.*n.1.*', m['creator'].value)
>>> q = MetaQuery(BaseProduct, t)
... res = pstore.select(q)
... # expecting [3,4]
... [r.product.creator for r in res]
['fatman 12', 'fatman 16']
>>>

Run tests

To test PAL functionalities based on the local (JSON) pool and the memory pool, run in the same directory:

make test2

To test functionalities based on the http client pool, run in one terminal

make runpoolserver

then, in another terminal, run

make testhttp

and examine the output.

Design

Packages

../_images/packages_pal.png

Classes

../_images/classes_pal.png