PAL: Product Access Layer
The Product Access Layer allows data stored in logical “pools” to be accessed through lightweight product references by data processors, data storage, and data consumers. A data product can include a context built from references to relevant data. A ProductStorage interface is provided to handle saving, retrieving, and querying data in registered pools.
Rationale
In a data processing pipeline or a network of processing nodes, data products are generated within a context that may include input data, reference data, and auxiliary data of many kinds. It is often necessary to record the relevant context with a product. However, the context can be large, so including its actual data as metadata of the product is often impractical.
Once FDI data are generated, they can be given a reference through which they can be accessed. Such a reference is typically smaller than a few hundred bytes, like a URL. Only data references are recorded in the product context.
This package provides the MapContext, ProductRef, Urn, ProductStorage, ProductPool, and Query classes (simplified but mostly API-compatible with Herschel Common Science System v15.0) for storing, retrieving, tagging, and creating contexts for data products modeled in the dataset package.
Definitions
URN
Note
The following is from Urn
The Uniform Resource Name (URN, https://datatracker.ietf.org/doc/html/rfc2141 ) string has this format:
urn:<poolname>:<resourcetype>:<serialnumber>
with modified rules described below.
- <poolname>:
Also called poolID. It consists of 1-32 characters and is case-sensitive, which deviates from rfc2141. The characters allowed are alpha, digit, and safe, as defined in rfc1630 (https://datatracker.ietf.org/doc/html/rfc1630). These are excluded: space, %, ?, !, *, ', ", (, ), =, /, and the names listed in Invalid_Pool_Names of the poolmanager module, e.g. pools, urn, URN, api.
- <resourcetype>:
Type name of the data item (usually the class name of a data product inheriting from BaseProduct).
- <serialnumber>:
Internal index for a certain <resourcetype>.
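The poolname in a URN is a label. Some examples of URNs, as printed in the sessions later on this page:
urn:demopool_mh:fdi.dataset.product.Product:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:1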
URNs are used to identify data because URNs are location agnostic. Storage pools (subclasses of ProductPool) are where data items reside. The PoolURL gives practical information about a pool, such as its poolname, its location, and its access scheme. PoolURL is designed to be a local set-up detail that is hidden from pool users. Data processing software uses URNs to refer to products, without specifying the pool location. The poolID in a URN could refer to a LocalPool on the development laptop and an HttpClientPool on the production cloud.
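The URN rules above can be sketched in a few lines of plain Python. This is only an illustration: `split_urn`, `EXCLUDED`, and `RESERVED` are hypothetical names, not part of fdi's Urn class; the reserved-name set holds only the examples named in the text; and accepting the `urn` prefix in any letter case is an assumption.

```python
# Characters the text lists as excluded from <poolname>.
EXCLUDED = set(' %?!*\'"()=/')
# Only the example names given in the text; the real list may be longer.
RESERVED = {'pools', 'urn', 'URN', 'api'}


def split_urn(urn):
    """Split 'urn:<poolname>:<resourcetype>:<serialnumber>' into its parts.

    Hypothetical helper for illustration; not the fdi ``Urn`` class.
    """
    # Unpacking raises ValueError unless there are exactly four fields.
    prefix, poolname, resourcetype, serial = urn.split(':')
    if prefix.lower() != 'urn':  # assumption: prefix is case-insensitive
        raise ValueError('not a URN: %r' % urn)
    if not 1 <= len(poolname) <= 32:
        raise ValueError('poolname must have 1-32 characters')
    bad = EXCLUDED.intersection(poolname)
    if bad or poolname in RESERVED:
        raise ValueError('invalid poolname: %r' % poolname)
    return poolname, resourcetype, int(serial)


print(split_urn('urn:demopool_mh:fdi.dataset.product.Product:0'))
```

Because the URN carries only a poolname label, nothing in it says where the pool lives; that information stays in the PoolURL.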
Note
The following is from parse_poolurl()
A PoolURL takes the form of a URL whose leading part precedes the poolname:
<scheme>://<place><poolpath>/<poolname>
- <scheme>:
Implementation protocol: file for LocalPool, mem for MemPool, and http or https for HttpClientPool.
- <place>:
IP:port, such as 192.168.5.6:8080, for the http and https schemes, or an empty string for the file and mem schemes.
- <poolname>:
Same as in URN.
- <poolpath>:
The part between place and an optional poolhint:
For file or server schemes, e.g. poolpath is /c:/tmp in http://localhost:9000/c:/tmp/mypool/ with the poolhint keyword argument of parse_poolurl() not given, or given as mypool (or myp or my ...).
For http and https schemes, it is e.g. /v0.6/tmp in https://10.0.0.114:5000/v0.6/tmp/mypool with the poolhint keyword argument not given, or given as mypool (or myp or my ...). The meaning of poolpath is subject to interpretation by the server. In the preceding example the poolpath has an API version. ProductPool.transformpath() is used to map it further. Note that trailing blanks and / are ignored, and stripped in the output.
- <username>:
- <password>:
Examples:
file:///tmp/mydata for pool mydata
file:///d:/data/test2--v2 for pool test2--v2
mem:///dummy for pool dummy
https://10.0.0.114:5000/v0.6/obs for an HttpClientPool obs
server:///tmp/data/0.4/test for a pool test used on a server.
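As a rough sketch of how such a URL decomposes, the standard library's `urllib.parse.urlsplit` already separates scheme, place, and path. The helper `split_poolurl` below is hypothetical: it simply treats the last path segment as the poolname and does not model the `poolhint` argument of parse_poolurl(), nor any server-side interpretation of poolpath.

```python
from urllib.parse import urlsplit


def split_poolurl(poolurl):
    """Return (scheme, place, poolpath, poolname) for a pool URL.

    Hypothetical helper: the last path segment is taken as the poolname.
    """
    # Trailing blanks and '/' are ignored, as the text specifies.
    parts = urlsplit(poolurl.strip())
    path = parts.path.rstrip('/')
    poolpath, _, poolname = path.rpartition('/')
    # netloc would also carry 'user:password@' if the URL had one.
    return parts.scheme, parts.netloc, poolpath, poolname


for url in ('file:///tmp/mydata',
            'https://10.0.0.114:5000/v0.6/obs',
            'file:///d:/data/test2--v2/'):
    print(split_poolurl(url))
```

A poolname spanning more than one path level cannot be recognised by this simple split; a hint about where the poolname starts would be needed for that case.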
ProductRef
This class not only holds the URN of the product it refers to, but also records which objects (the parents) are keeping this reference.
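The parent bookkeeping can be pictured with a toy model. The classes `ToyRef`, `ToyRefs`, and `ToyContext` below are stand-ins invented for this sketch and are not the fdi implementation; they only show one way a reference could learn who is holding it.

```python
class ToyRef:
    """Stand-in for ProductRef: a URN plus the list of its parents."""

    def __init__(self, urn):
        self.urn = urn
        self.parents = []


class ToyRefs(dict):
    """Mapping of name -> ToyRef that keeps each ref's parents up to date."""

    def __init__(self, owner):
        super().__init__()
        self.owner = owner

    def __setitem__(self, name, ref):
        if name in self:
            # Replacing an entry: the old ref loses this parent first.
            self[name].parents.remove(self.owner)
        super().__setitem__(name, ref)
        ref.parents.append(self.owner)

    def __delitem__(self, name):
        self[name].parents.remove(self.owner)
        super().__delitem__(name)


class ToyContext:
    """Stand-in for a context: a description and a table of refs."""

    def __init__(self, description):
        self.description = description
        self.refs = ToyRefs(self)


ref = ToyRef('urn:demopool_mh:fdi.dataset.product.Product:1')
map1, map2 = ToyContext('map1'), ToyContext('map2')
map1.refs['egg'] = ref
map2.refs['also2'] = ref
print(len(ref.parents))        # both contexts are now parents
del map1.refs['egg']
print(ref.parents == [map2])   # only map2 is left
```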
ProductStorage
A centralized access place for saving/loading/querying/deleting data organized in conceptual pools. One gets a ProductRef when saving data.
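The front-end idea can be sketched with two small stand-in classes. `ToyPool` and `ToyStorage` are hypothetical and far simpler than the real classes; in particular, saving to the first registered pool by default is an assumption of this sketch.

```python
class ToyPool:
    """Stand-in for a pool: an in-memory dict keyed by URN."""

    def __init__(self, poolname):
        self.poolname = poolname
        self.items = {}
        self.next_sn = {}  # next serial number per resource type

    def save(self, product):
        typename = type(product).__name__
        sn = self.next_sn.get(typename, 0)
        self.next_sn[typename] = sn + 1
        urn = 'urn:%s:%s:%d' % (self.poolname, typename, sn)
        self.items[urn] = product
        return urn


class ToyStorage:
    """Stand-in for ProductStorage: one front-end over several pools."""

    def __init__(self):
        self.pools = {}

    def register(self, pool):
        self.pools[pool.poolname] = pool

    def save(self, product, poolname=None):
        # Assumption: default to the first registered pool.
        if poolname is None:
            poolname = next(iter(self.pools))
        return self.pools[poolname].save(product)

    def load(self, urn):
        # The poolname embedded in the URN selects the pool.
        return self.pools[urn.split(':')[1]].items[urn]


store = ToyStorage()
store.register(ToyPool('poolA'))
store.register(ToyPool('poolB'))
urn = store.save({'description': 'p2'}, poolname='poolB')
print(urn, store.load(urn))
```

The caller of `load` never names a location; the storage resolves the pool from the URN alone.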
ProductPool
A place where products can be saved, with a reference generated for each saved product. The product can be retrieved with the reference. Pools based on different media or networking mechanisms can be implemented. Multiple pools can be registered in a ProductStorage front-end, where users can do the saving, loading, querying, etc., so that the pools collectively form a larger logical storage.
The layout of the reference LocalPool is shown in the following YAML-like schematic:
Pool:!!dict
    _classes:!!odict
        product0_class_name:!!dict
            currentSN:!!int #the serial number of the latest added prod to the pool
            sn:!!list
                - serial number of a prod
                - serial number of a prod
                - ...
        product1_class_name:
        ...
    _urns:!!odict
        urn0:!!odict
            meta:!!MetaData #prod.meta
            tags:!!list
                - $tag
                - $tag
                - ...
        urn1:!!odict
        ...
    _tags:!!odict
        tag0:!!odict
            urns:!!list
                - $urn
                - $urn
                - ...
        tag1:!!odict
        ...
    urn0:!!serialized product
    urn1:!!serialized product
    ...
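To make the schematic concrete, the sketch below updates the three bookkeeping tables when one item is saved. The function names `new_pool` and `save` are hypothetical, as is the choice of -1 as the initial `currentSN` (so that the first serial number is 0, as in the URNs shown on this page); the actual LocalPool may differ.

```python
from collections import OrderedDict


def new_pool():
    """Empty bookkeeping tables, named as in the schematic."""
    return {'_classes': OrderedDict(), '_urns': OrderedDict(),
            '_tags': OrderedDict()}


def save(pool, poolname, classname, serialized, meta=None, tags=()):
    """Store one serialized product and update the three tables."""
    cls = pool['_classes'].setdefault(classname, {'currentSN': -1, 'sn': []})
    sn = cls['currentSN'] + 1          # assumption: serial numbers start at 0
    cls['currentSN'] = sn
    cls['sn'].append(sn)
    urn = 'urn:%s:%s:%d' % (poolname, classname, sn)
    pool['_urns'][urn] = OrderedDict(meta=meta, tags=list(tags))
    for tag in tags:
        entry = pool['_tags'].setdefault(tag, OrderedDict(urns=[]))
        entry['urns'].append(urn)
    pool[urn] = serialized             # the product itself, keyed by URN
    return urn


pool = new_pool()
save(pool, 'demopool', 'fdi.dataset.product.Product', '{"a": 1}', tags=['raw'])
u1 = save(pool, 'demopool', 'fdi.dataset.product.Product', '{"a": 2}',
          tags=['raw', 'best'])
print(u1)
print(pool['_tags']['raw']['urns'])
```

Keeping `_urns` and `_tags` as mirror images of each other lets a lookup go either from a URN to its tags or from a tag to its URNs without scanning the stored products.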
Examples (from Quick Start page):
This section shows how to make/get hold of a pool.
>>> # Create a product and a productStorage with a pool registered
... # Standard-library modules used below. The fdi names (PoolManager,
... # DEFAULT_MEM_POOL, Product, ...) are assumed to be imported already.
... import getpass, logging, os
... # First disable debugging messages
... logger = logging.getLogger('')
... logger.setLevel(logging.WARNING)
... # a pool (LocalPool) for demonstration will be created here
... demopoolname = 'demopool_' + getpass.getuser()
... demopoolpath = '/tmp/' + demopoolname
... demopoolurl = 'file://' + demopoolpath
... # clean possible data left from previous runs
... os.system('rm -rf ' + demopoolpath)
... if PoolManager.isLoaded(DEFAULT_MEM_POOL):
...     PoolManager.getPool(DEFAULT_MEM_POOL).removeAll()
... PoolManager.getPool(demopoolname, demopoolurl).removeAll()
0
Saving a Product
This section shows how to store a product in a “pool” and get a reference back.
>>> # create a product and save it to a pool
... x = Product(description='save me in store')
... # add a tabledataset
... s1 = [('energy', [1, 4.4, 5.6], 'eV'), ('freq', [0, 43.2, 2E3], 'Hz')]
... x["Spectrum"] = TableDataset(data=s1)
... # create a product store
... pstore = ProductStorage(poolurl=demopoolurl)
... # see what is in it.
... pstore
ProductStorage( pool= {'demopool_mh': <LocalPool poolname=demopool_mh, poolurl=file:///tmp/demopool_mh, _classes={}, _urns={}, _tags={}>} )
>>> # save the product and get a reference back.
... prodref = pstore.save(x)
... # This gives detailed information of the product being referenced
... print(prodref)
ProductRef {urn:demopool_mh:fdi.dataset.product.Product:0
# Parents=[]
# meta=
============ ==================== ====== ======== ======= ================= ====== =====================
name         value                unit   type     valid   default           code   description
============ ==================== ====== ======== ======= ================= ====== =====================
description  save me in store            string   None    UNKNOWN           B      Description of this p
                                                                                   roduct
type         Product                     string   None    Product           B      Product Type identifi
                                                                                   cation. Name of class
                                                                                   or CARD.
level        ALL                         string   None    ALL               B      Product level.
creator      UNKNOWN                     string   None    UNKNOWN           B      Generator of this pro
                                                                                   duct.
creationDate 1958-01-01T00:00:00.        finetime None    1958-01-01T00:00: Q      Creation date of this
             000000                                       00.000000                product
             0                                            0
rootCause    UNKNOWN                     string   None    UNKNOWN           B      Reason of this run of
                                                                                   pipeline.
version      0.8                         string   None    0.8               B      Version of product
FORMATV      1.6.0.10                    string   None    1.6.0.10          B      Version of product sc
                                                                                   hema and revision
startDate    1958-01-01T00:00:00.        finetime None    1958-01-01T00:00: Q      Nominal start time o
             000000                                       00.000000                f this product.
             0                                            0
endDate      1958-01-01T00:00:00.        finetime None    1958-01-01T00:00: Q      Nominal end time of
             000000                                       00.000000                this product.
             0                                            0
instrument   UNKNOWN                     string   None    UNKNOWN           B      Instrument that gener
                                                                                   ated data of this pro
                                                                                   duct
modelName    UNKNOWN                     string   None    UNKNOWN           B      Model name of the ins
                                                                                   trument of this produ
                                                                                   ct
mission      _AGS                        string   None    _AGS              B      Name of the mission.
============ ==================== ====== ======== ======= ================= ====== =====================
MetaData-listeners = ListnerSet{}}
>>> # get the URN string
... urn = prodref.urn
... print(urn) # urn:demopool_mh:fdi.dataset.product.Product:0
urn:demopool_mh:fdi.dataset.product.Product:0
>>> # re-create a product only using the urn
... newp = ProductRef(urn).product
... # the new and the old one are equal
... print(newp == x) # == True
True
Context and MapContext
Context is a Product that holds a set of ProductRefs that are accessible by keys. The keys are strings for MapContext, which usually maps names to product references.
Examples (from Quick Start page):
This section shows the essential steps of storing product references in a context.
>>> p1 = Product(description='p1')
... p2 = Product(description='p2')
... # create an empty mapcontext that can carry references with name labels
... map1 = MapContext(description='product with refs 1')
... # A ProductRef created with the syntax of a lone product argument will use a MemPool
... pref1 = ProductRef(p1)
... pref1
ProductRef(urnobj=Urn(urn="urn:defaultmem:fdi.dataset.product.Product:0", _STID="Urn"), _STID="ProductRef")
>>> # A productStorage with a LocalPool -- a pool on the disk.
... pref2 = pstore.save(p2)
... pref2.urn
'urn:demopool_mh:fdi.dataset.product.Product:1'
>>> # how many prodrefs do we have?
... map1['refs'].size() # == 0
0
>>> # how many 'parents' do these prodrefs have before being added to a context?
... len(pref1.parents) # == 0
0
>>> len(pref2.parents) # == 0
0
>>> # add a ref to the context. Every productref has a name in a MapContext
... map1['refs']['spam'] = pref1
... # add the second one
... map1['refs']['egg'] = pref2
... # how many prodrefs do we have?
... map1['refs'].size() # == 2
2
>>> # parent list of the productref object now has an entry
... len(pref2.parents) # == 1
1
>>> pref2.parents[0] == map1
True
>>> pref1.parents[0] == map1
True
>>> # remove a ref
... del map1['refs']['spam']
... map1.refs.size() # == 1
1
>>> # how many parents does pref1 have now?
... len(pref1.parents) # == 0
0
>>> # add pref2 to another map
... map2 = MapContext(description='product with refs 2')
... map2.refs['also2'] = pref2
... map2['refs'].size() # == 1
1
>>> # two parents
... len(pref2.parents) # == 2
2
>>> pref2.parents[1] == map2
True
Query
One can make queries to a ProductStorage and get back a list of references to products that satisfy the search criteria. Queries can be constructed using Python predicate expressions about a product and its metadata, or a function that returns True or False.
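The two query forms can be illustrated with a small stand-in. The function `select` below is hypothetical and is not how MetaQuery is implemented; it only shows a predicate given either as an expression string in which the metadata is bound to `m`, or as a callable. Plain dicts of plain values stand in for product metadata here, a simplification that avoids modelling Parameter objects.

```python
def select(items, where):
    """Return the URNs whose metadata satisfy ``where``.

    ``items`` maps URN -> metadata dict.  ``where`` is either a callable
    taking the metadata, or a Python expression string that refers to
    the metadata as ``m``.
    """
    if callable(where):
        predicate = where
    else:
        code = compile(where, '<query>', 'eval')

        def predicate(m):
            # eval() executes arbitrary code; use trusted query strings only.
            return eval(code, {}, {'m': m})
    return [urn for urn, m in items.items() if predicate(m)]


# Made-up sample data, loosely modelled on the session further down.
items = {
    'urn:p:Product:%d' % i: {'extra': 5000 + i,
                             'creator': 'fatman %d' % (i * 4)}
    for i in range(7)
}
print(select(items, 'm["extra"] > 5000 and m["extra"] <= 5002'))
print(select(items, lambda m: m['creator'].endswith(' 12')))
```

An expression string is compact and easy to store or send to a remote pool, while a function allows arbitrary logic such as regular-expression matching.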
Examples (from Quick Start page):
A ProductStorage with pools attached can be queried with tags, properties stored in metadata, or even data in the stored products, using Python syntax.
>>> # clean possible data left from previous runs
... poolname = 'fdi_pool_' + getpass.getuser()
... poolpath = '/tmp/' + poolname
... newpoolname = 'fdi_newpool_' + getpass.getuser()
... newpoolpath = '/tmp/' + newpoolname
... os.system('rm -rf ' + poolpath)
... os.system('rm -rf ' + newpoolpath)
... poolurl = 'file://' + poolpath
... newpoolurl = 'file://' + newpoolpath
... if PoolManager.isLoaded(DEFAULT_MEM_POOL):
...     PoolManager.getPool(DEFAULT_MEM_POOL).removeAll()
... PoolManager.getPool(poolname, poolurl).removeAll()
... PoolManager.getPool(newpoolname, newpoolurl).removeAll()
... # make a productStorage
... pstore = ProductStorage(poolurl=poolurl)
... # make another
... pstore2 = ProductStorage(poolurl=newpoolurl)
>>> # add some products to both storages. The product properties are different.
... n = 7
... for i in range(n):
...     # three counters for properties to be queried.
...     a0, a1, a2 = 'desc %d' % i, 'fatman %d' % (i*4), 5000+i
...     if i < 3:
...         # Product type
...         x = Product(description=a0, creator=a1)
...         x.meta['extra'] = Parameter(value=a2)
...     elif i < 5:
...         ...
...         x.meta['time'] = Parameter(value=FineTime1(a2))
...     if i < 4:
...         # some are stored in one pool
...         r = pstore.save(x)
...     else:
...         # the rest in the other
...         r = pstore2.save(x)
...     print(r.urn)
... # Two pools, 7 products in 3 types
... # [P P P C] [C M M]
urn:fdi_pool_mh:fdi.dataset.product.Product:0
urn:fdi_pool_mh:fdi.dataset.product.Product:1
urn:fdi_pool_mh:fdi.dataset.product.Product:2
urn:fdi_pool_mh:fdi.pal.context.Context:0
urn:fdi_newpool_mh:fdi.pal.context.Context:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:0
urn:fdi_newpool_mh:fdi.pal.context.MapContext:1
>>> # register the new pool above to the 1st productStorage
... pstore.register(newpoolname)
... len(pstore.getPools()) # == 2
2
>>> # make a query on product metadata, which is the variable 'm'
... # in the query expression, i.e. ``m = product.meta; ...``
... # But '5000 < m["extra"]' does not work. see tests/test.py.
... q = MetaQuery(Product, 'm["extra"] > 5000 and m["extra"] <= 5005')
... # search all pools registered on pstore
... res = pstore.select(q)
... # we expect [#2, #3]. Context is not a subclass of Product, which is being searched
... len(res) # == 2
2
>>> # see
... [r.product.description for r in res]
['desc 1', 'desc 2']
>>> def t(m):
...     # query is a function
...     import re
...     # 'creator' matches the regex pattern: 'n' + any one character + '1'
...     return re.match('.*n.1.*', m['creator'].value)
>>> q = MetaQuery(BaseProduct, t)
... res = pstore.select(q)
... # expecting [3,4]
... [r.product.creator for r in res]
['fatman 12', 'fatman 16']
Run tests
To test PAL functionalities based on the local (JSON) pool and the memory pool, run in the same directory:
make test2
To test functionalities based on the HTTP client pool, run in one terminal:
make runpoolserver
then, in another terminal, run:
make testhttp
and examine the output.
Design
Packages
Classes