Flexible Dataset Integrator (fdi)

FDI, formerly known as SPDC, is a Python package for integrating different types of data. The integrated product takes care of inter-platform compatibility, serialisation, persistence, and data object referencing, which enables lazy loading.

Features

With FDI one can pack data of different formats into modular Data Products, together with annotation (description and units) and metadata (data about data). One can make arrays or tables of Products using basic data structures such as sets, sequences (Python list), mappings (Python dict), or custom-made classes. FDI accommodates nested and highly complex structures.
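As a rough illustration of this idea, and not of FDI's actual classes, the sketch below models a product as a container of named datasets, each carrying a description and a unit, plus a metadata mapping. The names SketchDataset and SketchProduct are hypothetical and rely only on the Python standard library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchDataset:
    """Hypothetical stand-in for a dataset: data plus annotation."""
    data: Any
    description: str = ""
    unit: str = ""


@dataclass
class SketchProduct:
    """Hypothetical stand-in for a product: named components plus metadata."""
    description: str = ""
    meta: Dict[str, Any] = field(default_factory=dict)
    datasets: Dict[str, Any] = field(default_factory=dict)


# Build a product holding an array-like dataset and a nested product.
spectrum = SketchDataset(data=[1.0, 2.5, 4.0], description="energy bins", unit="eV")
calibration = SketchProduct(description="calibration data", meta={"version": 2})
product = SketchProduct(
    description="example observation",
    meta={"instrument": "demo", "creator": "sketch"},
    datasets={"spectrum": spectrum, "calibration": calibration},
)

# Components are reached by name, however deeply they are nested.
print(product.datasets["spectrum"].unit)
print(product.datasets["calibration"].meta["version"])
```

Because a component may itself be a product, the same pattern extends to arbitrarily deep structures.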

The components of an FDI product are reached through convenient access APIs, which makes scripting and data mining directly on FDI products easier.

All levels of FDI Products and their components (datasets or metadata) are portable (serializable) in a human-friendly standard format (JSON is implemented), allowing machine data processors on different platforms to parse them, access internal components, or reconstruct an FDI product. Even a human with a web browser can understand the data.
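The round trip described here can be sketched with the standard json module. This is a minimal illustration under stated assumptions, not FDI's serialisation code: the Annotated class, the "_type" tag, and the encode and decode helpers are all hypothetical.

```python
import json


class Annotated:
    """Hypothetical annotated dataset, used only for this illustration."""

    def __init__(self, data, description="", unit=""):
        self.data = data
        self.description = description
        self.unit = unit

    def __eq__(self, other):
        return isinstance(other, Annotated) and vars(self) == vars(other)


def encode(obj):
    """json.dumps hook: turn an Annotated into a tagged plain dict."""
    if isinstance(obj, Annotated):
        return {"_type": "Annotated", **vars(obj)}
    raise TypeError(f"cannot serialise {type(obj).__name__}")


def decode(d):
    """json.loads hook: rebuild an Annotated from its tagged dict."""
    if d.get("_type") == "Annotated":
        return Annotated(d["data"], d["description"], d["unit"])
    return d


original = {"spectrum": Annotated([1, 2, 3], "counts per bin", "count")}
text = json.dumps(original, default=encode, indent=2)   # human-readable JSON
restored = json.loads(text, object_hook=decode)         # objects rebuilt
print(text)
```

Tagging each serialised object with its type is one common way to let a receiver on another platform rebuild the original structure from plain JSON.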

The toString() method of the major container classes outputs a nicely formatted text representation of complex data, which helps when converting FDI products to ASCII.

Most FDI Products and components implement event sender and listener interfaces, allowing scalable data-driven processing pipelines and visualizers of live data to be constructed.
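The sender and listener arrangement can be pictured with a small observer-pattern sketch. The class and method names below (EventSender, add_listener, fire, ObservableDataset) are hypothetical and are not taken from FDI.

```python
class EventSender:
    """Hypothetical minimal sender: keeps listeners and notifies them."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def fire(self, event):
        for listener in self._listeners:
            listener(event)


class ObservableDataset(EventSender):
    """Hypothetical dataset that announces every change to its data."""

    def __init__(self, data=None):
        super().__init__()
        self._data = data

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        old, self._data = self._data, value
        self.fire({"source": self, "old": old, "new": value})


received = []
dataset = ObservableDataset([0, 0, 0])
dataset.add_listener(lambda event: received.append(event["new"]))
dataset.data = [1, 2, 3]   # assignment triggers the listener
print(received)
```

A pipeline stage or a live plot would register itself as a listener in the same way and react whenever upstream data changes.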

FDI storage ‘pools’ (file based and memory based) are provided as reference implementations for 1) queryable data storage and 2) referencing all persistent data with URNs (Universal Resource Names).
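A memory-based pool can be pictured roughly as below. This is a hypothetical sketch, not FDI's pool implementation: the MemoryPoolSketch class, its methods, and the URN layout urn:<pool>:<class>:<serial> are assumptions made for illustration.

```python
import itertools


class MemoryPoolSketch:
    """Hypothetical in-memory pool; not FDI's actual pool implementation."""

    def __init__(self, name):
        self.name = name
        self._store = {}
        self._serial = itertools.count()

    def save(self, obj):
        # The URN layout used here (urn:<pool>:<class>:<serial>) is an assumption.
        urn = f"urn:{self.name}:{type(obj).__name__}:{next(self._serial)}"
        self._store[urn] = obj
        return urn

    def load(self, urn):
        return self._store[urn]

    def select(self, predicate):
        """Return the URNs of all stored objects matching the predicate."""
        return [urn for urn, obj in self._store.items() if predicate(obj)]


pool = MemoryPoolSketch("demo")
urn_a = pool.save({"target": "M31", "exposure": 30})
urn_b = pool.save({"target": "M42", "exposure": 120})
long_exposures = pool.select(lambda p: p["exposure"] > 60)
print(urn_a, long_exposures)
```

Saving returns a name rather than the object itself, so other data can refer to a stored item without holding it in memory.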

FDI provides a Context type of product, so that references to other products can become components of a Context. This enables encapsulation of rich, deep, sophisticated, and accessible contextual data while the Context itself remains lightweight.
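Why a context of references stays lightweight can be seen in a small lazy-loading sketch. Everything here is hypothetical: LazyRef, the loader function, the made-up URN strings, and the plain dicts standing in for a storage pool and a Context.

```python
class LazyRef:
    """Hypothetical reference: stores a URN and loads the target on demand."""

    def __init__(self, urn, loader):
        self.urn = urn
        self._loader = loader
        self._cached = None
        self._loaded = False

    @property
    def product(self):
        if not self._loaded:
            self._cached = self._loader(self.urn)
            self._loaded = True
        return self._cached


# A plain dict stands in for a storage pool; the URN strings are made up.
storage = {
    "urn:demo:Product:0": {"name": "raw image"},
    "urn:demo:Product:1": {"name": "calibrated image"},
}
loads = []  # records which URNs were actually fetched


def loader(urn):
    loads.append(urn)
    return storage[urn]


# The "context" holds only references, so building it loads nothing.
context = {
    "refs": {
        "raw": LazyRef("urn:demo:Product:0", loader),
        "calibrated": LazyRef("urn:demo:Product:1", loader),
    }
}

print(len(loads))                                # nothing fetched yet
print(context["refs"]["raw"].product["name"])    # fetched on first access
print(loads)
```

Only the referenced product that is actually accessed gets loaded, and repeated access reuses the cached object.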

For data processors, an HTTP server with RESTful APIs (named Processing Node Server, PNS) is implemented to interface with data processing modules. PNS is especially suitable for running in Docker containers, so that legacy software and software from incompatible environments can be combined into one integral data processing pipeline.

This package attempts to meet scientific observation and data processing requirements. It is inspired by the data models of the European Space Agency’s Interactive Analysis package of the Herschel Common Science System (written in Java, with Jython for scripting), and its APIs are designed to be as compatible with that package as possible.

FDI Python packages

  • The base data model is defined in package dataset.

  • Persistent data access, referencing, querying, and Universal Resource Names are defined in package pal.

  • A reference REST API server, designed to communicate with a data processing Docker container using the data model, is in package pns.

API Document

_images/packages_all.png
