Data Abstraction and Virtualization

Access and import data sources into the DataPorts Platform


Data virtualization is a data integration technique that provides access to information through a virtualized service layer, regardless of where the data sources are located. It lets applications access data from a variety of heterogeneous sources through a single endpoint, providing a unified, abstracted, and encapsulated view of the information for querying, while also transforming and processing the data to prepare it for consumption. A significant challenge in data virtualization is managing the different types of storage systems (e.g., key-value, document, or relational databases) that all need to be integrated. In addition, data-intensive applications that use a virtualized data source still expect certain quality-of-service guarantees from the system, such as performance and availability. The Data Abstraction and Virtualization (DAV) component aims to address these challenges and contributes to the data interoperability of the platform. It also focuses on fulfilling the project’s requirements related to data quality.
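As a rough illustration of the single-endpoint idea described above, the following Python sketch puts three toy sources behind one query interface. It is not DAV’s implementation: every name in it (`VirtualLayer`, `key_value_source`, `document_source`, `relational_source`, `build_demo_layer`) is hypothetical, and an in-memory dict, list, and SQLite database merely stand in for real key-value, document, and relational stores.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Reader = Callable[[], Iterable[Record]]


def key_value_source(store: Dict[str, object]) -> Reader:
    """Expose a key-value store as records with a common shape."""
    return lambda: ({"id": k, "value": v} for k, v in store.items())


def document_source(docs: List[dict]) -> Reader:
    """Expose a document collection, mapping its field names to the common shape."""
    return lambda: ({"id": d["_id"], "value": d["payload"]} for d in docs)


def relational_source(conn: sqlite3.Connection, table: str) -> Reader:
    """Expose a relational table; the table name is trusted here (demo only)."""
    def read() -> Iterable[Record]:
        for row_id, value in conn.execute(f"SELECT id, value FROM {table}"):
            yield {"id": row_id, "value": value}
    return read


class VirtualLayer:
    """Single access point that hides where each record is stored."""

    def __init__(self) -> None:
        self._sources: Dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        self._sources[name] = reader

    def query(self, predicate: Callable[[Record], bool] = lambda r: True) -> List[Record]:
        """Return matching records from every source, tagged with their origin."""
        return [
            {**record, "source": name}
            for name, reader in self._sources.items()
            for record in reader()
            if predicate(record)
        ]


def build_demo_layer() -> VirtualLayer:
    """Wire three toy sources (assumed data) into one layer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id TEXT, value INTEGER)")
    conn.execute("INSERT INTO readings VALUES (?, ?)", ("c", 30))
    layer = VirtualLayer()
    layer.register("kv", key_value_source({"a": 10}))
    layer.register("docs", document_source([{"_id": "b", "payload": 20}]))
    layer.register("sql", relational_source(conn, "readings"))
    return layer


if __name__ == "__main__":
    layer = build_demo_layer()
    # The caller filters on the common schema without knowing the backing store.
    print(layer.query(lambda r: r["value"] >= 20))
```

The design point is that each adapter translates its store’s native shape into one common record format, so the caller only ever sees the unified view.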


In a nutshell, Data Abstraction & Virtualization (DAV) is responsible for:

  • Correctly preparing data input from different sources inside the generic DataPorts architecture
  • Maintaining metadata from all feeds
  • Exporting the “cleaned” & processed datasets through exposed RESTful APIs, thus making them available to any potential client. Persistent data streams (that is, data that has already been collected and stored) are the primary source of load for DAV.
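The three responsibilities listed above can be sketched as a small pipeline. This is only an assumption-laden illustration: the source does not state DAV’s actual cleaning rules, metadata fields, or response format, so the rules here (drop incomplete and duplicate records), the `FeedMetadata` fields, and the names `clean`, `ingest`, and `export_payload` are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class FeedMetadata:
    """Hypothetical per-feed bookkeeping; real DAV metadata may differ."""
    feed: str
    received: int = 0
    kept: int = 0


def clean(records: List[dict]) -> List[dict]:
    """Assumed cleaning rule: drop records with missing fields and repeated ids."""
    seen = set()
    cleaned = []
    for record in records:
        if record.get("id") is None or record.get("value") is None:
            continue  # incomplete record
        if record["id"] in seen:
            continue  # duplicate id
        seen.add(record["id"])
        cleaned.append(record)
    return cleaned


def ingest(feed: str, records: List[dict], catalog: Dict[str, FeedMetadata]) -> List[dict]:
    """Prepare incoming data and update the metadata kept for its feed."""
    cleaned = clean(records)
    meta = catalog.setdefault(feed, FeedMetadata(feed))
    meta.received += len(records)
    meta.kept += len(cleaned)
    return cleaned


def export_payload(feed: str, cleaned: List[dict], catalog: Dict[str, FeedMetadata]) -> str:
    """Build the JSON body that a REST handler could return to a client."""
    return json.dumps({"metadata": asdict(catalog[feed]), "data": cleaned}, sort_keys=True)


if __name__ == "__main__":
    catalog: Dict[str, FeedMetadata] = {}
    raw = [{"id": 1, "value": 10}, {"id": 1, "value": 10}, {"id": 2, "value": None}]
    cleaned = ingest("port_calls", raw, catalog)
    print(export_payload("port_calls", cleaned, catalog))
```

Keeping the cleaning step separate from the export step means the same cleaned dataset can be served to any number of clients without being reprocessed.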

Demo | Screenshots

DAV runs in the background: it accepts incoming datasets, filters them, stores them, and then forwards them to clients. It has no UI or command interface, so a demo is not applicable.


Getting Started

Initial steps to deploy and configure the software

How to use it

User guide to understand how the software is used


OpenAPI specification for interacting with Data Abstraction & Virtualization’s Virtual Data Container (VDC) to retrieve data

Source Code

Other docs

Additional documents worth reading

Last modified April 19, 2023: Files update (a811bf0)