What problem are we trying to solve?
Today, companies (and also the public sector) are capturing and storing an enormous amount of data from their customers and citizens.
Nevertheless, it is difficult for a single organization to house all the relevant data and the skills necessary to analyze it. Additional data may be needed to enrich the context and/or the solution space. Organizations therefore usually need to collaborate with external organizations (separate legal entities).
Data may come from different sources:
- Public sources such as open data portals
- Purchased datasets from commercially-driven data providers
- Clients, suppliers or other stakeholders from the organization's environment
Either way, a clear contract between data providers and data consumers must be put in place.
Additionally, the two parties (consumers and providers) may be located in different countries or even different regions, which requires extra effort to comply with the regulations that apply in each jurisdiction.
Furthermore, once the counterparties have been identified and the agreements are in place, additional technical hurdles kick in, e.g. in what format should the data be shared: Excel documents, JSON documents, SQL databases, NoSQL databases, etc.?
It is generally not recommended to aggregate the data in a single data store, because doing so creates a honeypot that is an attractive target for ransomware. A more decentralized approach is therefore desirable.
Role-based access control (RBAC) has many benefits for centralized data management systems, but it becomes increasingly hard to manage as the number of participants (different employees from different organizations) grows. In those situations, it would be better to automate access by interpreting the roles and responsibilities set out in the data-sharing agreement.
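To make the idea of deriving access from the agreement more concrete, here is a minimal sketch in Python. It assumes the agreement has already been turned into a machine-readable structure; the names `Grant`, `Agreement`, `is_allowed`, the example users and the dataset name are all hypothetical and only serve as illustration, not as a description of any actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One clause of the agreement: a role may perform actions on a dataset."""
    role: str
    dataset: str
    actions: frozenset


@dataclass
class Agreement:
    """Hypothetical machine-readable form of a data-sharing agreement."""
    members: dict  # maps a user identifier to the role named in the agreement
    grants: list   # list of Grant clauses


def is_allowed(agreement, user, dataset, action):
    """Decide access from the agreement itself, not from a hand-kept ACL."""
    role = agreement.members.get(user)
    if role is None:
        # Users who are not party to the agreement get no access at all.
        return False
    return any(
        g.role == role and g.dataset == dataset and action in g.actions
        for g in agreement.grants
    )


# Illustrative data (assumed, not taken from a real agreement).
agreement = Agreement(
    members={"alice@supplier.example": "analyst"},
    grants=[Grant("analyst", "sales_2023", frozenset({"read"}))],
)
```

With this shape, adding a participant means updating the agreement's member list, and every access decision follows from the clauses rather than from per-user permissions maintained by hand.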
In any given centralized system, there is a trade-off between standardizing data before importing it and accepting only a limited search over disparate data formats. The ideal solution would give you the freedom to share data in any format (no specific schema) while enabling users to query the data as if it were rows and columns.
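As a rough illustration of querying disparate formats as rows and columns, the sketch below reads a CSV document and a JSON document into plain dictionaries and filters them with one function. The helper names (`rows_from_csv`, `rows_from_json`, `select`) and the sample data are assumptions made for this example; it uses only the Python standard library and does not attempt type normalization.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text into a list of dict rows (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON object or array of objects into a list of dict rows."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    return [dict(record) for record in records]


def select(rows, columns, where=lambda row: True):
    """Project the requested columns; missing columns come back as None."""
    return [
        {column: row.get(column) for column in columns}
        for row in rows
        if where(row)
    ]


# Two sources with different formats and slightly different fields.
csv_text = "id,city\n1,Brussels\n2,Ghent\n"
json_text = '[{"id": 3, "city": "Brussels", "zip": "1000"}]'

rows = rows_from_csv(csv_text) + rows_from_json(json_text)
in_brussels = select(rows, ["id", "city"], where=lambda r: r.get("city") == "Brussels")
```

No shared schema is imposed up front: each source keeps its own fields, and the query decides which columns matter. Note that values keep their source types (the CSV `id` is the string `"1"`, the JSON `id` is the integer `3`), which is one of the costs of skipping standardization.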