Research Topics
Schema Inference
A key aspect of schema inference is the ability to automatically extract and understand the structure and semantics of data from various sources. This involves identifying patterns, relationships, and dependencies within the data, as well as inferring the types and constraints of the entities involved. Schema inference is crucial for tasks such as data integration, data cleaning, and data analysis, where a clear understanding of the underlying data model is essential.
In Commonplace, we initially treat schema inference in a narrower sense, focusing on the structure and semantics of individual cards within a collection. This lets us infer the types and constraints of the entities on a card, which can then be used to improve the user experience and enable more advanced features.
We should still examine where these ideas could apply to groups of cards or entire collections.
To follow up on:
- Starting with freeform data on a card.
- Looking only for which existing schemas could map to this specific card.
- In the future, analysing the relationships between entities within the collection to infer more complex schemas.
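The second point, finding which existing schemas could map to a specific card, can be sketched roughly as follows. This is a minimal illustration, not how Commonplace does it: the `SCHEMAS` registry, the representation of a card as a plain dict, the `match_score` and `candidate_schemas` helpers, and the 0.5 threshold are all assumptions made up for the example.

```python
from typing import Any

# Hypothetical schema registry: schema name -> {field name: expected Python type}.
SCHEMAS: dict[str, dict[str, type]] = {
    "Person": {"name": str, "email": str},
    "Book": {"title": str, "author": str, "year": int},
}


def match_score(card: dict[str, Any], schema: dict[str, type]) -> float:
    """Fraction of the schema's fields the card has with a compatible type."""
    if not schema:
        return 0.0
    hits = sum(
        1
        for field, expected in schema.items()
        if field in card and isinstance(card[field], expected)
    )
    return hits / len(schema)


def candidate_schemas(
    card: dict[str, Any],
    schemas: dict[str, dict[str, type]] = SCHEMAS,
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> list[tuple[str, float]]:
    """Return (schema name, score) pairs at or above the threshold, best first."""
    scored = [(name, match_score(card, fields)) for name, fields in schemas.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# A freeform card with an extra field that no schema knows about.
card = {"title": "Dune", "author": "Frank Herbert", "notes": "re-read"}
print(candidate_schemas(card))
```

Here the card matches two of the three `Book` fields and none of the `Person` fields, so only `Book` is returned. Extra fields on the card (`notes`) are ignored rather than penalised, which seems a reasonable default when starting from freeform data, but that is a design choice to revisit.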
Academic research on schema inference:
- Schema Matching and Mapping: Research on automatically identifying correspondences between different database schemas or data models. Key work includes similarity-based approaches and machine learning techniques for schema alignment.
- Schema Evolution and Versioning: Studies on how database schemas change over time and techniques for managing these changes. Relevant to our card schema compatibility and merging features.
- Automatic Schema Generation from Semi-Structured Data: Research on inferring schemas from JSON, XML, and other flexible data formats. Particularly relevant for our card field inference.
- Ontology Learning: Work on automatically extracting conceptual structures and relationships from data, which relates to our collection-level schema inference goals. NOTE: This one could be very interesting to follow up on, but I think it would work at a Collection level rather than a Card level, and could be experimented with using Commonscript if that is performant enough.
- Entity Recognition and Type Inference: Machine learning approaches to identifying entity types and relationships in unstructured data, applicable to inferring card schemas from content.
- Schema Summarization: Techniques for creating concise representations of complex schemas, relevant to our schema signature concept.
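To make the semi-structured inference and summarization items above more concrete, here is a minimal sketch of inferring a schema from a handful of JSON-like records and condensing it into a compact string. The `infer_schema` and `signature` functions and the signature format are invented for illustration; they are not Commonplace's card field inference or its schema signature, and real approaches from the literature handle nesting, arrays, and type generalisation that this ignores.

```python
from typing import Any


def infer_schema(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Infer, per field, the observed type names and whether the field is optional."""
    seen: dict[str, dict[str, Any]] = {}
    for record in records:
        for field, value in record.items():
            entry = seen.setdefault(field, {"types": set(), "count": 0})
            entry["types"].add(type(value).__name__)
            entry["count"] += 1
    total = len(records)
    return {
        field: {
            "types": sorted(entry["types"]),
            # A field is optional if at least one record lacks it.
            "optional": entry["count"] < total,
        }
        for field, entry in seen.items()
    }


def signature(schema: dict[str, dict[str, Any]]) -> str:
    """Order-independent summary string (hypothetical format)."""
    parts = []
    for field in sorted(schema):
        info = schema[field]
        suffix = "?" if info["optional"] else ""
        parts.append(f"{field}{suffix}:{'|'.join(info['types'])}")
    return ",".join(parts)


cards = [
    {"title": "Dune", "year": 1965},
    {"title": "Emma", "year": "1815", "rating": 4.5},
]
schema = infer_schema(cards)
print(signature(schema))  # rating?:float,title:str,year:int|str
```

The `year` field shows up with two types because one card stores it as a string. Whether to keep such unions, coerce to a single type, or flag the card for cleanup is exactly the kind of question the schema evolution and merging work would need to answer.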
TODO: Check with Andras on ontology discovery?