The IRL Record Ontology

While constructing the semantic architecture of the IRL Linked Data platform, we make a clear distinction between (i) the curation of the encoded data contained in the records as well as the long-term preservation thereof, managed by a digital archivist, and (ii) the analysis and interpretation of that data to answer specific questions for historians and researchers. To support the two processes and to maintain the clear separation of concerns, two distinct but interrelated knowledge bases are developed [1].

The first knowledge base contains the records encoded in RDF, using a “flat” ontology that captures the information in those records lexically. During encoding, any noise such as errors or missing values is retained to preserve the original historical record and its provenance. This first knowledge base then serves as input to populate a second knowledge base, which uses more expressive ontologies and applies specific assumptions and interpretations to the values prior to data analysis.

In this post, we present the ontology for representing vital records, which we will call the records ontology from now on, and elaborate on its construction and validation. We had to create this ontology because, to the best of our knowledge, no ontology for vital records was available.

The latest version of the records ontology can be found by following http://www.purl.org/net/irish-record-linkage/records

Construction of the records ontology

Births, deaths and marriages were captured per district (within a union, within a county) as single records on register pages. A page can contain up to 10 records, after which it is signed off by the registrar and sent to the superintendent registrar for inspection and validation. To create a first version of the records ontology, we simply “lifted” the information visible on one such register page into an ontology.

To minimize interpretation, we chose to develop a “flat” ontology, which means that most information found on a register page is captured as literals. For example, instead of creating a concept Person that can have a forename and surname, we chose to relate the concept Record directly to these attributes.

For the records ontology, we have defined the following concepts:

  • A RegisterPage, which contains records.
  • A Record, which represents the different types of records. Each record must belong to a register page, and each register page can have zero (blank pages) or more records.
  • A Certificate and a MarriageRecord, both subclasses of the concept Record. The former has only one person as its subject, the latter two. The two concepts are disjoint, which means that no instance of a certificate can be an instance of a marriage record and vice versa.
  • Finally, two disjoint subclasses of the concept Record to represent birth and death records: BirthRecord and DeathRecord.
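
As an illustration, the concepts listed above could be declared along the following lines. This is a sketch rather than an excerpt from the published ontology: the namespace IRI bound to the records entity is an assumption, and the placement of BirthRecord and DeathRecord directly under Record simply follows the description above.

```xml
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!-- Assumed namespace IRI; check the published ontology for the actual one. -->
  <!ENTITY records "http://www.purl.org/net/irish-record-linkage/records#">
  <!ENTITY owl "http://www.w3.org/2002/07/owl#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;" xmlns:owl="&owl;">

  <!-- A register page contains records. -->
  <owl:Class rdf:about="&records;RegisterPage"/>

  <!-- The common superclass of all record types. -->
  <owl:Class rdf:about="&records;Record"/>

  <!-- Certificates (one subject) and marriage records (two subjects) are disjoint. -->
  <owl:Class rdf:about="&records;Certificate">
    <rdfs:subClassOf rdf:resource="&records;Record"/>
    <owl:disjointWith rdf:resource="&records;MarriageRecord"/>
  </owl:Class>
  <owl:Class rdf:about="&records;MarriageRecord">
    <rdfs:subClassOf rdf:resource="&records;Record"/>
  </owl:Class>

  <!-- Birth and death records are disjoint as well. -->
  <owl:Class rdf:about="&records;BirthRecord">
    <rdfs:subClassOf rdf:resource="&records;Record"/>
    <owl:disjointWith rdf:resource="&records;DeathRecord"/>
  </owl:Class>
  <owl:Class rdf:about="&records;DeathRecord">
    <rdfs:subClassOf rdf:resource="&records;Record"/>
  </owl:Class>

</rdf:RDF>
```

Because owl:disjointWith is symmetric, stating it on one class of each pair is enough.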

The only object properties (relations between two concepts) we needed were those relating records to register pages. All other properties are datatype properties. Each datatype property is attached to the most general concept to which it applies. For instance, all records are signed off by a registrar on a certain date. The date of registration and the information on the registrar are therefore related to the concept Record, so that all subtypes of this class inherit these properties.

<owl:DatatypeProperty rdf:about="&records;dateOfRegistration">
   <rdfs:label rdf:datatype="&xsd;string">date of registration</rdfs:label>
   <rdfs:comment rdf:datatype="&xsd;string">The registration date of a record.</rdfs:comment>
   <rdfs:domain rdf:resource="&records;Record"/>
   <rdfs:range rdf:resource="&rdfs;Literal"/>
</owl:DatatypeProperty>

One of the challenges is to capture the domain as well as possible, yet maintain a valid OWL 2 [2] ontology. As explained by Motik and Horrocks in [3], it is difficult to reason about date and time intervals; only specific points in time (captured by both xsd:dateTime and xsd:dateTimeStamp) were “amenable for implementation”, and those “can be handled by techniques similar to the ones for numbers.” Together with the digital archivist, we chose not to capture dates mentioned in records as instances of xsd:dateTime, as we do not know the exact times and were uncomfortable encoding “default” times. We thus declared the range of these properties as rdfs:Literal, but provided encoding guidelines that strongly encourage the use of xsd:date.
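
To make the effect of this choice concrete, the sketch below shows how two encoded records might look under such guidelines. The record IRIs, the date values and the namespace IRI bound to the records entity are made up for illustration; only the property name dateOfRegistration and the class names come from the ontology described here.

```xml
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!-- Assumed namespace IRI; check the published ontology for the actual one. -->
  <!ENTITY records "http://www.purl.org/net/irish-record-linkage/records#">
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:records="&records;">

  <!-- Hypothetical record with a complete date, typed as xsd:date as the guidelines encourage. -->
  <records:BirthRecord rdf:about="http://example.org/record/1">
    <records:dateOfRegistration rdf:datatype="&xsd;date">1864-03-14</records:dateOfRegistration>
  </records:BirthRecord>

  <!-- Hypothetical record whose date is incomplete on the page; the value is kept as written. -->
  <records:DeathRecord rdf:about="http://example.org/record/2">
    <records:dateOfRegistration>14th March</records:dateOfRegistration>
  </records:DeathRecord>

</rdf:RDF>
```

Both literals satisfy the rdfs:Literal range, so a noisy value can be preserved as found while clean values still carry a date type that later processing can rely on.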

Assessing the ontology

The records ontology was checked for problems using the OOPS! Ontology Pitfall Scanner [4] (http://oeg-lia3.dia.fi.upm.es/oops/catalogue.jsp). OOPS! automatically scans an ontology for common or potential problems, drawing on experience from many ontology projects.

Minor problems surfaced, such as missing documentation (comments and labels) and missing ontology annotations, which were quickly rectified. OOPS! also raised an interesting question. Initially, we did not provide an inverse relation for the predicate hasRecord, which has RegisterPage as its domain and Record as its range. OOPS! suggested that we declare an inverse relation, which is useful for browsing through the data by means of, for instance, faceted browsing.

After we declared the inverse property and re-evaluated the ontology, however, the tool suggested that we also explicitly declare the domain and range of the inverse property. Ontologies are supposed to be minimally redundant, and the domain and range of the inverse property can be inferred from the original relation using a reasoner. This redundancy can lead to errors if one changes the domain and range of one relation, but not the other. Though one can debate whether this really poses a problem (the redundancy does help a human reader understand the inverse relation), we decided to declare those “missing” domains and ranges.
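
The resulting declarations could look roughly as follows. This is a sketch under assumptions: the name isRecordOf for the inverse property is hypothetical, as is the namespace IRI bound to the records entity; only hasRecord, RegisterPage and Record are named in the text above.

```xml
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!-- Assumed namespace IRI; check the published ontology for the actual one. -->
  <!ENTITY records "http://www.purl.org/net/irish-record-linkage/records#">
  <!ENTITY owl "http://www.w3.org/2002/07/owl#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
]>
<rdf:RDF xmlns:rdf="&rdf;" xmlns:rdfs="&rdfs;" xmlns:owl="&owl;">

  <!-- The original relation from a register page to its records. -->
  <owl:ObjectProperty rdf:about="&records;hasRecord">
    <rdfs:domain rdf:resource="&records;RegisterPage"/>
    <rdfs:range rdf:resource="&records;Record"/>
  </owl:ObjectProperty>

  <!-- The inverse relation; "isRecordOf" is a hypothetical name. -->
  <owl:ObjectProperty rdf:about="&records;isRecordOf">
    <owl:inverseOf rdf:resource="&records;hasRecord"/>
    <!-- Redundant: a reasoner can infer both from hasRecord, but they are stated for human readers. -->
    <rdfs:domain rdf:resource="&records;Record"/>
    <rdfs:range rdf:resource="&records;RegisterPage"/>
  </owl:ObjectProperty>

</rdf:RDF>
```

The two domain and range pairs mirror each other, which is exactly the redundancy discussed above: changing one pair without the other would leave the ontology inconsistent with its own intent.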

Ongoing work

The digital archivist is currently encoding the different records in a relational database using adequate input mechanisms. We adopted R2RML [5], a mapping language, to generate RDF triples from the relational database. The generated triples are loaded into a triplestore to constitute our first knowledge base. The construction of the second knowledge base and of the second, more expressive ontology will be the subject of a second post in the near future.

References

[1] C. Debruyne, O. Beyan, S. Decker and S. Collins. Using Semantic Technologies to Create Virtual Families from Historical Vital Records, 1st EUON Workshop, 2014.

[2] W3C. OWL 2 Web Ontology Language Document Overview (Second Edition), 2012. Via http://www.w3.org/TR/owl2-overview/ (last accessed December 2, 2014).

[3] B. Motik and I. Horrocks. OWL Datatypes: Design and Implementation. Springer Berlin Heidelberg, 2008.

[4] M. Poveda-Villalón, M. C. Suárez-Figueroa and A. Gómez-Pérez. Validating Ontologies with OOPS!. Knowledge Engineering and Knowledge Management. Springer Berlin Heidelberg, 2012. 267-281.

[5] W3C. R2RML: RDB to RDF Mapping Language, 2012. Via http://www.w3.org/TR/r2rml/ (last accessed December 2, 2014).


Problematising Pregnancy in Ireland, 1864-1913

Perceptions of Pregnancy

Submitted by Dr Ciara Breathnach, Department of History, University of Limerick, Ireland.

Key words: infant mortality, illegitimacy, institutions

Problematic motherhood in Free State Ireland was routinely conflated with discourses of morality and illegitimacy; this tendency meant that overarching issues of health inequality and associated problems did not receive due consideration. Indeed, the socio-legal positioning of the family followed the dictates of Roman Catholicism, the majority religion, much to the detriment of the socially disadvantaged. Together with Eunan O’Halpin I have co-written two articles, one on unknown infant dead, where parentage was unknown (Irish Historical Studies, 38:149, 2012), and one on unnamed infant dead, where parents were known to the authorities (Social History, 39:2, 2014), and placed them in wider social contexts. Our analysis of the records of civil registration and coroners’ courts records has led us to the conclusion that dire poverty played a central role in…


Irish Record Linkage, 1864-1913 Project

Irish Record Linkage is a project funded by the Irish Research Council, and developed in partnership with the Digital Repository of Ireland, University of Limerick and Insight at NUI Galway.

This project uses pre-digitised birth, death and marriage records, generously shared by the Office of the Registrar General. These vital registration records are stored and used in line with data protection best practice for the research purposes of the IRL project. Semantic Web and Linked Data technologies are applied to create a platform to store and link the records. The resulting platform will provide a powerful research resource to enable the University of Limerick’s project participants to study Irish infant and maternal mortality rates and patterns during this period of Irish history. The project aims to provide a comprehensive map of infant and maternal mortality for Dublin from 1864 to 1913.

Archives & Linked Data
Linked Data involves publishing structured data on the Web, allowing it to be connected and enriched and facilitating linking between related resources. Linked Data refers to data published on the Web following a set of principles designed to promote linking between entities. An essential requirement to enable this linking is that each entity (for example a personal name) is given a unique identifier, generally in the form of a Uniform Resource Identifier (URI). Having determined these URI identifiers, Linked Data reuses other data models such as the Resource Description Framework (RDF) to specify the links, and their type, between two URIs. As well as serving the purpose of identifying and expressing the objects, the assignment of a URI removes any ambiguity between people of the same name; a key concern in relation to vital registration data from 19th and early 20th century Ireland.

Linked Data is an incredibly powerful tool when applied to archives collections as it has the potential to greatly enrich archival cataloguing and searching. Translating archival catalogues into Linked Data allows for linking to other digitised collections and reveals interrelations in vast archival collections. There is also great potential to enrich collections by providing further contextual information, for example by linking to geographic coordinates, place name information or statistical information.

Using Linked Data can greatly benefit the archival community as it enables archives services to keep pace with the digital environment, meet ever increasing online access demands and provide a rich resource for stakeholders. The Linked Logainm project, launched by the DRI in September 2013, demonstrates the significance of Linked Data for cultural institutions. Linked Logainm is a Linked Data version of the logainm.ie database, providing Irish place name data in computer readable formats. The project greatly enriched this bilingual authoritative list of place names by enhancing searchability and linking to appropriate maps, thus allowing its value to be fully exploited by users. A Location LODer demonstrator website was also created to provide an interactive introduction to the potential of Linked Logainm (http://apps.dri.ie/locationLODer).

Project Methodology
A key initial step in this project is the ontology construction, engineered by the Linked Data specialists. This ontology is driven by key research questions and will influence the construction of the platform, and later, what information can be extracted. The next stages in the project will involve data ingest, setting up the Linked Data infrastructure and data preparation and curation, following best practice in digital archiving.
This project will also demonstrate some of the potential and significance of Linked Data, particularly for the archives community. I look forward to updating you all further on the project as it progresses. Please see http://www.dri.ie/projects for project updates and more information.
For further information on the work of the Digital Repository of Ireland, please see http://www.dri.ie
Dolores Grant
IRL-DRI Digital Archivist, Digital Repository of Ireland

First published in the Archives and Records Association Summer newsletter 2014.