Henriette's Notes

Using Jena and SHACL to validate RDF Data

RDF enables users to capture data in a way that is intuitive to them. This means that data is often captured without conforming to any schema. It is nevertheless often useful to know that an RDF dataset conforms to some (potentially partial) schema. This is where SHACL (Shapes Constraint Language), a W3C standard, comes into play. It is a language for describing and validating RDF graphs. In this post I will give a brief overview of how to use SHACL to validate RDF data using the Jena implementation of SHACL.

A SHACL Example

We will use an example from the SHACL specification. Assume we have a file person.ttl that contains the following data:

[Listing: Example RDF data (person.ttl)]

To validate this data we create a shape definition in personShape.ttl containing:

[Listing: Person shape definition (personShape.ttl)]

A Code Example using Jena

To validate our RDF data using our SHACL shape we will use the Jena implementation of SHACL. Start by adding the SHACL dependency to your Maven pom.xml. You do not need to add Jena separately, because the SHACL pom already includes it.

[Listing: SHACL Maven dependency]

In the code we will assume the person.ttl and personShape.ttl files are in $Project/src/main/resources/. The validation code is then as follows:

[Listing: Java code using Jena implementation of SHACL]

Running the Code

Running the code will cause a report.ttl file to be written out to $Project/src/main/resources/. We can determine that our data does not conform by checking the sh:conforms property. We have 4 violations of our ex:PersonShape:

  1. For ex:Alice the ex:ssn property does not conform to the pattern defined in the shape.
  2. ex:Bob has 2 ex:ssn properties.
  3. ex:Calvin works for a company that is not of type ex:Company.
  4. ex:Calvin has a property ex:birthDate that is not allowed by ex:PersonShape, since the shape is closed by sh:closed true.
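To make the first two violations concrete, here is a minimal sketch in plain Java that does not use Jena or any SHACL library. It only mimics what sh:pattern and sh:maxCount 1 check. The pattern and the ssn values are taken from the SHACL specification example, and I am assuming they match what is in personShape.ttl and person.ttl. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SsnChecks {
    // Assumption: same pattern as in the SHACL specification example (ddd-dd-dddd).
    static final Pattern SSN = Pattern.compile("^\\d{3}-\\d{2}-\\d{4}$");

    /** Mimics sh:pattern: true when the value contains a match for the pattern. */
    static boolean matchesPattern(String ssn) {
        return SSN.matcher(ssn).find();
    }

    /** Mimics sh:maxCount 1: true when a person has at most one ex:ssn value. */
    static boolean withinMaxCount(List<String> ssnValues) {
        return ssnValues.size() <= 1;
    }

    public static void main(String[] args) {
        // ex:Alice: the value ends in a letter, so the pattern check fails.
        System.out.println(matchesPattern("987-65-432A")); // prints false
        // ex:Bob: both values are well formed, but there are two of them.
        System.out.println(withinMaxCount(Arrays.asList("123-45-6789", "124-35-6789"))); // prints false
    }
}
```

A real SHACL engine reports these as separate validation results in the report, each pointing at the focus node and the constraint that failed.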

A corrected version of our person data may look as follows:

[Listing: Person data that conforms to our person shape]

Conclusion

In this post I have given a brief overview of how SHACL can be used to validate RDF data using the SHACL implementation of Jena. This code example is available at shacl tutorial.

DBPedia Extraction Framework and Eclipse Quick Start

I recently tried to compile the DBPedia Extraction Framework. What was not immediately clear to me was whether I needed to have Scala installed. It turns out that a native Scala installation is not necessary, since the scala-maven-plugin is sufficient.

The steps to compile DBPedia Extraction Framework from the command line are:

  1. Ensure JDK 1.8.x is installed.
  2. Ensure Maven 3.x is installed.
  3. Run mvn package.

Steps to compile DBPedia Extraction Framework from the Scala IDE (which can be downloaded from Scala-ide.org) are:

  1. Ensure JDK 1.8.x is installed.
  2. Ensure the Scala IDE is installed.
  3. Run mvn eclipse:eclipse.
  4. Run mvn package.
  5. Import the existing Maven project into the Scala IDE.
  6. Run mvn clean install from within the IDE.

Associations between Classes

Thus far we have only considered UML classes where the attributes are primitive types rather than classes. Here we will consider UML classes that have classes as attributes. Assume we want to model projects. Assume a project must have one name, one sponsor that must be a manager, and a team of between 3 and 10 employees. In UML this can be stated using attributes (see Fig. 1(a)) or associations (see Fig. 1(b)). As a matter of interest, Wazlawick [1] suggests using attribute notation for data types and associations for classes. His motivation is that associations make dependencies between classes more apparent. I usually follow this guideline myself.

Fig. 1

The OWL representation for these 2 class diagrams is given in Fig. 2. The first thing to notice is that we use ObjectProperty instead of DataProperty to represent the sponsor attribute/association. The same holds for the team attribute/association. Our property definitions also now have Domain and Range restrictions. When we say that Susan is the sponsor for ABC, we can infer that Susan is a manager and ABC is a project. This information can be captured through Domain and Range restrictions. For the purpose of finding modeling errors it is preferable to add Domain and Range restrictions.
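The inference described here can be sketched in a few lines of plain Java. This is only an illustration of what Domain and Range restrictions let a reasoner conclude, not how an OWL reasoner is implemented. The class names DomainRangeSketch and ObjectProperty and the method infer are hypothetical, and I am assuming the sponsor property has domain Project and range Manager.

```java
import java.util.ArrayList;
import java.util.List;

final class DomainRangeSketch {
    /** A property together with its declared domain and range classes. */
    static final class ObjectProperty {
        final String name;
        final String domain;
        final String range;

        ObjectProperty(String name, String domain, String range) {
            this.name = name;
            this.domain = domain;
            this.range = range;
        }
    }

    /** From one assertion "subject property object", derive the types of both ends. */
    static List<String> infer(ObjectProperty property, String subject, String object) {
        List<String> facts = new ArrayList<>();
        facts.add(subject + " rdf:type " + property.domain); // subject falls in the domain
        facts.add(object + " rdf:type " + property.range);   // object falls in the range
        return facts;
    }

    public static void main(String[] args) {
        ObjectProperty sponsor = new ObjectProperty("sponsor", "Project", "Manager");
        // "ABC sponsor Susan", i.e. Susan is the sponsor for ABC.
        System.out.println(infer(sponsor, "ABC", "Susan"));
        // prints [ABC rdf:type Project, Susan rdf:type Manager]
    }
}
```

This also shows why Domain and Range help with finding modeling errors: if Susan were elsewhere declared to belong to a class disjoint from Manager, the inferred type would make the ontology inconsistent.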

Fig. 2: Association between classes in Manchester syntax

To limit the number of employees on a team to between 3 and 10 employees we use the property cardinality restrictions team min 3 owl:Thing and team max 10 owl:Thing. It may seem strange that we use team max 10 owl:Thing rather than team max 10 Employee. Surely we want to restrict team members to employees? Well true, but that is achieved through our range restriction on the team object property. Here we are restricting our team to at most 10 individuals of any class, and from the range restriction it is inferred that the team members must be of type Employee.
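The split between the two mechanisms can be sketched in plain Java as follows. Note the assumption: this is a closed-world reading that simply counts the listed members, whereas an OWL reasoner works under the open-world assumption and will not report a team as too small merely because fewer than 3 members are stated. The class name, method names and the member names e1, e2, e3 are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TeamRestrictionSketch {
    /** Cardinality part: only the number of members matters, not their class. */
    static boolean sizeWithinBounds(List<String> team) {
        return team.size() >= 3 && team.size() <= 10;
    }

    /** Range part: every value of the team property is inferred to be an Employee. */
    static List<String> inferMemberTypes(List<String> team) {
        List<String> facts = new ArrayList<>();
        for (String member : team) {
            facts.add(member + " rdf:type Employee");
        }
        return facts;
    }

    public static void main(String[] args) {
        List<String> team = Arrays.asList("e1", "e2", "e3");
        System.out.println(sizeWithinBounds(team)); // prints true
        System.out.println(inferMemberTypes(team));
        // prints [e1 rdf:type Employee, e2 rdf:type Employee, e3 rdf:type Employee]
    }
}
```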

References

1. R. S. Wazlawick, Object-oriented Analysis and Design for Information Systems: Modeling with UML, OCL and IFML, Morgan Kaufmann, 2014.