In my previous post I explained that RDF graphs are collections of triples, each containing a subject, a predicate and an object. The standard serialization for RDF graphs is RDF/XML, and RDF/XML documents are made available on the World Wide Web. RDF agents harvest these RDF/XML documents and store the resulting merged graph in triple stores. These triple stores can be queried using the SPARQL language, which is the SQL of the Semantic Web. SPARQL is used for further processing of the information carried in the RDF graphs.
Fig 1. Simplified view of a Semantic Web application
One problem with storing RDF/XML graphs in a triple store is that information about the origin of the RDF graph could be lost, depending on the triple store used [Ref]. To show how this can happen, take a look at Fig 1. An RDF graph is created which contains statements about characters appearing in one of Shakespeare’s plays. The triples of the RDF graph live, for the sake of argument, as information spread around the tables and columns of a database application. We use a pseudo-N3 notation in this blog to display the in-memory view of the docA RDF graph:
<#Romeo> r:loves <#Juliet>
<#Juliet> r:daugherOf <#LadyCapulet>
<#Mercutio> r:friendOf <#Romeo>
To serialize this RDF graph, an RDF/XML document ‘docA’ is created and published on a webserver. There are other RDF/XML documents on the Internet called ‘docB’, ‘docC’, etc. Here is the in-memory view of docB:
<#Romeo> r:sonOf <#LordMontague>
Note that the RDF/XML serializations of these graphs are not shown in these examples.
An RDF application harvests all these records, processes them and stores them in a triple store. The graphs are not stored as two separate documents. No, a merged graph is created which contains the combined triples of all documents. If the two graphs docA and docB both contain statements about Romeo, then all these statements will be thrown on a heap in the triple store:
<#Romeo> r:loves <#Juliet>
<#Romeo> r:sonOf <#LordMontague>
<#Juliet> r:daugherOf <#LadyCapulet>
<#Mercutio> r:friendOf <#Romeo>
Without special precautions, it is not possible to say which graph made which statement on Romeo.
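To make the loss of origin concrete, here is a minimal Python sketch. It assumes each graph can be modelled as a plain set of (subject, predicate, object) tuples, which is my simplification and not how a real triple store works. The variable names are made up for illustration.

```python
# Each graph is modelled as a set of (subject, predicate, object) tuples.
# This is an illustrative simplification, not a real triple store.
doc_a = {
    ("#Romeo", "r:loves", "#Juliet"),
    ("#Juliet", "r:daugherOf", "#LadyCapulet"),
    ("#Mercutio", "r:friendOf", "#Romeo"),
}
doc_b = {
    ("#Romeo", "r:sonOf", "#LordMontague"),
}

# A plain merge is a set union: the triples survive, their origin does not.
merged = doc_a | doc_b

# All statements with Romeo as subject, with no trace of docA or docB.
romeo_statements = sorted(t for t in merged if t[0] == "#Romeo")
print(romeo_statements)
```

Nothing in `merged` records whether a triple came from `doc_a` or `doc_b`; that information would have to be stored separately.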
To solve this problem, a process called RDF reification can be used. Reification means making statements about statements. We could say that “Romeo loves Juliet” was created by docA:
<docA> a:type a:Statement
<docA> a:subject <#Romeo>
<docA> a:predicate r:loves
<docA> a:object <#Juliet>
Which means something like “docA says: ‘Romeo loves Juliet’”. Do this for all the statements in all the graphs in docA, docB, etc., store the results in the triple store as well, and you will have the context in which all statements were made. This is correct and works most of the time. However, formally you’ve created something that might mean something different from what you hope [Ref]. RDF is highly expressive, with layered semantics on top of which ontologies, rules, logic and proofs of statements can be added. Reification adds to RDF the ability to make statements about statements. But the resulting reified triple doesn’t have the same expressive power. A reified triple isn’t the triple itself. If we created exactly the same reified triple for a docC:
<docC> a:type a:Statement
<docC> a:subject <#Romeo>
<docC> a:predicate r:loves
<docC> a:object <#Juliet>
, then we can’t conclude that the same statement “Romeo loves Juliet” appears in both documents [Ref]. In RDF, reification is not a quoting mechanism [Ref].
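A rough Python sketch of the reification pattern above, again assuming triples are plain tuples. The helper `reify` is hypothetical; it only mirrors the four `a:` statements used in this post.

```python
def reify(node, triple):
    """Describe `triple` with four statements about `node` (hypothetical helper)."""
    s, p, o = triple
    return {
        (node, "a:type", "a:Statement"),
        (node, "a:subject", s),
        (node, "a:predicate", p),
        (node, "a:object", o),
    }

loves = ("#Romeo", "r:loves", "#Juliet")

# Reify the same statement once for docA and once for docC.
store = reify("docA", loves) | reify("docC", loves)

print(len(store))      # eight triples describing the statement
print(loves in store)  # the described triple itself is not among them
```

The store now holds descriptions hanging off two different nodes, but the triple being described is not itself asserted anywhere, which is one way to see that a reified triple isn’t the triple itself.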
Over the years, extensions to RDF have been proposed to add contextual information to RDF graphs in other ways. One proposal is to move from triples (subject, predicate, object) to quads (context, subject, predicate, object) [Ref]. But this solution is dependent on client-side adaptation. Another proposal is to let RDF graph creators give names (URIs) to (sub)graphs, in a solution called Named Graphs [Ref]. This last proposal works like this. If RDF graph docA has triples like:
<#Romeo> r:loves <#Juliet>
<#Juliet> r:daugherOf <#LadyCapulet>
<#Mercutio> r:friendOf <#Romeo>
we can name this graph with a URI ‘graphA’:
<graphA> {
<#Romeo> r:loves <#Juliet>
<#Juliet> r:daugherOf <#LadyCapulet>
<#Mercutio> r:friendOf <#Romeo>
}
The same can be done for RDF graph docC:
<#Romeo> r:loves <#Juliet>
with a name ‘graphC’ we get:
<graphC> {
<#Romeo> r:loves <#Juliet>
}
The Named Graph approach defines that any statement about a graph name (like graphA, graphC) is a statement about the graph-as-a-whole. It is now possible to compare both statements “Romeo loves Juliet” and find out that one was produced by graphA and the other by graphC. Named Graph-enabled triple stores (e.g. Jena) add this name as extra information which can be used in SPARQL queries. We can also add statements about the graph-as-a-whole. E.g.
<graphA> {
<#Romeo> r:loves <#Juliet>
<#Juliet> r:daugherOf <#LadyCapulet>
<#Mercutio> r:friendOf <#Romeo>
<graphA> d:creator <#Peter>
}
Here we made ‘Peter’ the creator of the RDF graph named ‘graphA’.
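As an illustration, named graphs can be sketched in Python as a dictionary from graph name to a set of triples. This is only my toy model: the function `graphs_stating` is hypothetical, and a real Named Graph-enabled store would expose the graph name through its own API or through SPARQL.

```python
# Toy model: graph name -> set of (subject, predicate, object) tuples.
named_graphs = {
    "graphA": {
        ("#Romeo", "r:loves", "#Juliet"),
        ("#Juliet", "r:daugherOf", "#LadyCapulet"),
        ("#Mercutio", "r:friendOf", "#Romeo"),
        ("graphA", "d:creator", "#Peter"),  # statement about the graph itself
    },
    "graphC": {
        ("#Romeo", "r:loves", "#Juliet"),
    },
}

def graphs_stating(triple, graphs):
    """Return the names of all graphs that contain `triple` (hypothetical helper)."""
    return sorted(name for name, triples in graphs.items() if triple in triples)

# Flattening gives quads: (context, subject, predicate, object).
quads = {(name, *t) for name, triples in named_graphs.items() for t in triples}

print(graphs_stating(("#Romeo", "r:loves", "#Juliet"), named_graphs))
```

Unlike the plain merge, the same triple can now be traced back to every graph that states it, and the quad view shows how the two proposals relate.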
Named Graphs and quads are quickly gaining popularity in the Semantic Web community, with projects such as ORE, POWDER and OWL seeking ways to add metadata to (sub)graphs [Ref]. Unfortunately, serialization of Named Graphs in RDF/XML documents is problematic. There is no support for adding names in the current XML format. One suggestion is to use the URI of the RDF/XML document itself as the name of the graph. E.g. if I create an RDF/XML document like [namespace declarations omitted]:
<RDF>
<Description ID="Romeo">
<r:loves resource="#Juliet"/>
</Description>
…
</RDF>
, and publish this as “doc1”, then the Named Graph triples would become:
<doc1> {
<#Romeo> r:loves <#Juliet>
}
This method has the disadvantage that the URI used to name the graph is terribly overloaded. ‘doc1’ is used both as the location of the RDF/XML document and as the name of the graph, so conflicts can occur. E.g. when I create a triple:
<doc1> r:owner "root"
Is the graph owned by ‘root’ (as in UNIX ownership) or the RDF/XML document? Probably the latter.
Another possible solution is to use a construct called ‘xml:base’ which provides a base URI for XML documents, and define this xml:base as graph name:
<RDF xml:base="ABCD">
<Description ID="Romeo">
<r:loves resource="#Juliet"/>
</Description>
…
</RDF>
Which would result in these triples:
<ABCD> {
<#Romeo> r:loves <#Juliet>
}
This method (like the previous one) has the disadvantage that each (sub)graph you want to name should appear in a separate RDF/XML document, which can be problematic in many use cases.
A third proposal is being considered. By extending RDF/XML with a new attribute rdf:graph, any description which carries this attribute will be ‘stored’ in a graph named by the value of the attribute. E.g. if the RDF/XML in doc1 contained:
<RDF>
<Description ID="Romeo" graph="#gA">
<r:loves resource="#Juliet"/>
</Description>
<Description ID="Romeo" graph="#gB">
<r:sonOf resource="#LordMontague"/>
</Description>
<Description ID="gA" graph="#gA">
<d:creator>Peter</d:creator>
</Description>
<Description ID="gB" graph="#gB">
<d:creator>Mary</d:creator>
</Description>
</RDF>
Then this would be equivalent to these Named Graph triples:
<doc1> {
<#Romeo> r:loves <#Juliet>
<#Romeo> r:sonOf <#LordMontague>
<#gA> d:creator "Peter"
<#gB> d:creator "Mary"
}
<doc1#gA> {
<#Romeo> r:loves <#Juliet>
<#gA> d:creator "Peter"
}
<doc1#gB> {
<#Romeo> r:sonOf <#LordMontague>
<#gB> d:creator "Mary"
}
The semantics would mean that the graph “Romeo loves Juliet” was created by “Peter” and the graph “Romeo is son of Lord Montague” was created by “Mary”. These graphs are quite trivial; they contain only one triple. But the same technique could be used for graphs containing many triples, as shown in Fig 2.
Fig 2. Graphical view of Named Graphs
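To show how the third proposal could be processed, here is a small Python sketch that groups descriptions by their graph attribute. It is not a real RDF/XML parser. The namespace URIs under example.org are placeholders I made up because the post omits the declarations, the attributes are left unprefixed as in the example above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder namespace URIs; the post omits the real declarations.
NS = {"r": "http://example.org/r#", "d": "http://example.org/d#"}

DOC1 = """
<RDF xmlns:r="http://example.org/r#" xmlns:d="http://example.org/d#">
  <Description ID="Romeo" graph="#gA">
    <r:loves resource="#Juliet"/>
  </Description>
  <Description ID="Romeo" graph="#gB">
    <r:sonOf resource="#LordMontague"/>
  </Description>
  <Description ID="gA" graph="#gA">
    <d:creator>Peter</d:creator>
  </Description>
  <Description ID="gB" graph="#gB">
    <d:creator>Mary</d:creator>
  </Description>
</RDF>
"""

def qname(tag):
    """Turn ElementTree's '{uri}local' tag back into 'prefix:local' using NS."""
    uri, local = tag[1:].split("}")
    prefix = next(p for p, u in NS.items() if u == uri)
    return f"{prefix}:{local}"

def named_graphs(xml_text, doc_name):
    """Group triples by document and by the value of the graph attribute."""
    graphs = defaultdict(set)
    for desc in ET.fromstring(xml_text).findall("Description"):
        subject = "#" + desc.get("ID")
        for prop in desc:
            # Resource objects come from the attribute, literals from the text.
            obj = prop.get("resource") or prop.text
            triple = (subject, qname(prop.tag), obj)
            graphs[doc_name].add(triple)  # every triple is in the document graph
            if desc.get("graph"):
                graphs[doc_name + desc.get("graph")].add(triple)
    return dict(graphs)

graphs = named_graphs(DOC1, "doc1")
for name in sorted(graphs):
    print(name, sorted(graphs[name]))
```

Literals such as Peter are kept as plain strings here, so this sketch does not distinguish them from resources the way the pseudo-N3 notation does.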


