Beagle RDF Store

Beagle extracts and keeps track of data and metadata from every corner of a user's information space: files, emails, contacts, appointments, bookmarks, browsing history, chats and so on. This information is enough to provide a search service that lets users search the extracted data, but many more interesting possibilities open up if metadata from different sources can be related to each other. For example, links in emails could be related to browsing history, bookmarks to BibTeX entries, and emails in notes to contacts. This, briefly, is one goal of the Semantic Desktop.

There have been talks about building an RDF store on top of this extracted data, and a few implementations exist. As of May 19, 2008, the beagle svn trunk has a virtual read-only RDF store, and the BeagleClient C# API gives programs access to it. A description of RDF with respect to Beagle and desktop indexing services in general can be found at http://dtecht.blogspot.com/2008/03/beyond-search-arrhhdeeefff.html.


Building RDF Store

You need to build beagle from source to use this feature. See Getting Started for how to set up beagle.

Advanced data and metadata queries are already possible with the beagle Query Syntax, and beagle does not currently ship with any other service that makes use of the RDF store, so this feature is disabled by default. To enable it, pass --enable-rdf-adapter to ./configure. Note that the RDF store requires a slightly different index than the usual beagle index: the index created by the RDF store can be used by normal beagle, but not the other way around.

Using RDF Store

All the RDF-related libraries, programs and tools are in beagle/RDFAdapter. Beagle uses the well-regarded SemWeb library to handle RDF.

Roughly, the architecture is as follows. Beagle provides a BeagleSource which acts as a gateway to the RDF store. Programs submit RDF queries to the BeagleSource (queries only, since the store is read-only). The BeagleSource converts each query and uses the BeagleClient API to send it to the normal beagle query service. The results are then converted to the appropriate RDF triples and presented to the user. SemWeb comes with sophisticated reasoning abilities and other RDF operations which can also be used on the result triples. SemWebClient.cs contains some example queries:

  • TestSource - queries for different types of RDF triples
  • TestEmailThreads - returns all emails from any threads on the given subject
  • TestReceipients - returns all recipients of emails from the given sender
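
The round trip described above can be sketched as a toy model in Python. This is not Beagle's or SemWeb's code: the INDEX dictionary, the select function, and the example URIs and property names are all hypothetical stand-ins for the beagle index, the BeagleSource, and indexed metadata. It only illustrates how a triple pattern might be answered by generating triples from indexed properties at query time.

```python
# Toy model of a virtual, read-only triple source (hypothetical names throughout).
# Each indexed document has a URI and a set of stored properties.
INDEX = {
    "email:///1": {"ex:subject": "lunch plans", "ex:from": "alice@example.org"},
    "email:///2": {"ex:subject": "Re: lunch plans", "ex:from": "bob@example.org"},
}


def select(subject=None, predicate=None, obj=None):
    """Yield (s, p, o) triples matching the pattern; None acts as a wildcard.

    Nothing is stored as triples: they are built from INDEX on every call.
    """
    for uri, props in INDEX.items():
        if subject is not None and uri != subject:
            continue
        for name, value in props.items():
            if predicate is not None and name != predicate:
                continue
            if obj is not None and value != obj:
                continue
            yield (uri, name, value)


# All senders: a pattern with only the predicate fixed.
senders = sorted(o for _, _, o in select(predicate="ex:from"))
print(senders)
```

There is deliberately no function for adding triples, mirroring the read-only nature of the store.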

The following features from SemWeb can be useful:

  • Notation 3: Reading and writing NTriples, Turtle, and most of Notation 3.
  • Persistent storage supports an extended Select operation to query many things at once (much faster than making individual calls to the underlying database).
  • Reasoning: RDFS reasoning (though not complete) and rule-based reasoning based on the backward-chaining Euler engine, over any data.
  • 4-Tuples: Statements are quads, not triples. The fourth meta field can be used for application-specific purposes, like storing provenance, grouping statements, or storing N3 formulas.
  • Querying: Simple graph entailment tests and SPARQL queries.
  • Experimental algorithms for finding MSGs and making graphs lean.

Limitations

The RDF store the user sees is a virtual read-only store. Read-only means that it is not possible to store data in it, because beagle is about indexing existing information. If you need an RDF store to keep your own data in, there are many available; SemWeb itself comes with such stores.

More importantly, the store is virtual: the RDF triples are generated dynamically at query time from the normal beagle index. Since a full-text indexing service has different goals than an RDF triple store, there are subtle pitfalls to be aware of when using the beagle RDF store. For example, if the data for a file file:///a contains the word beagle, querying for all triples with object=beagle will return

 file:///a   http://beagle-project.org/#text    beagle

but querying for all triples with subject=file:///a will not return the same triple. This is because the #text predicates are generated from properties that do not actually exist in the index. Fortunately, most predicates are generated from existing properties and do not have this problem.
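
The asymmetry can be illustrated with a small Python model. It is a sketch under assumptions, not how beagle is implemented: PROPERTIES, TEXT_INDEX, the ex:mimetype property and both lookup functions are hypothetical. The assumption is that the text of a document is searchable by word but is not kept as a property of the document.

```python
# Toy model (hypothetical names): stored properties plus a word -> URIs index.
TEXT_PREDICATE = "http://beagle-project.org/#text"

PROPERTIES = {"file:///a": {"ex:mimetype": "text/plain"}}
TEXT_INDEX = {"beagle": {"file:///a"}}  # word -> URIs whose text contains it


def triples_with_object(value):
    """Object lookup consults stored properties and the text index."""
    found = [
        (uri, name, v)
        for uri, props in PROPERTIES.items()
        for name, v in props.items()
        if v == value
    ]
    # A text hit is reported as a generated #text triple.
    found += [(uri, TEXT_PREDICATE, value) for uri in sorted(TEXT_INDEX.get(value, ()))]
    return found


def triples_with_subject(uri):
    """Subject lookup in this model only enumerates stored properties,
    so generated #text triples never appear here."""
    return [(uri, name, v) for name, v in PROPERTIES.get(uri, {}).items()]


by_object = triples_with_object("beagle")
by_subject = triples_with_subject("file:///a")
print(by_object)
print(by_subject)
```

In this model the #text triple shows up in by_object but not in by_subject, which is the behaviour described above.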

Also, beagle performs stemming when searching. Since RDF querying generally involves exact matches, this can sometimes cause unexpected results.
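
A minimal Python illustration of the mismatch follows. The toy_stem function is a crude, hypothetical suffix stripper and not beagle's stemmer, and DOCS is made-up data; the point is only that a stemmed lookup and an exact-match lookup can disagree on the same term.

```python
# Toy illustration (not beagle's stemmer): strip a few common suffixes.
def toy_stem(word):
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical data: URI -> a single word found in the document.
DOCS = {"file:///a": "indexing", "file:///b": "index"}


def stemmed_search(term):
    """Full-text style lookup: two words match if their stems are equal."""
    return sorted(u for u, w in DOCS.items() if toy_stem(w) == toy_stem(term))


def exact_match(term):
    """RDF style lookup: the literal must be identical."""
    return sorted(u for u, w in DOCS.items() if w == term)


print(stemmed_search("index"))
print(exact_match("index"))
```

Here the stemmed lookup returns both documents while the exact match returns only one, so a query that expects exact literals may see more results than intended.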

In any case, there is no fix for the above issues. Most of the time they can be worked around by rewriting the query. If you get stuck, do not hesitate to file a bug or email us.


This page was last modified 19:03, 21 May 2008.