Introduction
This document walks you through the process of creating an indexing backend. Querying backends also exist, but they're not covered here. See the Google backend for a simple querying backend that you can base your own on.
If you haven't read the Beagle Architecture Overview, you probably should do that now.
This document was written by Joe Shaw.
Trash backend
GNOME desktops have a Trash icon on the desktop. When files from the user's home directory are moved to the trash, they are moved into a directory called .Trash. Because Beagle does not crawl dot directories, these files are not normally indexed. This example backend will crawl the files in .Trash and set up inotify watches. Indexables will be produced through an indexable generator.
The beginnings
First we need to create a class for our backend and mark it as a backend:
using System;
using System.Collections;
using System.IO;
using System.Threading;
using Beagle;
using Beagle.Daemon;
using Beagle.Util;
[QueryableFlavor (Name="Trash", Domain=QueryDomain.Local, RequireInotify=false)]
public class TrashQueryable : LuceneFileQueryable, IIndexableGenerator {
    public TrashQueryable () : base ("TrashIndex") { }
}
The QueryableFlavor attribute tells the Beagle daemon some important things about this backend: that its name is "Trash", that its QueryDomain is Local, and that it doesn't require inotify.
QueryDomains specify the scope of the data covered by this backend. The values are Local, System, Neighborhood, and Global.
- Local - This backend deals with data for this user only.
- System - This backend is shared by multiple users on this system.
- Neighborhood - This backend accesses data on a trusted remote source.
- Global - This backend accesses data on an untrusted or unknown remote source.
In most cases, your indexing backends will be Local. If you create remote querying backends, they should be marked Global.
Our TrashQueryable class derives from the LuceneFileQueryable class, which is used for indexing backends that are built on top of files. For indexing backends that don't have a one-to-one mapping between files and indexables (like a mail backend, for instance), you should derive from LuceneQueryable.
Our class also implements the IIndexableGenerator interface. We will use this to create indexable objects on demand, rather than creating them individually up front and slamming the scheduler with indexing tasks.
Lastly, our constructor chains up to our base class's. The parameter tells the indexing code to create an index directory with the name "TrashIndex".
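The on-demand idea behind an indexable generator can be tried outside Beagle. The following is a minimal stand-alone sketch, not Beagle code: IItemGenerator, ArrayGenerator, and RunBatch are hypothetical names invented for illustration, and the batch loop only approximates how a scheduler might pull work. It shows a consumer asking for items a few at a time instead of receiving them all up front.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for a pull-style generator contract.
// This is NOT Beagle's IIndexableGenerator.
interface IItemGenerator {
    bool HasNextItem ();
    string GetNextItem ();
}

class ArrayGenerator : IItemGenerator {
    IEnumerator items;

    public ArrayGenerator (string[] source)
    {
        items = source.GetEnumerator ();
    }

    // Advances to the next item; call exactly once per item
    public bool HasNextItem ()
    {
        return items.MoveNext ();
    }

    public string GetNextItem ()
    {
        return (string) items.Current;
    }
}

class GeneratorDemo {
    // Pulls at most batch_size items per call and returns how many were handled.
    // The size check comes first so that HasNextItem() is not called (and an
    // item skipped) once the batch is full.
    public static int RunBatch (IItemGenerator gen, int batch_size)
    {
        int handled = 0;
        while (handled < batch_size && gen.HasNextItem ()) {
            Console.WriteLine ("indexing " + gen.GetNextItem ());
            handled++;
        }
        return handled;
    }

    static void Main ()
    {
        IItemGenerator gen = new ArrayGenerator (new string[] { "a.txt", "b.txt", "c.txt" });
        int first = RunBatch (gen, 2);   // handles a.txt and b.txt
        int second = RunBatch (gen, 2);  // handles c.txt, then the source is exhausted
        Console.WriteLine ("{0} then {1}", first, second);
    }
}
```

Because the consumer decides when to ask for the next item, only one unit of work is pending at any time, no matter how many files the source contains.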
To let the beagle-daemon pick up the new backend, it needs to be added to an existing AssemblyInfo.cs, or you can create a separate one for the backend:
using System;
using Beagle.Daemon;
// Assumes TrashQueryable is declared inside a namespace of this name
using Beagle.Daemon.TrashQueryable;

[assembly: IQueryableTypes (typeof (TrashQueryable))]
Starting the backend
The first thing we should do is override the Start() method and begin indexing. Because the daemon starts the backends one at a time, a backend that does its work directly in Start() delays every backend after it. Backends should therefore start a new thread or use the GLib main loop to defer most of their work. For our example, we'll start a new thread.
public override void Start ()
{
    base.Start ();
    ExceptionHandlingThread.Start (new ThreadStart (StartWorker));
}

private void StartWorker ()
{
    // Ok, do stuff.
}
Ok, so now we're ready to do the grunt work. All backends should first crawl their data to index any data that has changed since the last time Beagle was run, or index it for the first time. To do this, we'll add a task for the scheduler to use our IIndexableGenerator implementation.
private void StartWorker ()
{
    Log.Debug ("Starting up trash backend!");

    // Set our backend's state to "Crawling"
    State = QueryableState.Crawling;

    // Create a scheduler task for our indexable generator
    Scheduler.Task task;
    task = NewAddTask (this); // The parameter is an IIndexableGenerator instance
    task.Tag = "Dumpster diving"; // This task's unique identifier
    task.Source = this; // The object that is responsible for this task

    // Add the task to the scheduler
    ThisScheduler.Add (task);
}
Now we just need to implement the methods from IIndexableGenerator, and we'll have a working backend.
// IIndexableGenerator implementations.
public string StatusName {
    // Displayed in beagle-status
    get { return "TrashQueryable"; }
}

// Called each time a batch of indexables is written to the index
public void PostFlushHook () { }
IEnumerator files = null;

public bool HasNextIndexable ()
{
    if (files == null) {
        string trash_dir = Path.Combine (PathFinder.HomeDir, ".Trash");
        files = DirectoryWalker.GetFileInfosRecursive (trash_dir).GetEnumerator ();
    }

    if (! files.MoveNext ()) {
        // All finished crawling, reset our backend's state
        State = QueryableState.Idle;
        return false;
    } else
        return true;
}

public Indexable GetNextIndexable ()
{
    FileInfo file = (FileInfo) files.Current;
    Indexable indexable = FileToIndexable (file);
    return indexable;
}
private Indexable FileToIndexable (FileInfo file)
{
    if (! file.Exists)
        return null;

    if (IsUpToDate (file.FullName))
        return null;

    Uri uri = UriFu.PathToFileUri (file.FullName);
    Indexable indexable = new Indexable (uri);
    indexable.ContentUri = uri;
    indexable.Timestamp = file.LastWriteTimeUtc;
    indexable.HitType = "File";
    indexable.AddProperty (Property.NewKeyword ("fixme:filename", file.Name));

    return indexable;
}
All of our files (but not directories) are indexed, and the fixme:filename property holds each file's name.
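To experiment with the crawl step outside the daemon, here is a rough stand-alone analogue that uses only System.IO. It is a sketch under stated assumptions: CrawlDemo and FilesNewerThan are hypothetical names, the recursive GetFiles call stands in for DirectoryWalker.GetFileInfosRecursive, and the timestamp comparison is only an assumed substitute for IsUpToDate, whose actual logic is not described in this document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class CrawlDemo {
    // Returns the full paths of all files under root (recursively) whose
    // last write time is later than cutoff_utc. Directories themselves are
    // never returned, only the files inside them.
    public static List<string> FilesNewerThan (string root, DateTime cutoff_utc)
    {
        List<string> result = new List<string> ();
        DirectoryInfo dir = new DirectoryInfo (root);

        foreach (FileInfo file in dir.GetFiles ("*", SearchOption.AllDirectories)) {
            // Assumed freshness check; not Beagle's IsUpToDate()
            if (file.LastWriteTimeUtc > cutoff_utc)
                result.Add (file.FullName);
        }

        return result;
    }

    static void Main ()
    {
        // Build a throwaway directory tree instead of touching a real trash folder
        string root = Path.Combine (Path.GetTempPath (), "trash-demo-" + Guid.NewGuid ());
        Directory.CreateDirectory (Path.Combine (root, "sub"));
        File.WriteAllText (Path.Combine (root, "a.txt"), "a");
        File.WriteAllText (Path.Combine (Path.Combine (root, "sub"), "b.txt"), "b");

        // Every file is newer than a cutoff at the start of time
        foreach (string path in FilesNewerThan (root, DateTime.MinValue))
            Console.WriteLine (path);

        Directory.Delete (root, true);
    }
}
```

Passing the time of the previous run as the cutoff would skip unchanged files, which is the role the up-to-date check plays in FileToIndexable().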
Adding inotify support
Now the only thing missing is our inotify logic. In HasNextIndexable(), inside the if (files == null) block where trash_dir is in scope, we'll add:
if (Inotify.Enabled)
    Inotify.Subscribe (trash_dir, OnInotifyEvent,
                       Inotify.EventType.CloseWrite | Inotify.EventType.MovedTo);
This notifies us whenever a file in the trash directory is closed after being written to, or a file is moved into the directory. And our OnInotifyEvent() method:
private void OnInotifyEvent (Inotify.Watch watch,
                             string path,
                             string subitem,
                             string srcpath,
                             Inotify.EventType type)
{
    string file_path = Path.Combine (path, subitem);
    Indexable indexable = FileToIndexable (new FileInfo (file_path));

    if (indexable != null) {
        Scheduler.Task task;
        task = NewAddTask (indexable); // This parameter is an indexable instance
        task.Priority = Scheduler.Priority.Immediate; // Run this task right away
        ThisScheduler.Add (task);
    }
}
And now any time a file in the trash directory is written to, or a file is moved into it, we index it.
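The subscription combines two event types with the | operator, which works because each event type occupies its own bit. The sketch below illustrates that mechanism in isolation. TrashEvent and MaskDemo are hypothetical names, not Beagle's Inotify.EventType; the numeric values mirror the Linux IN_CLOSE_WRITE, IN_MOVED_TO, and IN_DELETE constants.

```csharp
using System;

// Hypothetical stand-in for an inotify event mask; not Beagle's Inotify.EventType
[Flags]
enum TrashEvent : uint {
    CloseWrite = 0x008, // a file opened for writing was closed
    MovedTo    = 0x080, // a file was moved into the watched directory
    Delete     = 0x200  // a file was deleted from the watched directory
}

class MaskDemo {
    // True if the received event is one of the event types in the mask
    public static bool Matches (TrashEvent mask, TrashEvent received)
    {
        return (mask & received) != 0;
    }

    static void Main ()
    {
        // Same combination as the subscription: writes and incoming moves only
        TrashEvent mask = TrashEvent.CloseWrite | TrashEvent.MovedTo;

        Console.WriteLine (Matches (mask, TrashEvent.MovedTo)); // True
        Console.WriteLine (Matches (mask, TrashEvent.Delete));  // False
    }
}
```

A mask built this way never matches deletions, which is one reason the backend as written does not notice files leaving the trash.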
Conclusion
There's a lot more work that can be done here. We don't handle files disappearing from the trash at all. We don't index directories themselves, and the inotify watch covers only the top-level trash directory, not its subdirectories. But this should give you a good idea of what it takes to build a backend.
