Beagle supports a range of data sources and extracts text and metadata from the following file formats:

  • Folders
  • Office Documents
    • OpenOffice.org (sxw, sxc, sxi and more)
    • OpenDocument (odt, ods, odp)
    • Microsoft Office (doc, xls, ppt)
    • AbiWord (abw)
    • Scribus (sla)
    • Rich Text Format (rtf)
    • PDF
  • Text Documents
    • HTML (xhtml, html, htm)
    • Source code (Boo, C, C++, C#, Fortran, Java, JavaScript, Lisp, Matlab, Pascal, Perl, PHP, Python, Ruby, Scilab and Shell scripts)
    • LaTeX, BibTeX
    • Plain text (txt, any plain text file that isn't filed under any other category)
  • Documentation/Help Documents
    • Texinfo
    • Man pages
      • gzip and bzip2 compressed man pages
    • Info pages
      • gzip and bzip2 compressed info pages
    • DocBook
    • Monodoc
    • Windows help files (chm)
  • Images (jpeg, png, bmp, tiff, gif, svg)
    • Exif tags, IPTC tags and other metadata are indexed
    • F-Spot and digiKam tags in the images are also indexed
  • Audio (mp3, ogg, flac, ape, mpc, m4a, aac, tracker, amiga audio, wma)
    • m3u and pls playlists
  • Video (mpeg, asf, wmv, mng, mp4, quicktime and other formats supported by MPlayer or Totem)
  • Archive files (zip, tar, gzip, bzip2) and their contents
  • Application launchers
  • Linux packages (ebuild, rpm, dpkg)
  • Generic XSLT files

Beagle also lets users define simple filters of their own that delegate to external programs. For example, untex can be used to extract text from TeX files. To create such a filter, add an entry for it to the external-filters.xml file. Instructions and a sample configuration can be found in the external-filters.xml.sample file.

To obtain examples or to share your own external-filters.xml, check out the ExternalFiltersRepository.
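
Before wiring a program into external-filters.xml, it can help to check by hand what text it produces for a file. The sketch below is only an illustration of that check: preview_filter is a hypothetical helper, not part of Beagle, and it assumes the program takes the file path as its argument and writes plain text to standard output (as untex is assumed to do here).

```shell
# Hypothetical helper, not part of Beagle: preview the text an external
# program emits for a file, i.e. what an indexer could pick up from it.
# Usage: preview_filter <program> <file>
preview_filter() {
    prog=$1
    file=$2
    # Stop early with a clear error if the program is not installed.
    if ! command -v "$prog" >/dev/null 2>&1; then
        echo "preview_filter: $prog not found" >&2
        return 127
    fi
    # Assumption: the program prints the extracted text on standard output.
    "$prog" "$file"
}

# Example (assumes untex is installed and paper.tex exists):
# preview_filter untex paper.tex | head
```

If the output is readable plain text, the program is a reasonable candidate for an external filter; if it is empty or binary, it probably needs different arguments.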


To list the data sources supported by a Beagle installation, run

$ beagle-info --list-backends

To list the supported filters, run

$ beagle-info --list-filters
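
The listings can be long, so a quick way to see whether a given format is covered is to search the output. The following is a minimal sketch: has_filter is a hypothetical helper, and it makes no assumption about the layout of the listing beyond the format name appearing somewhere in it.

```shell
# Hypothetical helper: succeed if the listing read from standard input
# mentions the given term (case-insensitive match).
# Usage: <listing command> | has_filter <term>
has_filter() {
    grep -qi -- "$1"
}

# Example (assumes beagle-info is on PATH):
# beagle-info --list-filters | has_filter pdf && echo "a PDF filter is listed"
```

The same helper works on the output of beagle-info --list-backends when checking for a data source.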

This page was last modified 19:10, 2 June 2008.