django-xappy: Searching with Xapian

Yes, yet another search app. However, unlike other projects (say, this year's GSoC project), it doesn't try to be generic. Instead, it is specific to Xapian, which makes it possible to use all the advanced features the excellent Xappy library offers. Xappy is a high-level interface to Xapian (as opposed to the tedious-to-use low-level Python bindings) and supports some pretty nice features such as facets, tags and ranges.

Yes, I also know about djapian, but… I really wanted to use Xappy!

So, here's how it works. First, define your index. (I built this for my own use, where indexes usually span multiple models, so if you only want to search through a single model it may not be as simple as it could be, at least right now.)

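from django.contrib.auth.models import User
from django_xappy import Index, action, FieldActions

from app.models import Book  # your own app's model


class MyIndex(Index):
    # Directory where the Xapian database is stored
    location = '/var/www/mysite/search-index'

    class Data:
        # Full-text searchable (with spelling suggestions) and stored for display
        @action(FieldActions.INDEX_FREETEXT, spell=True, language="en")
        @action(FieldActions.STORE_CONTENT)
        def name(self):
            # Only users provide a name; other models return None here
            if self == User:
                return self.content_object.username

        # Sortable date field, taken from a different attribute per model
        @action(FieldActions.SORTABLE, type="date")
        def date(self):
            if self == Book:
                return self.content_object.released_at
            elif self == User:
                return self.content_object.date_joined

MyIndex.register(Book)
MyIndex.register(User)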

This says where the index lives, what fields it has, what Xappy actions to use for each, and from where to get the data for those fields (note that not all models have to provide data for each field).
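In the index above, name() only returns a value for users, while date() returns one for both models. As a rough mental model of how such per-field methods combine into one search document, here is a plain-Python sketch that does not use django-xappy at all. It assumes (not confirmed by the library) that a field whose method returns None is simply left out of the document; Book, User, name, when and build_document are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict, Optional


# Hypothetical stand-ins for the two registered models.
@dataclass
class Book:
    title: str
    released_at: date


@dataclass
class User:
    username: str
    date_joined: date


def name(obj: Any) -> Optional[str]:
    # Only users provide a name; books fall through and return None.
    if isinstance(obj, User):
        return obj.username
    return None


def when(obj: Any) -> Optional[date]:
    # Each model feeds the "date" field from a different attribute.
    if isinstance(obj, Book):
        return obj.released_at
    if isinstance(obj, User):
        return obj.date_joined
    return None


FIELDS: Dict[str, Callable[[Any], Any]] = {"name": name, "date": when}


def build_document(obj: Any) -> Dict[str, Any]:
    """Collect field values, skipping fields this object's model does not provide."""
    doc: Dict[str, Any] = {}
    for field, provider in FIELDS.items():
        value = provider(obj)
        if value is not None:
            doc[field] = value
    return doc
```

A Book ends up with only a "date" entry, while a User gets both "name" and "date".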

Rather than updating the index directly, django-xappy logs all changes in a database table. This means your index won't always be up to date, but it also means that problems with your search engine will never affect the rest of your site. Instead, you apply the logged changes to the index at regular intervals (e.g. from a cron job). The easiest way to do that is with the "index" management command.
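To illustrate the deferred-indexing idea, here is a minimal sketch of a change log using the standard library's sqlite3 module. It is not django-xappy's actual implementation: the pending_change table, log_change() and apply_changes() are hypothetical, and a plain dict stands in for the Xapian index.

```python
import sqlite3

# Hypothetical schema: one row per pending change, written by the web app.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pending_change ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " model TEXT NOT NULL, object_id INTEGER NOT NULL,"
    " action TEXT NOT NULL CHECK (action IN ('update', 'delete')))"
)


def log_change(model: str, object_id: int, action: str) -> None:
    """Called from a save/delete hook; never touches the search index itself."""
    with conn:  # commits the insert
        conn.execute(
            "INSERT INTO pending_change (model, object_id, action) VALUES (?, ?, ?)",
            (model, object_id, action),
        )


def apply_changes(index: dict) -> int:
    """Run periodically (e.g. from cron): replay logged changes in order, then clear them."""
    rows = conn.execute(
        "SELECT id, model, object_id, action FROM pending_change ORDER BY id"
    ).fetchall()
    for _, model, object_id, action in rows:
        key = (model, object_id)
        if action == "delete":
            index.pop(key, None)
        else:
            index[key] = f"reindexed {model} #{object_id}"
    if rows:
        # Only delete what was replayed; changes logged meanwhile stay queued.
        with conn:
            conn.execute("DELETE FROM pending_change WHERE id <= ?", (rows[-1][0],))
    return len(rows)
```

Because the log is only cleared after the changes have been applied, a crash in between simply means the same changes are replayed on the next run, which is harmless since reindexing an object is idempotent.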

Let’s create the index for the first time:

PS G:\...\trunk\examples\simple> .\manage.py index --full-rebuild
Creating a new index in "index-1218575904"...
Indexing 11 objects of type "Book"...
Indexing 2 objects of type "User"...
Switching "index-1218575904" to live index...
Done.
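The output suggests that a full rebuild writes into a fresh directory (the number in "index-1218575904" looks like a Unix timestamp) and only then switches it to be the live index, so searches keep working during the rebuild. Here is a minimal sketch of that pattern, assuming the live index is selected through a small pointer file; full_rebuild(), live_index() and the "current" file are hypothetical, not django-xappy's code.

```python
import os
import tempfile
import time
from typing import Callable


def full_rebuild(base_dir: str, build: Callable[[str], None]) -> str:
    """Build a fresh index directory, then switch the 'current' pointer to it."""
    # Name the directory after the current Unix time, like "index-1218575904".
    name = f"index-{int(time.time())}"
    new_dir = os.path.join(base_dir, name)
    os.makedirs(new_dir)  # fails if a rebuild already used this second
    build(new_dir)  # searches keep reading the old index meanwhile

    # Write the new name to a temporary file, then os.replace() it over the
    # pointer, so readers see either the old name or the new one.
    pointer = os.path.join(base_dir, "current")
    tmp = pointer + ".tmp"
    with open(tmp, "w") as f:
        f.write(name)
    os.replace(tmp, pointer)
    return new_dir


def live_index(base_dir: str) -> str:
    """Return the path of the index directory that is currently live."""
    with open(os.path.join(base_dir, "current")) as f:
        return os.path.join(base_dir, f.read().strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        full_rebuild(base, lambda path: None)
        print(live_index(base))
```

On POSIX filesystems os.replace() is atomic, which is what makes the switch safe for concurrent readers; the old index directory can be deleted afterwards.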

Then, after doing some changes, say in the admin:

PS G:\...\trunk\examples\simple> .\manage.py index --update
Updating 1 index with 18 changes...
Done.

Good. Now that we have an up-to-date index, let's search:

results = search("searchterm", page=1, num_per_page=10)

Pass results to your template:

    {% if results %}
        {% for result in results %}
            {{ result.content_object }}
        {% endfor %}
    {% endif %}
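The search() call takes a 1-based page number and a page size. Xappy's own SearchConnection.search() works with a start rank and an end rank instead, so some conversion along these lines presumably happens behind the scenes. This is a sketch under that assumption; page_to_ranks() and page_count() are hypothetical helpers, not part of django-xappy.

```python
from typing import Tuple


def page_to_ranks(page: int, num_per_page: int) -> Tuple[int, int]:
    """Map a 1-based page number to a half-open [start, end) range of result ranks."""
    if page < 1 or num_per_page < 1:
        raise ValueError("page and num_per_page must be positive")
    start = (page - 1) * num_per_page
    return start, start + num_per_page


def page_count(total_matches: int, num_per_page: int) -> int:
    """Number of pages needed to show total_matches results (ceiling division)."""
    return -(-total_matches // num_per_page)
```

With page=1 and num_per_page=10 this yields ranks 0 to 10 (exclusive), i.e. the first ten hits; 21 matches would need three pages.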

Done! For more information, see the README, and don't forget to check out the Xappy docs as well.

As usual, the code is available on Launchpad, and now also on PyPI. To get it via bzr:

bzr branch lp:django-xappy

For questions and support, use the django-apps Google Group and prefix your message with (xappy).
