A pet problem of mine is that when you do a keyword search for "image editor", gimp does not show up. This is because the description of gimp does not include the word "editor". And if you are a gimp developer, please don't change this, otherwise I won't have a good example to point at anymore.
Now that I have a Xapian engine in ept-cache, I have already managed to make gimp appear somewhere in the search results through approximate matches.
But I know one can do better: debtags is very good at providing data that brings gimp back to the world of image editors. So, how can one use tags to improve the Xapian results? Today the Xapian developers told me.
I've implemented it in ept-cache 0.5.6, which I've just uploaded to unstable.
After you install it, run ept-cache reindex to rebuild the index with debtags in it.
The system is very clever: here is how it works.
First you prepare the query as usual: tokenize, stem and so on:
    // This is the nice ept interface to Xapian
    TextSearch textsearch;
    Xapian::Enquire enquire(textsearch);

    // Set up the base query
    Xapian::Query query = textsearch.makeORQuery(keywords.begin(), keywords.end());
    enquire.set_query(query);

    // Get a set of tag-based tokens that can be used to expand the query
    vector<string> expand = textsearch.expand(enquire);

    // Build the expanded query
    Xapian::Query expansion(Xapian::Query::OP_OR, expand.begin(), expand.end());
    enquire.set_query(Xapian::Query(Xapian::Query::OP_OR, query, expansion));

    // Get the results as usual
    Xapian::MSet matches = enquire.get_mset(pos, 20);
    for (Xapian::MSetIterator i = matches.begin(); i != matches.end(); ++i)
        ...
And this is how textsearch.expand picks the tags used to build the expanded query:
    // This functor filters out all tokens that are not tags
    // (tags are indexed with a 'T' prefix)
    struct TagFilter : public Xapian::ExpandDecider
    {
        virtual bool operator()(const std::string &term) const { return term[0] == 'T'; }
    };
    static TagFilter tagFilter;

    vector<string> TextSearch::expand(Xapian::Enquire& enq) const
    {
        // A Xapian RSet is a set of documents marked as relevant: Xapian
        // uses it to suggest terms that 'expand' the search to show more
        // documents like the given ones
        Xapian::RSet rset;

        // Select the top 5 result documents as the 'good ones' to
        // use to expand the search
        Xapian::MSet mset = enq.get_mset(0, 5);
        for (Xapian::MSetIterator i = mset.begin(); i != mset.end(); ++i)
            rset.add_document(i);

        // Get the expansion terms, but only those that are tags
        Xapian::ESet eset = enq.get_eset(5, rset, &tagFilter);
        vector<string> res;
        for (Xapian::ESetIterator i = eset.begin(); i != eset.end(); ++i)
            res.push_back(*i);

        // Pass the tags to the caller, who will OR them to their normal keyword search
        return res;
    }
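To see why this brings back a package whose description misses the keywords, here is a toy sketch of the same idea in plain C++, without Xapian. Everything in it is an assumption made for illustration: the package names and their terms are invented, the scoring just counts matching terms instead of using Xapian's real weights, and the expansion simply picks the most frequent tags among the top hits rather than what get_eset actually computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an indexed package: a name plus its index terms.
// Tag terms carry a 'T' prefix, as in the real index.
struct Doc {
    std::string name;
    std::set<std::string> terms;
};

typedef std::pair<int, const Doc*> Scored;
typedef std::pair<std::string, int> TagCount;

// Invented sample data: "retoucher" has the image editing tags but
// neither "image" nor "editor" in its description terms.
std::vector<Doc> sampleDocs() {
    return {
        {"painter",   {"image", "editor", "Tuse::editing", "Tworks-with::image"}},
        {"pixeltool", {"image", "editor", "raster", "Tuse::editing", "Tworks-with::image"}},
        {"notepad",   {"text", "editor", "Tuse::editing", "Tworks-with::text"}},
        {"retoucher", {"photo", "retouching", "Tuse::editing", "Tworks-with::image"}},
        {"mailer",    {"mail", "client", "Tworks-with::mail"}},
    };
}

// OR search: rank documents by how many query terms they contain
// (a crude substitute for real relevance weights), best first.
std::vector<const Doc*> search(const std::vector<Doc>& docs,
                               const std::set<std::string>& query) {
    std::vector<Scored> scored;
    for (const Doc& d : docs) {
        int score = 0;
        for (const std::string& t : query)
            if (d.terms.count(t)) ++score;
        if (score > 0) scored.push_back(Scored(score, &d));
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const Scored& a, const Scored& b) { return a.first > b.first; });
    std::vector<const Doc*> out;
    for (const Scored& s : scored) out.push_back(s.second);
    return out;
}

// Take the top `topDocs` hits as the 'good ones' and return up to
// `maxTags` of the tag terms that occur most often among them.
std::vector<std::string> expandWithTags(const std::vector<const Doc*>& hits,
                                        std::size_t topDocs, std::size_t maxTags) {
    std::map<std::string, int> freq;
    for (std::size_t i = 0; i < hits.size() && i < topDocs; ++i)
        for (const std::string& t : hits[i]->terms)
            if (!t.empty() && t[0] == 'T') ++freq[t];
    std::vector<TagCount> byFreq(freq.begin(), freq.end());
    std::stable_sort(byFreq.begin(), byFreq.end(),
                     [](const TagCount& a, const TagCount& b) { return a.second > b.second; });
    std::vector<std::string> tags;
    for (std::size_t i = 0; i < byFreq.size() && i < maxTags; ++i)
        tags.push_back(byFreq[i].first);
    return tags;
}

bool contains(const std::vector<const Doc*>& hits, const std::string& name) {
    for (const Doc* d : hits)
        if (d->name == name) return true;
    return false;
}

// Plain keyword search misses "retoucher"; after ORing in the tags
// taken from the top hits, it shows up.
void demo() {
    std::vector<Doc> docs = sampleDocs();
    std::set<std::string> query = {"image", "editor"};
    std::vector<const Doc*> before = search(docs, query);
    assert(!contains(before, "retoucher"));
    for (const std::string& tag : expandWithTags(before, 2, 2))
        query.insert(tag);
    std::vector<const Doc*> after = search(docs, query);
    assert(contains(after, "retoucher"));
}
```

Calling demo() from a main function runs both searches. The point of the design is that the user never has to know about tags: the first round of results says which tags matter, and those tags then pull in the packages that the keywords alone would miss.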