I tried generating a list of 10 suggested packages for every package in the archive. You can get it at http://people.debian.org/~enrico/Suggestions.gz and build nice things with it.
Generation took two hours as a process running at nice 18 on the busy gluck, but it can be run on any faster or less busy machine that has a local copy of the unaggregated popcon data.
This is (roughly) the generating algorithm:
- Take a package P.
- Query P on a Xapian index that indexes popcon submissions as documents and their packages as words.
- Get the first 20 resulting popcon votes.
- Score each package mentioned in those 20 results by a combination of how many votes mention it, its TF-IDF scores in those votes, and the Xapian relevance scores of the votes themselves.
- Take the top 10 packages as suggestions for P.
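The scoring step above can be sketched in a few lines of Python. This is only an illustration, not the code from the branch: it assumes the Xapian query has already been run and its results handed over as plain (relevance, {package: TF-IDF}) pairs, the function name `suggest` is made up, and the way the three signals are combined (vote count times the relevance-weighted TF-IDF sum) is a guess at one reasonable formula. Leaving P itself out of its own suggestions is also an assumption.

```python
from collections import defaultdict


def suggest(package, hits, top_votes=20, limit=10):
    """Rank packages that co-occur with `package` in the best popcon votes.

    `hits` is a list of (relevance, {package_name: tfidf}) pairs, best
    match first, standing in for the results of the Xapian query.
    """
    votes = defaultdict(int)     # how many of the votes mention each package
    weight = defaultdict(float)  # sum of relevance * tfidf over those votes

    for relevance, terms in hits[:top_votes]:
        for name, tfidf in terms.items():
            if name == package:
                continue  # assumption: P is not suggested for itself
            votes[name] += 1
            weight[name] += relevance * tfidf

    # Hypothetical combination of the three signals; the real formula
    # is not spelled out in the post.
    scores = {name: votes[name] * weight[name] for name in votes}

    # Highest score first, package name as a deterministic tie-breaker.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:limit]


# Two made-up votes that both contain mutt.
example_hits = [
    (1.0, {"mutt": 0.5, "procmail": 0.8, "fetchmail": 0.6}),
    (0.5, {"mutt": 0.4, "procmail": 0.6, "vim": 0.1}),
]
print(suggest("mutt", example_hits))
```

With the made-up votes above, procmail comes out first because it appears in both votes with high weights, followed by fetchmail and then vim.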
The code can be fetched with:
bzr branch http://people.debian.org/~enrico/2007-01/popcon/