While answering a long message on the debtags-devel mailing list I accidentally put together the pieces of a fun idea.
This is the bit of message I was answering:
- It would be very useful if the means for indicating the supported data formats was more comprehensive. This could mean a lot of expanding in the "works-with-format" section of the vocabulary, which doesn't even include formats such as gif or mpg at the moment. I don't know how feasible it is to alter underlying debtags functionality, but perhaps it would be the easiest to make "works-with-format" a special case tag which allows for formats not listed in the vocabulary.
This is my answer:
Good point. The idea has popped up in the past to list supported mime types among the package metadata, so that one could point to a file and get a list of all the packages that can work with it.
I'm not sure it's a good idea to encode mime types in debtags and I'd like to see something ad-hoc for it. In the meantime works-with-format is the best we can do, but we should limit it to the most common formats.
This is the fun idea: if works-with-format is the best we can do, what can we do with it?
Earlier today I worked on resurrecting some old code of mine to expand Zack's ls2rss with Dublin Core metadata extracted from the files. The mime type scanner was ready for action.
Some imports:
import sys
# Requires python-extractor, python-magic, python-apt
# and an unreleased python-debtags from http://bzr.debian.org/bzr/pkg-python-debian/trunk/
import extractor
import magic
from debian_bundle import debtags
import re
from optparse import OptionParser
import apt
A tentative mapping between mime types and debtags tags:
mime_map = (
( r'text/html\b', ("works-with::text","works-with-format::html") ),
( r'text/plain\b', ("works-with::text","works-with-format::plaintext") ),
( r'text/troff\b', ("works-with::text", "works-with-format::man") ),
( r'image/', ("works-with::image",) ),
( r'image/jpeg\b', ("works-with::image:raster","works-with-format::jpg") ),
( r'image/png\b', ("works-with::image:raster","works-with-format::png") ),
( r'application/pdf\b', ("works-with::text","works-with-format::pdf")),
( r'application/postscript\b', ("works-with::text","works-with-format::postscript")),
( r'application/x-iso9660\b', ('works-with-format::iso9660',)),
( r'application/zip\b', ('works-with::archive', 'works-with-format::zip')),
( r'application/x-tar\b', ('works-with::archive', 'works-with-format::tar')),
( r'audio/', ("works-with::audio",) ),
( r'audio/mpeg\b', ("works-with-format::mp3",) ),
( r'audio/x-wav\b', ("works-with-format::wav",) ),
( r'message/rfc822\b', ("works-with::mail",) ),
( r'video/', ("works-with::video",)),
( r'application/x-debian-package\b', ("works-with::software:package",)),
( r'application/vnd.oasis.opendocument.text\b', ("works-with::text",)),
( r'application/vnd.oasis.opendocument.graphics\b', ("works-with::image:vector",)),
( r'application/vnd.oasis.opendocument.spreadsheet\b', ("works-with::spreadsheet",)),
( r'application/vnd.sun.xml.base\b', ("works-with::db",)),
( r'application/rtf\b', ("works-with::text",)),
( r'application/x-dbm\b', ("works-with::db",)),
)
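To see how the table behaves, here is a minimal standalone sketch of the lookup, written for Python 3 and using only the standard library. The function name `tags_for` and the trimmed four-entry table are made up for the illustration. It assumes the scanner may hand back a string with a trailing parameter such as `; charset=binary`, which is why the patterns end in `\b` instead of `$`.

```python
import re

# Trimmed copy of the table above: four entries only, for illustration.
MIME_MAP = (
    (r'image/', ("works-with::image",)),
    (r'image/jpeg\b', ("works-with::image:raster", "works-with-format::jpg")),
    (r'audio/', ("works-with::audio",)),
    (r'audio/mpeg\b', ("works-with-format::mp3",)),
)


def tags_for(mime_type, mime_map=MIME_MAP):
    """Collect the tags of every pattern matching the start of mime_type."""
    found = set()
    for pattern, tags in mime_map:
        # re.match anchors at the beginning of the string only, so the
        # generic 'image/' entry and the specific 'image/jpeg' entry can
        # both contribute tags for the same file.
        if re.match(pattern, mime_type):
            found.update(tags)
    return found


print(sorted(tags_for("image/jpeg; charset=binary")))
print(sorted(tags_for("image/gif")))
```

A generic prefix entry and a specific entry add up, so a format that has no works-with-format tag yet still gets the broader works-with tag.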
Code that does its best to extract a mime type:
extractor = extractor.Extractor()
magic = magic.open(magic.MAGIC_MIME)
magic.load()
def mimetype(fname):
    keys = extractor.extract(fname)
    xkeys = {}
    for k, v in keys:
        if xkeys.has_key(k):
            xkeys[k].append(v)
        else:
            xkeys[k] = [v]
    namemagic = magic.file(fname)
    contentmagic = magic.buffer(file(fname, "r").read(4096))
    return xkeys.has_key("mimetype") and xkeys['mimetype'][0] or contentmagic or namemagic
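The interesting part of that function is the fallback order: extracted metadata first, then the guess from the file contents, then the other magic guess. Here is a minimal Python 3 sketch of just that logic, with no extractor or magic involved. The names `group_keywords` and `pick_mimetype` are hypothetical, and the guesses are passed in as plain strings.

```python
from collections import defaultdict


def group_keywords(pairs):
    """Turn (key, value) pairs into a dict mapping each key to a list of values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def pick_mimetype(keywords, content_guess, name_guess):
    """Return the first usable candidate, in the same order as the code above."""
    extracted = keywords.get("mimetype")
    if extracted and extracted[0]:
        return extracted[0]
    # Same as the trailing "or contentmagic or namemagic" chain.
    return content_guess or name_guess


# Sample pairs invented for the example.
pairs = [("mimetype", "image/png"), ("title", "holiday"), ("mimetype", "image/x-other")]
keywords = group_keywords(pairs)
print(pick_mimetype(keywords, "application/octet-stream", None))
print(pick_mimetype({}, None, "text/html"))
```

Spelling the chain out with an explicit `if` avoids the classic trap of the `a and b or c` idiom, which silently falls through when `b` is an empty string.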
Command line parser:
parser = OptionParser(usage="usage: %prog [options] filename",
                      version="%prog "+ VERSION,
                      description="search Debian packages that can handle a given file")
parser.add_option("--tagdb", default="/var/lib/debtags/package-tags", help="Tag database to use (default: %default)")
parser.add_option("--action", default=None, help="Show the packages that allow the given action on the file (default: %default)")
(options, args) = parser.parse_args()
if len(args) == 0:
    parser.error("Please provide the name of a file to scan")
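For anyone trying this on a current Python, the same command line can be sketched with `argparse` from the standard library. This is an equivalent under assumptions, not the script's own parser: the `build_parser` helper is hypothetical, and so is the `VERSION` value, since the snippets here do not show how it is defined.

```python
import argparse

VERSION = "0.1"  # hypothetical value; the original definition is not shown


def build_parser():
    """Build a parser with the same options as the optparse version above."""
    parser = argparse.ArgumentParser(
        description="search Debian packages that can handle a given file")
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + VERSION)
    parser.add_argument("--tagdb", default="/var/lib/debtags/package-tags",
                        help="Tag database to use (default: %(default)s)")
    parser.add_argument("--action", default=None,
                        help="Show the packages that allow the given action on the file")
    # A required positional argument replaces the manual len(args) check.
    parser.add_argument("filename", help="file to scan")
    return parser


# Parse an explicit argument list instead of sys.argv, to keep the demo repeatable.
options = build_parser().parse_args(["--action", "viewing", "picture.png"])
print(options.filename, options.action, options.tagdb)
```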
And here starts the fun: first we load the debtags data:
# Read full database
fullcoll = debtags.DB()
tagFilter = re.compile(r"^special::.+$|^.+::TODO$")
fullcoll.read(open(options.tagdb, "r"), lambda x: not tagFilter.match(x))
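The predicate passed to `read` decides which tags are kept, so it is worth checking what the regular expression actually throws away. A small Python 3 sketch, with a hypothetical `keep_tag` helper and a sample tag list chosen just for the demonstration:

```python
import re

# Same expression as above: anything in the special:: facet, and any ::TODO tag.
TAG_FILTER = re.compile(r"^special::.+$|^.+::TODO$")


def keep_tag(tag):
    """Return True for tags that should stay in the collection."""
    return not TAG_FILTER.match(tag)


# Sample tags picked for illustration only.
tags = ["use::viewing", "special::not-yet-tagged", "works-with::TODO", "role::program"]
kept = [t for t in tags if keep_tag(t)]
print(kept)
```

Only the bookkeeping tags are dropped; everything that describes what a package does stays in.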
Then we scan the mime type and look up tags in the mime_map above:
type = mimetype(args[0])
#print >>sys.stderr, "Mime type:", type
found = set()
for match, tags in mime_map:
    match = re.compile(match)
    if match.match(type):
        for t in tags:
            found.add(t)
if len(found) == 0:
    print >>sys.stderr, "Unhandled mime type:", type
else:
If the user only gave the file name, let's show what Debian can do with that file:
    if options.action == None:
        print "Debtags query:", " && ".join(found)
        query = found.copy()
        query.add("role::program")
        subcoll = fullcoll.filterPackagesTags(lambda pt: query.issubset(pt[1]))
        uses = map(lambda x:x[5:], filter(lambda x:x.startswith("use::"), subcoll.iterTags()))
        print "Available actions:", ", ".join(uses)
If the user picked one of the available actions, let's show the packages that do it:
    else:
        aptCache = apt.Cache()
        query = found.copy()
        query.add("role::program")
        query.add("use::"+options.action)
        print "Debtags query:", " && ".join(query)
        subcoll = fullcoll.filterPackagesTags(lambda pt: query.issubset(pt[1]))
        for i in subcoll.iterPackages():
            aptpkg = aptCache[i]
            desc = aptpkg.rawDescription.split("\n")[0]
            print i, "-", desc
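Both branches boil down to one set operation: keep the packages whose tag set contains every tag in the query. Here is a minimal Python 3 sketch of that idea against an in-memory table instead of the real debtags database. The package names, their tags and the three helper functions are all invented for the example.

```python
# Hypothetical package -> tags table; names and tags are made up.
PACKAGES = {
    "viewer-a": {"role::program", "use::viewing",
                 "works-with::image", "works-with-format::png"},
    "editor-b": {"role::program", "use::editing", "use::viewing",
                 "works-with::image", "works-with-format::png"},
    "lib-c": {"role::shared-lib", "works-with::image", "works-with-format::png"},
    "player-d": {"role::program", "use::playing", "works-with::audio"},
}


def packages_matching(query, packages=PACKAGES):
    """Keep the packages whose tags include every tag in the query."""
    return {name: tags for name, tags in packages.items() if query.issubset(tags)}


def available_actions(found, packages=PACKAGES):
    """List the use:: values offered by programs that handle the found tags."""
    query = set(found) | {"role::program"}
    actions = set()
    for tags in packages_matching(query, packages).values():
        actions.update(t[len("use::"):] for t in tags if t.startswith("use::"))
    return sorted(actions)


def packages_for_action(found, action, packages=PACKAGES):
    """List the programs that handle the found tags and allow the given action."""
    query = set(found) | {"role::program", "use::" + action}
    return sorted(packages_matching(query, packages))


found = {"works-with::image", "works-with-format::png"}
print(available_actions(found))
print(packages_for_action(found, "editing"))
```

Adding `role::program` to the query is what keeps libraries out of the results, even when they carry the same works-with tags.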
\o/
The moral of the story:
- Debian is lots of fun
- We have amazing technology just waiting for good ideas.
- I'd love to see more little scripts like this getting written.