Here is how to set up liferea so that it hides some entries in Planet Debian:
- Create a script that reads the rss from stdin, removes the entries you don't want and then writes the rss to stdout;
- From the feed properties in liferea, choose the source tab, enable the conversion filter and point that at your script.
Now you just need a simple script that filters the RSS. Here is mine:
    #!/usr/bin/python

    # Copyright (C) 2007 Enrico Zini <enrico@debian.org>
    # This software is licensed under the terms of the GNU General Public
    # License, version 2 or later.

    import libxml2, re

    # What links we should filter out
    unwanted = re.compile(r"^(http://feed1.example.com|http://feed2.example.com)")

    # Read the feed from stdin
    doc = libxml2.parseFile("-")
    root = doc.getRootElement()

    # Create an xpath context and register the namespaces
    # (the default namespace has no name, so register it as "rss")
    xpc = doc.xpathNewContext()
    for d in root.nsDefs():
        if d.name is None:
            xpc.xpathRegisterNs("rss", d.content)
        else:
            xpc.xpathRegisterNs(d.name, d.content)

    # Remove unwanted items from the channel list
    for x in xpc.xpathEval("/rdf:RDF/rss:channel/rss:items/rdf:Seq/rdf:li"):
        res = x.nsProp("resource", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
        if res and unwanted.match(res):
            x.unlinkNode()
            x.freeNode()

    # Remove unwanted items from the item list
    for x in xpc.xpathEval("/rdf:RDF/rss:item"):
        res = x.nsProp("about", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
        if res and unwanted.match(res):
            x.unlinkNode()
            x.freeNode()

    # Serialize the result to stdout; saveFormatFile does the writing itself
    # and returns a byte count, so its return value is not printed
    doc.saveFormatFile("-", True)
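For trying the filtering logic outside liferea, here is a minimal sketch of the same idea using only the Python 3 standard library. It is not the script above, just an approximation of it under some assumptions: the feed is RSS 1.0 (RDF) with the usual `http://purl.org/rss/1.0/` default namespace, and the names `filter_rss`, `is_unwanted` and `main` are made up for this example. Namespaces other than RSS may come out with different prefixes than in the input, which is still valid XML.

```python
import re
import sys
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"  # assumed default namespace of the feed

# Serialize RSS 1.0 elements without a prefix, as in a typical input feed
ET.register_namespace("", RSS)

# Placeholder feed prefixes to drop, same as in the script above
UNWANTED = re.compile(r"^(http://feed1\.example\.com|http://feed2\.example\.com)")


def is_unwanted(url, pattern):
    """True if the URL is present and starts with an unwanted prefix."""
    return url is not None and pattern.match(url) is not None


def filter_rss(data, pattern=UNWANTED):
    """Parse the feed in data (str or bytes) and return it as UTF-8 bytes,
    without the entries whose URL matches pattern."""
    root = ET.fromstring(data)

    # Remove unwanted entries from the channel's rdf:Seq list
    for seq in list(root.iter("{%s}Seq" % RDF)):
        for li in list(seq):
            if is_unwanted(li.get("{%s}resource" % RDF), pattern):
                seq.remove(li)

    # Remove the corresponding top-level items
    for item in root.findall("{%s}item" % RSS):
        if is_unwanted(item.get("{%s}about" % RDF), pattern):
            root.remove(item)

    return ET.tostring(root, encoding="utf-8")


def main():
    # Read the feed from stdin and write the filtered feed to stdout
    sys.stdout.buffer.write(filter_rss(sys.stdin.buffer.read()))
```

To use this as a conversion filter, the script would call `main()` at the end. Keeping the logic in `filter_rss` makes it easy to feed it a saved copy of the feed and check the result before pointing liferea at it.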
Now, getting to this simple script took some spitting blood. Basically, in Debian we seem to have lots of simple libraries for:
- parsing rss, but not outputting it;
- outputting rss, but not parsing it;
- parsing and outputting rss, but not modifying it.
I tried, in order:
- The standard ruby rss module, after seeing this. However, rss.channel.items doesn't seem to be a normal array anymore, and I could not find any documentation on how to modify it.
- python-feedparser allows you to read rss and change it, but not to serialize it.
- libxml-rss-perl can read, modify and serialize, but serializing loses all the content of the items. Try this script and see:

      #!/usr/bin/perl -w

      use strict;
      use warnings;
      use XML::RSS;

      my $rss = new XML::RSS;
      $rss->parsefile("/tmp/rss10.xml");
      print $rss->as_string;

  Update: Nemui Ailin told me that with the most recent upstream version it works. I've reported the bug.
- libxml-rsslite-perl does not serialize. Plus, it parses rss via crude regexps and its manpage has a longish list of things that can go wrong.
- libmrss0-dev has only a README that points to example files that are not packaged. I reported it as a bug.
- Every other module I could find that mentioned rss had a description that made it quite clear it lacked one of the three features I needed (read, edit, reserialize). From a quick look at the code, I couldn't tell whether cl-rss supports serialisation.