Yesterday, I found this feed while searching for a way to be automatically notified when a new issue of a comic book that I like is available.
Unfortunately, it didn’t quite get the job done. It presents all comics published on the same day as a single post, which makes it very hard to filter out the comics I don’t care about.
So I wrote some code that parses the feed and produces an RSS feed with one post per comic, rather than one post for all comics:
    #!/usr/bin/perl

    use strict;
    use XML::Simple;
    use XML::Writer;
    use Time::Local;
    use Date::Format;
    use Getopt::Long;

    # Record the original command line before GetOptions consumes @ARGV.
    my $generator = "ComicsFeedParser.pl @ARGV";

    my $publisherfilter;
    my $comicfilter;
    my $pricefilter;
    my $filter;
    my $feedtitle = 'Published Comic Books';
    my $silent    = 0;

    GetOptions(
        'publisher=s' => \$publisherfilter,   # regex matched against the publisher
        'comic=s'     => \$comicfilter,       # regex matched against the comic title
        'price=f'     => \$pricefilter,       # maximum price
        'filter=s'    => \$filter,            # arbitrary Perl expression (passed to eval!)
        'title=s'     => \$feedtitle,         # title of the generated feed
        'silent!'     => \$silent,            # suppress curl's progress output
    );

    # Fetch the source feed with curl.
    my $curlurl     = 'http://feedproxy.google.com/comiclistfeed';
    my $curloptions = $silent ? '--silent' : '';
    my $data        = `curl $curlurl $curloptions`;

    my $xs1      = XML::Simple->new();
    my $doc      = $xs1->XMLin($data);
    my $pagelink = $doc->{channel}->{link};

    # Write the channel header of the new feed.
    my $writer = XML::Writer->new();
    $writer->xmlDecl();
    $writer->startTag('rss', 'version' => '2.0');
    $writer->startTag('channel');

    $writer->startTag('title');
    $writer->characters($feedtitle);
    $writer->endTag('title');

    $writer->startTag('link');
    $writer->characters($pagelink);
    $writer->endTag('link');

    $writer->startTag('description');
    $writer->characters('Newly Published Comic Books');
    $writer->endTag('description');

    $writer->startTag('pubDate');
    my @now = gmtime;
    $writer->characters(strftime('%a, %d %b %Y %H:%M:%S %Z', @now, 'GMT'));
    $writer->endTag('pubDate');

    $writer->startTag('generator');
    $writer->characters($generator);
    $writer->endTag('generator');

    # Each source item bundles every comic released on one day.
    foreach my $item (@{$doc->{channel}->{item}}) {
        # Take the release date (MM/DD/YYYY) from the item title if present.
        my @gmt;
        if ($item->{title} =~ /(\d\d)\/(\d\d)\/(\d\d\d\d)/) {
            @gmt = gmtime(timelocal(0, 0, 0, $2, $1 - 1, $3 - 1900));
        }
        else {
            @gmt = gmtime;
        }
        my $date = strftime('%a, %d %b %Y %H:%M:%S %Z', @gmt, 'GMT');
        my $link = $item->{'feedburner:origLink'};

        my @lines = split("\n", $item->{description});
        my $publisher;
        my $publisherlink;

        foreach my $line (@lines) {
            $line =~ s/\s+/ /g;

            # Replace an anchor with its text, remembering the target URL.
            my $itemlink;
            if ($line =~ s/<a href="(.*)">(.*)<\/a>/$2/) {
                $itemlink = $1;
            }

            if ($line =~ /<p><b><u>(.*)<\/u><\/b>/) {
                # A bold, underlined heading starts a new publisher block.
                $publisher = $1 unless ($1 eq 'PUBLISHER');
                $publisherlink = $itemlink;
            }
            elsif (defined $publisher) {
                # Every other line inside a publisher block is one comic.
                my $comic;
                if ($line =~ /(.*)<br \/>/) {
                    $comic = $1;
                }
                elsif ($line =~ /(.*)<\/p>/) {
                    $comic = $1;
                }

                if ($comic) {
                    # Split "Title, price" at the last comma-space.
                    my $title;
                    my $price;
                    if ($comic =~ /(.*), (.*)$/) {
                        $title = $1;
                        $price = $2;
                    }
                    else {
                        $title = $comic;
                    }
                    my $comiclink = $itemlink;

                    # The price may carry a currency symbol (e.g. "$2.99"), which
                    # would compare as 0 numerically, so extract the number first.
                    my $amount;
                    if (defined($price) && $price =~ /(\d+(?:\.\d+)?)/) {
                        $amount = $1;
                    }

                    # The generic filter is evaluated as Perl code and can refer to
                    # $publisher, $title and $price. On error, keep the comic.
                    my $filterresults = 1;
                    if ($filter) {
                        $filterresults = eval $filter;
                        if ($@) {
                            warn $@;
                            $filterresults = 1;
                        }
                    }

                    # All filters that were given must pass.
                    if ((!$publisherfilter || $publisher =~ /$publisherfilter/) &&
                        (!$comicfilter || $title =~ /$comicfilter/) &&
                        (!$pricefilter || !defined($amount) || $amount <= $pricefilter) &&
                        $filterresults)
                    {
                        $writer->startTag('item');

                        $writer->startTag('title');
                        $writer->characters("$publisher: $title");
                        $writer->endTag('title');

                        # Use the most specific link available.
                        $writer->startTag('link');
                        $writer->characters($comiclink || $publisherlink || $link || $pagelink);
                        $writer->endTag('link');

                        $writer->startTag('description');
                        $writer->characters("$publisher: $title, $price");
                        $writer->endTag('description');

                        $writer->startTag('pubDate');
                        $writer->characters($date);
                        $writer->endTag('pubDate');

                        $writer->endTag('item');
                    }
                }

                # A closing </p> ends the current publisher block.
                if ($line =~ /(.*)<\/p>/) {
                    $publisher = undef;
                    $publisherlink = undef;
                }
            }
        }
    }

    $writer->endTag('channel');
    $writer->endTag('rss');
    $writer->end();
As you can see, the script accepts several filter flags (--publisher, --comic, --price, and the generic --filter) that let me create customized feeds.
For example:
- ComicsFeedParser.pl --comic "Buffy.*Season Eight" --title "Buffy Season 8"
- ComicsFeedParser.pl --publisher MARVEL --title Marvel
- ComicsFeedParser.pl --comic Powers --title Powers
- ComicsFeedParser.pl --comic Fables --title Fables
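The flags combine with a logical AND, and the --publisher and --comic values are used as unanchored, case-sensitive Perl regular expressions. Here is a minimal sketch of that selection logic in isolation. The `keep_comic` helper and the sample values are made up for illustration, and it assumes the price has already been reduced to a number:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's filter condition.
# %opt holds the optional filters; a filter that is not given always passes.
sub keep_comic {
    my ($publisher, $title, $price, %opt) = @_;
    return 0 if defined $opt{publisher} && $publisher !~ /$opt{publisher}/;
    return 0 if defined $opt{comic}     && $title     !~ /$opt{comic}/;
    return 0 if defined $opt{price}     && $price     >  $opt{price};
    return 1;
}

# Sample data invented for this example.
my @comic = ('MARVEL COMICS', 'Example Comic #1', 3.95);

print keep_comic(@comic, publisher => 'MARVEL'), "\n";               # prints 1
print keep_comic(@comic, comic => 'Fables'), "\n";                   # prints 0
print keep_comic(@comic, comic => 'Example', price => 3), "\n";      # prints 0
```

Because the patterns are unanchored, `--publisher MARVEL` would match any publisher name that contains MARVEL; anchoring with `^` and `$` narrows the match.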
Obviously, the generic filter flag would have to change before I could make this available to the public: its value is passed straight to eval, so something like --filter "fork while true;" could be a problem. But it suits my purposes for now.
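One possible way to get rid of the eval would be to swap the free-form Perl expression for a restricted `field=regex` syntax. The sketch below is just one design under that assumption, not what the script does today; the `parse_filter` and `matches` names and the `field=regex` syntax are hypothetical:

```perl
use strict;
use warnings;

# Fields a filter may test (assumed to correspond to the script's variables).
my %allowed = map { $_ => 1 } qw(publisher title price);

# Parse "field=regex" into a [field, compiled regex] pair; die on bad input.
sub parse_filter {
    my ($spec) = @_;
    my ($field, $pattern) = $spec =~ /^(\w+)=(.*)$/s
        or die "filter must look like field=regex\n";
    die "unknown field '$field'\n" unless $allowed{$field};
    # Compile inside eval {} so an invalid pattern is reported, not fatal here.
    my $re = eval { qr/$pattern/ };
    die "invalid pattern: $@" unless defined $re;
    return [$field, $re];
}

# Test one comic (a hash of field => value) against a parsed filter.
sub matches {
    my ($filter, %comic) = @_;
    my ($field, $re) = @$filter;
    my $value = defined $comic{$field} ? $comic{$field} : '';
    return $value =~ $re ? 1 : 0;
}

my $f = parse_filter('title=Season Eight');
print matches($f, title => 'Buffy Season Eight #20'), "\n";   # prints 1
```

With this approach, `fork while true;` is rejected as a malformed filter instead of being executed. A user-supplied pattern can still be slow to match, so this narrows the problem rather than eliminating it.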