Geek Culture | 12 Feb 2009 11:32 am

Yesterday, I found this feed while searching for a way to be automatically notified when a new issue of a comic book that I like is available.

Unfortunately, it didn’t quite get the job done. It presents all comics published on the same day as a single post, which makes it very hard to filter out the comics I don’t care about.

I wrote some code that parses the feed and produces an RSS feed with one post per comic, rather than one post for all comics:

#!/usr/bin/perl
 
use XML::Simple;
use XML::Writer;
use Time::Local;
use Date::Format;
use Getopt::Long;
use strict;
 
my $generator = "ComicsFeedParser.pl @ARGV";
 
my $publisherfilter;
my $comicfilter;
my $pricefilter;
my $filter;
my $feedtitle = 'Published Comic Books';
my $silent = 0;
GetOptions(
    'publisher=s' => \$publisherfilter,
    'comic=s' => \$comicfilter, 
    'price=f' => \$pricefilter,
    'filter=s' => \$filter,
    'title=s' => \$feedtitle,
    'silent!' => \$silent,
    );
 
my $curlurl = 'http://feedproxy.google.com/comiclistfeed';
my $curloptions = $silent ? '--silent' : '';  # empty string, not undef, when interpolated below
my $data = `curl $curlurl $curloptions`;
my $xs1 = XML::Simple->new();
 
my $doc = $xs1->XMLin($data);
 
my $pagelink = $doc->{channel}->{link};
 
my $writer = XML::Writer->new();
$writer->xmlDecl();
$writer->startTag('rss', 'version' => '2.0');
$writer->startTag('channel');
$writer->startTag('title');
$writer->characters($feedtitle);
$writer->endTag('title');
$writer->startTag('link');
$writer->characters($pagelink);
$writer->endTag('link');
$writer->startTag('description');
$writer->characters('Newly Published Comic Books');
$writer->endTag('description');
$writer->startTag('pubDate');
my @now = gmtime;
$writer->characters(strftime('%a, %d %b %Y %H:%M:%S %Z', @now, 'GMT'));
$writer->endTag('pubDate');
$writer->startTag('generator');
$writer->characters($generator);
$writer->endTag('generator');
 
foreach my $item (@{$doc->{channel}->{item}}) {
 
    my @gmt;
    if ($item->{title} =~ /(\d\d)\/(\d\d)\/(\d\d\d\d)/) {
 
        # timegm treats the parsed date as GMT; timelocal would shift it by the local UTC offset
        @gmt = gmtime(timegm(0,0,0,$2,$1-1,$3-1900));
    }
    else {
 
        @gmt = gmtime;
    }
    my $date = strftime('%a, %d %b %Y %H:%M:%S %Z', @gmt, 'GMT');
    my $link = $item->{'feedburner:origLink'};
    my @lines = split("\n", $item->{description});
    my $publisher;
    my $publisherlink;
    foreach my $line (@lines) {
 
        $line =~ s/\s+/ /g;
        my $itemlink;
        if ($line =~ s/<a href="(.*)">(.*)<\/a>/$2/) {
 
            $itemlink = $1;
        }
        if ($line =~ /<p><b><u>(.*)<\/u><\/b>/) {
 
            $publisher = $1 unless ($1 eq 'PUBLISHER');
            $publisherlink = $itemlink;
        }
        elsif (defined $publisher) {
 
            my $comic;
            if ($line =~ /(.*)<br \/>/) {
 
                $comic = $1;
            }
            elsif ($line =~ /(.*)<\/p>/) {
 
                $comic = $1;
            }
 
            if ($comic) {
 
                my $title;
                my $price;
                if ($comic =~ /(.*), (.*)$/) {
 
                    $title = $1;
                    $price = $2;
                }
                else {
 
                    $title = $comic;
                }
                my $comiclink = $itemlink;
 
                my $filterresults = 1;
                if ($filter) {
 
                    $filterresults = eval $filter;
                    if ($@) {
 
                        warn $@;
                        $filterresults = 1;
                    }
                }
                if ((!$publisherfilter || $publisher =~ /$publisherfilter/) &&
                    (!$comicfilter || $title =~ /$comicfilter/) &&
                    (!$pricefilter || $price <= $pricefilter) &&
                    $filterresults) {
 
                    $writer->startTag('item');
                    $writer->startTag('title');
                    $writer->characters("$publisher: $title");
                    $writer->endTag('title');
                    $writer->startTag('link');
                    $writer->characters($comiclink || $publisherlink || $link || $pagelink);
                    $writer->endTag('link');
                    $writer->startTag('description');
                    $writer->characters("$publisher: $title, $price");
                    $writer->endTag('description');
                    $writer->startTag('pubDate');
                    $writer->characters($date);
                    $writer->endTag('pubDate');
                    $writer->endTag('item');
                }
            }
 
            if ($line =~ /(.*)<\/p>/) {
 
                $publisher = undef;
                $publisherlink = undef;
            }
        }
    }
}
 
$writer->endTag('channel');
$writer->endTag('rss');
$writer->end();

As you can see, there are several filter flags (--publisher, --comic, --price, and --filter) that allow me to create customized feeds.

For example:
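
A minimal sketch of how the flags combine: every filter that is set must match, and an unset filter always passes. The sample comic below is made up, and `passes` is a hypothetical helper written for illustration, not part of the script. The price is assumed to be a plain number.

```perl
use strict;
use warnings;

# Hypothetical sample entry; in the script these values come from the parsed feed.
my %comic = (
    publisher => 'MARVEL COMICS',
    title     => 'Amazing Spider-Man #586',
    price     => '2.99',
);

# Mirrors the script's test: a filter that was not given always passes.
sub passes {
    my ($comic, %opt) = @_;
    my $ok = (!$opt{publisher} || $comic->{publisher} =~ /$opt{publisher}/)
          && (!$opt{comic}     || $comic->{title}     =~ /$opt{comic}/)
          && (!$opt{price}     || $comic->{price}     <= $opt{price});
    return $ok ? 1 : 0;
}

# Like running the script with: --publisher MARVEL --price 3
print passes(\%comic, publisher => 'MARVEL', price => 3) ? "kept\n" : "dropped\n";

# Like running the script with: --comic Batman
print passes(\%comic, comic => 'Batman') ? "kept\n" : "dropped\n";
```

Each flag narrows the feed further, so one script can back several feeds, each generated with different flags.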

Obviously, the generic filter flag would have to change before I could make this available to the public, because its argument is passed straight to eval (--filter "fork while true;", for example, could be a problem). But it suits my purposes for now.
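
One possible way to make the generic filter safe is to stop evaluating it as Perl and accept a small restricted syntax instead. The sketch below is only an assumption about how that could look. The `field~text` clause syntax, the `compile_filter` name, and the allowed field list are all hypothetical and are not part of the script above.

```perl
use strict;
use warnings;

# Only these fields may be tested (hypothetical whitelist).
my %ALLOWED = map { $_ => 1 } qw(publisher title price);

# Turn "field~text; field~text" into a closure that checks a comic hashref.
# The text is matched literally (case-insensitive), never executed as code.
sub compile_filter {
    my ($spec) = @_;
    my @tests;
    for my $clause (split /;/, $spec) {
        my ($field, $text) = $clause =~ /^\s*(\w+)\s*~\s*(.+?)\s*$/
            or die "Bad clause: $clause\n";
        die "Unknown field: $field\n" unless $ALLOWED{$field};
        push @tests, [ $field, qr/\Q$text\E/i ];
    }
    return sub {
        my ($comic) = @_;
        for my $t (@tests) {
            my ($field, $re) = @$t;
            return 0 unless defined $comic->{$field} && $comic->{$field} =~ $re;
        }
        return 1;
    };
}

my $keep = compile_filter('publisher~marvel; title~spider');
print $keep->({ publisher => 'MARVEL COMICS', title => 'Amazing Spider-Man' })
    ? "kept\n" : "dropped\n";
```

With this approach, an argument such as "fork while true;" is rejected as a bad clause instead of being run.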
