Some Perl and problems with referral spammers
I spent a bit of time at the weekend playing with Perl, using LWP::Simple and XML::Simple to parse RSS 1.0 and RSS 2.0 files so that I can extract the top 5 entries from my own blog and add links to them on my home page. I know this can be done with JavaScript, but I prefer to keep the code under my control rather than use one of the sites that generate a JavaScript call for you. My pages are also static HTML, so I cannot use PHP to generate the entries on the fly, which might have been easier in the long term. Anyway, I now have some Perl running that generates a set of links and link text that I can add to my index.html page. That part is not easy. I also need to integrate it into the blog software so that the index page is simply regenerated every time I add a blog entry. I have also been looking at RSS aggregator software, again in Perl, so that I can finally populate my newsgroups, blogs and news pages. I want to list some of the news sites, blogs and newsgroups/mailing lists that I read for Oracle security information. I then plan to add an aggregator to group the news sites and blogs that I watch onto one page. That's the advantage and power of RSS.
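As a rough illustration of the RSS step, here is a minimal sketch in Python rather than Perl (the Perl version described above uses LWP::Simple and XML::Simple, and its code is not shown here). It assumes the feed has already been downloaded, handles both the RSS 2.0 layout (`<rss><channel><item>`) and the RSS 1.0 layout (RDF, with `<item>` elements alongside `<channel>` in the `http://purl.org/rss/1.0/` namespace), and builds an HTML list of the first five entries. The function names `fetch_feed`, `top_entries` and `render_links` are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Default namespace used by RSS 1.0 (RDF Site Summary) feeds.
RSS1_NS = "{http://purl.org/rss/1.0/}"


def fetch_feed(url):
    """Download a feed; roughly the role LWP::Simple's get() plays in Perl."""
    with urlopen(url) as resp:
        return resp.read()


def top_entries(xml_text, n=5):
    """Return up to n (title, link) pairs from an RSS 1.0 or RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item>... with no namespace.
        items = root.findall("./channel/item")
        prefix = ""
    else:
        # RSS 1.0: <rdf:RDF> root, <item> elements are siblings of <channel>.
        items = root.findall(f"./{RSS1_NS}item")
        prefix = RSS1_NS
    entries = []
    for item in items[:n]:
        title = (item.findtext(prefix + "title") or "").strip()
        link = (item.findtext(prefix + "link") or "").strip()
        entries.append((title, link))
    return entries


def render_links(entries):
    """Build an HTML fragment of links to paste into a static index.html."""
    lines = ["<ul>"]
    for title, link in entries:
        # Escape both the URL (attribute) and the link text.
        lines.append(f'  <li><a href="{html.escape(link)}">{html.escape(title)}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)
```

Calling `render_links(top_entries(fetch_feed(url)))` with a real feed URL returns a `<ul>` fragment that could be written into a static `index.html` each time a blog entry is added. The standard library is used throughout so the sketch has no third-party dependencies.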
I also spent some time yesterday looking at the problem of referrer spammers. They have been spamming me for a long time, but it is now becoming a much bigger issue. They use bots, probably running on compromised PCs, that send requests to sites with forged referrer fields so that they get either click-throughs or a higher Google PageRank (PR). A good paper on the subject is "Proposal on referrer spam: Background and blacklists". The puzzling thing is that I publish neither blog referrer lists (links) nor statistics that include referrer lists. These people seem to take a scattergun approach and spam blogs regardless. I assume this is the case because they are spamming my web development site, which has a blog. Then again, they have also been requesting pages from another of my sites that has no content yet and no blog, although its index page does say that I will add one. They are probably using Google hacking to find sites to spam. They are a real problem: one particular IP address has sent 30,000 hits in the last few days. This does not increase my visitor numbers much, so it does not really skew the figures I am interested in, but the number of hits and the download volume are getting ridiculous. These people consume a lot of bandwidth. The problem is hard to solve, though. I was working on some solutions yesterday, the details of which I won't discuss here for obvious reasons. Anyway, normal Oracle security service should now resume!
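The countermeasures themselves are not described here, and the sketch below does not attempt them. It only illustrates how figures such as "30,000 hits from one IP" can be pulled out of a web server log: a minimal Python sketch, assuming an Apache-style Combined Log Format access log, that tallies hits and bytes per client IP and counts non-empty Referer values. The function name `tally` and any log path you feed it are illustrative assumptions.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "[^"]*"'
)


def tally(lines):
    """Count hits and bytes per client IP, and hits per referrer."""
    hits = Counter()
    sizes = Counter()
    referrers = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in Combined Log Format
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("bytes") != "-":  # "-" means no body was sent
            sizes[ip] += int(m.group("bytes"))
        ref = m.group("referrer")
        if ref and ref != "-":  # "-" means no Referer header was sent
            referrers[ref] += 1
    return hits, sizes, referrers
```

Passing an open log file to `tally` and printing `hits.most_common(5)` and `referrers.most_common(5)` would show which addresses and which forged referrers dominate the traffic; `sizes` gives a rough per-IP bandwidth figure.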