How many fish in the pond? |
Each day, the two guestbooks I'm responsible for receive 4 to 15 spam entries,
advertising such needful things as rolex replicas, gold coins,
ringtones; amitriptyline, atenolol, valium, tamiflu, cyclobenzaprine,
zoloft, clonazepam, codeine, alprazolam, furosemide, trazodone, even
methadone and morphine, of course viagra; girls (latinsex included...
whatever it is), Russian women; not to forget apartments in Bologna,
Milano, and Udine; antique oriental furniture; una bicicletta
elettrica, porno videos, lingerie; and most recently a 1,000-watt car
amplifier (just to mention the least monotonous stuff). |
What can
a poor webmaster do? Fortunately, I keep a copy of the data files on my
hard disk, and since these files are rarely changed by ‘regular’ entries
(3-5 per year), I start each day by simply replacing the new data files
infested by spam entries with the saved ones, which takes just a few
seconds. I could have been happy with that, but one day one of these
impudent spammers added the line: "To the guestbook! Muhahahahahaha!"
That was too much for me. I decided to do more against these
transgressions! |
By
comparing the exact times of the inappropriate guestbook entries with
the web server log statistics, I identified the sending IP addresses and added
them to the .htaccess file supervising the respective guestbook sites,
behind the words deny from (I
really denied the littering of ‘my’ guestbooks to these villains, from
the bottom of my heart!). After 13 days, I had collected 216 black
sheep on the two blacklists combined (interestingly, 19 of them appeared
on both lists). To my disappointment, however, both books still
received an undiminished number of spam entries each day. |
With
this frustrating result, I could have given up, but these numbers
continued to intrigue me. Why did I find 19 IP addresses on both lists?
Since both lists were roughly the same size, each of them might be
regarded as a random sample of size 108. Let's imagine a pond with an
unknown number of fish (the method described below is widely used to monitor
population densities in the wild; see Pollock et al. 2002). Let's
further draw (e.g. with a net) a first random sample of 108 fish. This
number still doesn't say too much about the total number of fish
roaming these waters. But let's now assume further that I mark these
fish and throw them back into the pond. The next day I again draw a
sample of 108. This time my sample contains 19 marked and 89 unmarked
fish. What might we think now about the total number? |
We might
now imagine that we have ‘diluted’ the marked fish in the
larger (unknown) reservoir of unmarked ones. The ‘dilution factor’
can be calculated by dividing the total size of the second catch (108)
by the number of marked fish found in it (19). Since we know
the total number of marked fish (108), we can estimate the total
number of fish in the pond by multiplying 108 by this dilution
factor (108/19), which gives roughly 614. This was good
news for me. While it might appear fruitless to embark on
setting up a blacklist with several thousand entries, the number 614
was a glimmer of light at the end of the tunnel. So I kept on
collecting the sinners, until... |
... until I found out that many of these sinners couldn't care less about my deny from.
They simply continue to leave their garbage every 2nd or 3rd day in at
least one of these guestbooks. Apparently, they can do that even
without accessing the web site at all. Not only can I not prevent them
from doing so; also my nice calculation above is simply wrong, since it
was based (to a still unknown, but most likely large extent) on
habitual sinners. To stay with the picture of the fish in the pond:
my pond contains 19 fish that jump into my net the first moment they
see it. These 19 will always contribute to my catch, and they will
always escape to come again the next day. The rest I have never seen
twice. |
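For the record, the ‘dilution’ arithmetic from the fish pond can be sketched in a few lines of Python. This is only an illustration using the figures given in the text (two samples of 108 with 19 in common). The function name dilution_estimate is made up for this sketch, and ecologists usually call this calculation the Lincoln-Petersen estimator, a name the text itself does not use. The second figure simply counts the distinct addresses actually seen, which is about all that remains certain once the 19 turn out to be habitual sinners.

```python
def dilution_estimate(marked, caught, recaptured):
    """Two-sample estimate of a population size.

    marked      -- size of the first sample (all marked and released)
    caught      -- size of the second sample
    recaptured  -- marked individuals found in the second sample
    """
    if recaptured <= 0:
        raise ValueError("need at least one recaptured individual")
    # Total = marked * 'dilution factor' (caught / recaptured)
    return marked * caught / recaptured


# Figures from the text: two blacklists of about 108 addresses, 19 shared.
estimate = dilution_estimate(marked=108, caught=108, recaptured=19)

# Addresses observed at least once, counting the shared ones only once.
distinct_seen = 108 + 108 - 19

print(round(estimate))  # 614
print(distinct_seen)    # 197
```

The estimate only holds if every fish is equally likely to land in the net; 19 fish that jump in every time violate exactly that assumption.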
We
will have a new homepage soon. Including a new guestbook. A real good
one. At least that's what I've been told. In the meantime, I start
each day by simply replacing the new data files infested by spam
entries with the saved ones, which takes just a few seconds. |
Kenneth
H. Pollock, James D. Nichols, Theodore R. Simons, George L. Farnsworth,
Larissa L. Bailey, John R. Sauer (2002) Large scale wildlife monitoring
studies: statistical methods for design and analysis. Environmetrics
13: 105-119. |
Latest
news: Following valuable suggestions, I changed the name of the
guestbook.cgi file. The unwanted entries stopped immediately. My
blacklist will not grow further (at least for now...). |
MB (11/06) |