
May 29, 2004

Web Stats

I like web stats. I like seeing how many people visited my site and where they came from. But I’ve always found it so inconvenient to work with server-generated log files. Right now my host gives me a monthly access log and a yearly log. They also have Webalizer set up, and every night it runs and generates a report. The problem is, I don’t like that Webalizer report. For one thing, it only breaks down your traffic by month, so all your visits for the month are lumped together. It gives you the daily hit count, but when it comes to things like which pages were looked at, which referrers were logged, and which user agents hit your site, those are all lumped in for the whole month. And even worse, it only shows the top 15 UAs and the top 30 referrers. And of course all of my visits are logged in there, and since I’m my site’s biggest visitor, that skews everything, showing how many times I visited my Movable Type administrative page. Well, that’s useful information!

So the Webalizer reports are dead to me. They’re good for looking at monthly trends, but that’s not usually what I want. What I really want out of a log report is this: it would have to be broken down daily, for one thing. For each day, it would list the number of visits, it would show every single user agent that came in, and how many times, and it would list all the referrers. And it would ignore my own visits, since I don’t care how many times I looked at the site. And it would only list HTML pages, since I don’t care how many times my CSS was downloaded. Simple stuff. Hard to get.
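That wish list is small enough to sketch in a few lines. The following is only a rough illustration in Python, not anything Webalizer offers and not the code described later in this post: the record fields (`day`, `ip`, `path`, `agent`, `referrer`), the `OWN_IP` placeholder, and the rule for what counts as an HTML page are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Placeholder for the site owner's address (assumed; from a documentation IP range).
OWN_IP = "203.0.113.7"


def is_html_page(path):
    """Assumed rule: directory URLs and .html/.htm files count as pages."""
    return path.endswith("/") or path.endswith((".html", ".htm"))


def daily_report(hits, own_ip=OWN_IP):
    """Group hits by day, skipping the owner's visits and non-HTML requests.

    Each counted page request adds one to that day's "visits" figure.
    """
    report = defaultdict(
        lambda: {"visits": 0, "agents": Counter(), "referrers": Counter()}
    )
    for hit in hits:
        if hit["ip"] == own_ip:
            continue  # ignore my own visits
        if not is_html_page(hit["path"]):
            continue  # ignore CSS, images, and other non-page requests
        day = report[hit["day"]]
        day["visits"] += 1
        day["agents"][hit["agent"]] += 1  # every agent is kept, not just a top 15
        if hit["referrer"]:
            day["referrers"][hit["referrer"]] += 1
    return dict(report)
```

Calling `daily_report(hits)` returns one entry per day, each holding a visit count plus complete user-agent and referrer tallies.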

Now, I know Webalizer can supposedly be configured to do anything you want. There’s a config file with enough options to make everyone dizzy. But I have no control over the config file at my host. So what I would have to do is download the logfile and run it through Webalizer on my own computer. Right now, at the end of May, my monthly logfile is 19MB. The yearly one is 65MB. I’m supposed to download those anew every time I want to check my stats? Not bloody likely. So that’s the other big must-have for my ideal log report: it has to run at my host, and it has to be up-to-date when I request it. So it has to be regenerated each time I view it, and it has to be one-click simple. Follow a bookmark, and there are my stats as of this second.

I never did find any stats package that would do all that. So what do you do when you reach the end of the road? Build your own! So that’s what I did. And just to be even more ornery, I don’t use the server-generated logfiles. I built a MySQL database and a script that logs every pageview. It may be the long way around the problem, but it’s the only way to get just what I want.
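For the curious, the general shape of "a database plus a script that logs every pageview" looks something like the sketch below. It is a stand-in only: it uses Python and SQLite in place of the PHP and MySQL mentioned in this post, and the table name, column names, and function names are all invented for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open the database and create the (hypothetical) pageviews table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pageviews (
               id        INTEGER PRIMARY KEY,
               viewed_at TEXT NOT NULL,
               path      TEXT NOT NULL,
               referrer  TEXT,
               agent     TEXT,
               ip        TEXT
           )"""
    )
    return conn


def log_pageview(conn, path, referrer, agent, ip, when=None):
    """Record one pageview; called once from every page that is served."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO pageviews (viewed_at, path, referrer, agent, ip) "
        "VALUES (?, ?, ?, ?, ?)",
        (when.isoformat(), path, referrer, agent, ip),
    )
    conn.commit()


def views_per_day(conn):
    """Return (day, count) rows, computed fresh on every request."""
    return conn.execute(
        "SELECT substr(viewed_at, 1, 10) AS day, COUNT(*) "
        "FROM pageviews GROUP BY substr(viewed_at, 1, 10) ORDER BY day"
    ).fetchall()
```

Because the report is just a query, it is current as of the moment the stats page is loaded, with no logfile to download first.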

So now I have my new stats page. I even have a way to separate robots and spiders from browsers, so I can see just how many actual humans are looking at my site, compared to bots. It’s all based on some code I wrote for my old ASP-based host, but I had to convert everything from ASP/MSAccess code to PHP/MySQL. Not that difficult, really, just gruelling. And I’m sure there will still be a few bugs. I’ve been stomping them for two days, and I’m still a little green at PHP, so I know there’s something I overlooked.
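One common way to separate robots from browsers is to look for telltale words in the user-agent string. The sketch below shows that idea in Python. It is an assumption about how such a filter might work, not the method used on this site: the marker list is a guess, and treating a missing user agent as a robot is a choice made for the example.

```python
# Substrings assumed to indicate a crawler; a real list would need upkeep.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(agent):
    """Guess whether a user-agent string belongs to a robot."""
    if not agent:
        return True  # assumption: browsers normally send a user agent
    lowered = agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def split_humans_and_bots(agents):
    """Split a list of user-agent strings into (humans, bots)."""
    humans = [a for a in agents if not is_robot(a)]
    bots = [a for a in agents if is_robot(a)]
    return humans, bots
```

Substring matching like this is crude. It misses any robot that pretends to be a browser, so the human count it produces is an upper bound at best.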

Now I can finally keep up on my referrers, and see if anybody new’s been linking to me. Usually not. I can also go back to seeing what Google searches point to me. Like today: “what does glutton mean?” Always useful.
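Pulling the search phrase out of a referrer is mostly a matter of reading the query string. Here is a minimal Python sketch, assuming the referrer is a Google results URL that carries the search terms in a `q` parameter. The function name is made up, and the host check only covers google.com addresses.

```python
from urllib.parse import urlparse, parse_qs


def search_query(referrer):
    """Return the search phrase from a Google referrer URL, or None."""
    if not referrer:
        return None
    parts = urlparse(referrer)
    host = parts.hostname or ""
    # Assumption: only google.com and its subdomains are recognized here.
    if not (host == "google.com" or host.endswith(".google.com")):
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None
```

Any referrer that is not a recognized search URL, or that has no `q` parameter, comes back as `None` and can be listed as an ordinary link instead.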