
Google Analytics + Apache configuration mini-howto

I’ve been thinking of setting up Google Analytics on my site for a while now and have finally taken a few moments to get started. I'm curious to see where folks are coming from, and I'm kind of tired of writing one-off scripts to grep my Apache logs.

If, like me, you have a site made of many, many HTML pages, this one’s for you. (Please DO make a backup of your site and configuration files before you start.)

For Google to analyze a given web page, you need to add a short JavaScript snippet to it. It looks something like this:


<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "XX-XXXXXXX-X";
urchinTracker();
</script>

To add this snippet easily to each of my pages, I chose Apache's Server Side Includes (SSI), an easy way to include the same chunk of text in many files. Put the JavaScript that Google gives you into a file called (say) included.html and drop it in your Apache document root. (I put mine in my /data/ subdirectory, as that works best for the structure of my site.)
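If you'd rather script the include file than paste it by hand, here is a minimal shell sketch. The function name write_ga_include and its docroot argument are hypothetical; it assumes the /data/included.html location used by the SSI tag in this post, and XX-XXXXXXX-X is still a placeholder for your own account ID.

```shell
# Hypothetical helper: write Google's tracking snippet to <docroot>/data/included.html.
# Assumes the /data/ location used by the SSI tag; replace XX-XXXXXXX-X with your ID.
write_ga_include() {
    docroot=$1
    mkdir -p "$docroot/data" || return 1
    # Quoted heredoc delimiter: the snippet is written verbatim, with no shell expansion.
    cat > "$docroot/data/included.html" <<'EOF'
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "XX-XXXXXXX-X";
urchinTracker();
</script>
EOF
}
```

For example, `write_ga_include /path/to/htdocs`, substituting your real document root.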

Each HTML document you want Google to track needs to pull in that file’s text immediately before the </body> tag, by means of a small SSI tag.

It will look something like this:


<!--#include virtual="/data/included.html" -->

To do such a massive insertion, we’ll need a little Perl (yes, Juan, PERL!) and bash.

find . -type f -name '*.html' -exec perl -p -i -e 's{(</body>)}{<!--#include virtual="/data/included.html" -->\n$1}gi' {} +

Run from your document root, this finds every .html file and inserts the SSI include line immediately before its </body> tag. When serving a page, Apache replaces the include tag with the contents of the included file.

If SSI isn't yet configured on your Apache box, a couple of simple tricks will make life MUCH easier. Assuming that you have mod_include compiled in or loaded (most Apache installations have it), you simply need to add

Options +Includes

to your site configuration. I placed it in my virtual hosts file along with my other Options directives. The real juice is here: also place

XBitHack on

in your Apache configuration. This makes Apache parse for SSI directives any text/html file that has its user-execute bit set (remember your Unix permissions?). Then set the execute bit on all of your HTML pages like so:

cd your/apache/htdocs
find . -type f -name '*.html' -exec chmod u+x {} +
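Since XBitHack only looks at files whose user-execute bit is set, it can help to list the pages Apache will still serve unparsed, for instance after adding new pages later. A minimal sketch, where list_unparsed is a hypothetical helper name, using find's octal -perm test:

```shell
# Hypothetical helper: print .html files under a directory that lack the
# user-execute bit, i.e. files XBitHack will NOT parse for SSI.
list_unparsed() {
    # -perm -100 matches files with (at least) the user-execute bit set;
    # the ! negates it, so only files without the bit are printed.
    find "$1" -type f -name '*.html' ! -perm -100
}
```

Run `list_unparsed .` from your document root; an empty result means every .html file has the bit set.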

That should do it. We have written a small included.html file to contain Google's JavaScript, done a massive search-and-insert on all of our HTML files, set up Apache to use SSI with XBitHack, and set the execute bit on the HTML files so Apache will parse them. Now, when you view the source of a page as served by Apache, it should contain the JavaScript from included.html.
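To check a page as actually served, you can pipe it from curl into a small classifier. This is a sketch: check_ssi is a hypothetical helper, and it relies on two observable symptoms. When SSI is off, the raw <!--#include tag leaks through to the browser; when the include path is wrong, Apache substitutes its default "[an error occurred while processing this directive]" message.

```shell
# Hypothetical helper: read served HTML on stdin and print one word:
#   error     - Apache hit the directive but could not process it
#   unparsed  - the raw SSI tag came through, so SSI is not active for this file
#   ok        - the tracking snippet is present
#   missing   - neither the tag nor the snippet is there
check_ssi() {
    page=$(cat)
    case $page in
        *'[an error occurred while processing this directive]'*) echo error ;;
        *'<!--#include'*) echo unparsed ;;
        *urchinTracker*)  echo ok ;;
        *)                echo missing ;;
    esac
}
```

For example, `curl -s http://www.example.com/index.html | check_ssi`, replacing the URL with one of your own pages.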

Note: XBitHack will not work on Windows, because Windows has no executable bit. See Apache's SSI tutorial for other ways to enable SSI if you're on Windows.
