So, a much quicker update after last week’s delayed entry. This past week I’ve been working on a project for Clear Digital: a relatively simple job that required setting up a WordPress blog and re-skinning it to match the client’s requirements. I hadn’t played with WordPress much in quite a while, so it was an interesting experience. It turned out not to be too difficult, as there were plenty of existing plugins to extend the functionality.

On a recommendation from Dave Coveney I used the Thematic theme. This is more of a tool than an actual theme. The theme you get is very simple, but it provides lots of hooks that let you extend and customise it however you like. I think there are lots of themes based on it, but I chose to create a new theme on top of the very basics that Thematic provides, the better to match the client’s requirements.

WordPress provides “widgets”, which are small UI elements that you can drop onto the page in various places: things like tag clouds, a calendar of your blog posts, a list of categories, and lots of others. Thematic provides quite a few different places where you can drop widgets, making it even easier to customise your blog.

In case it helps someone else, here’s a complete list of the plugins I used:

As well as this project, I also started out on a new personal project last Sunday. I intended it to be just a quick thing to try something out, but it’s started taking a lot more time and resources than I expected. As you may know, I have quite a few iPhone apps in the App Store. Right now I’ve got 22 live on my own account and another that I did for a client under their account. Though Apple provide perfectly good sales statistics, they don’t give any indication of how well you’re doing in their “Top 100” charts. Much of the desire to know your position is down to vanity, but there are some real uses too: you can use it for marketing, and if you reach the top 20 it’s good reassurance that you’re going to make a reasonable sum of money from sales.

Apple don’t provide this information but a number of other people do. APPlyzer offer access to some of the data for free and require you to pay for more. An iPhone app called “PositionApp” also gives you some information and allows you to select favourite apps but still didn’t give me the information in a way that I liked, so I decided to write my own.

I had already found a Perl script that would download the information for the Top 100s and would give me information for a specific app, category and country if I wanted. I was originally running this twice a day, but unfortunately I hadn’t updated it to list some of my latest apps, so when I found that two of my apps were in the Top 100 in the UK Education category I decided I needed a better option. If I was going to download the Top 100s I really ought to be putting them in a database so that I could do more with them in the future.

I started by writing a script that would do the basic download of the XML and for some reason decided to throw the XML in the database for later parsing. Actually a large part of my reasoning behind this was having minimal time but wanting to leave something downloading data as I went off to OGGCamp. As it turned out, storing the XML in the database was an incredibly bad idea. After a short while I had thousands of entries with 600KB of data in each, meaning that an SQL query to fetch the latest download and check whether it had changed took 15 seconds!

So, version two: parse the data straight away. The data was in XML, so obviously the safest way to parse it was to use a proper XML parser. Because the file was pretty big I decided to go with a SAX style parser. After spending a while doing this and getting a completed parser going, I found that my XML parsing was taking over a minute! I’d already noticed that sometimes the HTTP request from Apple could take up to 15 seconds, and doing that 5000 times (for all the categories and countries) was going to take a long time, so an additional minute was terrible news!
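For anyone who hasn’t met the SAX style before, here’s a minimal sketch of the idea in Python. It is not my actual parser, and the `<chart>`/`<entry>`/`<id>`/`<name>` layout is a made-up stand-in for the real feed, which is a lot bigger and messier. The point is just that the parser streams through the document and calls you back for each element rather than building the whole tree in memory.

```python
import xml.sax

# Hypothetical feed layout, invented for illustration only.
SAMPLE = """<chart>
  <entry><id>101</id><name>First App</name></entry>
  <entry><id>202</id><name>Second App</name></entry>
</chart>"""


class ChartHandler(xml.sax.ContentHandler):
    """Collects (position, app_id, name) tuples as entries stream past."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None   # dict for the entry being read, if any
        self._field = None     # name of the field being read, if any

    def startElement(self, name, attrs):
        if name == "entry":
            self._current = {"id": "", "name": ""}
        elif name in ("id", "name") and self._current is not None:
            self._field = name

    def characters(self, content):
        # characters() can fire several times per text node, so append.
        if self._field is not None:
            self._current[self._field] += content

    def endElement(self, name):
        if name in ("id", "name"):
            self._field = None
        elif name == "entry" and self._current is not None:
            # Chart position is simply the order of appearance.
            position = len(self.rows) + 1
            self.rows.append(
                (position, int(self._current["id"]), self._current["name"])
            )
            self._current = None


def parse_chart_sax(xml_text):
    handler = ChartHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows
```

Calling `parse_chart_sax(SAMPLE)` gives one tuple per entry in chart order.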

Next day I decided to skip the “proper XML parsing” and go with a regex. After half an hour of coding I had something that would parse the entire 600KB file in less than a second, much better.
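To give a flavour of the regex approach, here’s a rough sketch in Python. Again this isn’t my real script, and the `<entry>`/`<id>`/`<name>` layout is a hypothetical simplification of the feed. It only works because the feed is machine-generated and every entry has the same shape; it would fall over on arbitrary XML.

```python
import html
import re

# Hypothetical feed layout, invented for illustration only.
SAMPLE = """<chart>
  <entry><id>101</id><name>First App</name></entry>
  <entry><id>202</id><name>Tom &amp; Co</name></entry>
</chart>"""

# One pattern per entry; relies on every entry having the same shape.
ENTRY_RE = re.compile(
    r"<entry>\s*<id>(\d+)</id>\s*<name>(.*?)</name>\s*</entry>",
    re.DOTALL,
)


def parse_chart_regex(xml_text):
    """Return (position, app_id, name) tuples in chart order."""
    rows = []
    for position, (app_id, name) in enumerate(ENTRY_RE.findall(xml_text), start=1):
        # A regex does no entity decoding, so undo &amp; and friends by hand.
        rows.append((position, int(app_id), html.unescape(name.strip())))
    return rows
```

The trade-off is the obvious one: you give up the safety of a real parser (entities, CDATA, attribute order and so on are all your problem) in exchange for speed.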

I’ve now been running this script four times a day for nearly a week. I’ve downloaded approximately 20 batches of data in that time. Each batch is pretty big as I’m querying 40 categories in 62 countries for two types of app (free and paid), which comes to 4960 requests four times a day! Each of those requests then generates 100 position entries, meaning I now have around 10 million position entries in my database. This quantity of data has been causing its own problems but so far I’m keeping on top of them. Yesterday I added a few more indexes to the tables and converted the tables from InnoDB to MyISAM. This gave much better results. The 6pm batch yesterday took 5 hours to run, whereas the midnight batch took 1 hour 45 minutes and the 6am batch took just an hour to run. I’m also coincidentally hoping to move to a more powerful server this weekend, so that should help too.
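To put the numbers and the indexing in concrete terms, here’s a small sketch in Python. It uses an in-memory SQLite database purely so that it runs anywhere; my real tables are in MySQL, and the table, column and index names here are invented for the example rather than copied from my schema.

```python
import sqlite3

# Batch size, from the figures above.
CATEGORIES, COUNTRIES, APP_TYPES, CHART_SIZE = 40, 62, 2, 100
REQUESTS_PER_BATCH = CATEGORIES * COUNTRIES * APP_TYPES   # 4960
ROWS_PER_BATCH = REQUESTS_PER_BATCH * CHART_SIZE          # 496000


def open_db():
    """Create a hypothetical positions table with an index for app lookups."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE positions (
               batch_id INTEGER NOT NULL,
               country  TEXT    NOT NULL,
               category TEXT    NOT NULL,
               app_type TEXT    NOT NULL,
               app_id   INTEGER NOT NULL,
               position INTEGER NOT NULL
           )"""
    )
    # Without this, looking up one app means scanning every row.
    db.execute("CREATE INDEX idx_positions_app ON positions (app_id, batch_id)")
    return db


def app_history(db, app_id):
    """All recorded positions for one app, oldest batch first."""
    return db.execute(
        "SELECT batch_id, country, category, app_type, position "
        "FROM positions WHERE app_id = ? "
        "ORDER BY batch_id, country, category",
        (app_id,),
    ).fetchall()
```

At roughly half a million rows per batch, a query that has to scan the whole table gets slow very quickly, which is why an index on whatever you filter by matters so much.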

So, future plans for this data? Well basically I’m not sure how much effort I’m willing to put into it. The main thing that I want to get out of it is positions for all of my apps on some sort of regular basis, and the ability to query history for apps even if I haven’t specifically remembered to add them to my list. Other people might have other ideas of things to do though so I’m intending to dump the data out into some basic form, CSV most likely, and make it available to download. Hopefully I’ll get around to putting a web interface on this to allow people to look for information on their own apps or even register to get emailed position updates but any of that will be time permitting, and I’ve got lots of work to keep me busy!

If you’re interested to know though, Basic Sat Nav is continuing to do well in the Navigation category, hovering around the number 10 mark in the UK and around the 60 mark in Ireland. My GCSE and A-Level revision apps are doing nicely in the run up to the exams, none of them getting particularly high in the Top 100 but most of them making appearances in various positions. Even iFreeThePostcode is sitting at number 60 in the UK Free Navigation category.

I’ll be talking about this project a little at NSManchester on Monday night so go along to that if you’re interested to know more.