Another week spent on the Chess Viewer app. One of the features I’m adding is support for searching within the chess games, specifically the filenames and the headers that are generally attached to each game. In the original app I read all of this information into memory and stored it in a big structure of arrays and dictionaries. This made searching fast, but it was complicated and could leave very large data structures sitting in memory. To fix this I decided to use Core Data.

Core Data is the “Model” part of the Model View Controller paradigm in iOS and is the recommended way to store your data. It’s generally backed by an sqlite database, so it can be quite quick, but because it’s an object model it’s simple to work with from code without having to write SQL. I ported the code over pretty quickly and sent a build to the client to try out. Unfortunately the app crashed for them soon after launching; in fact they couldn’t get it to launch at all. It turned out that as the code parsed the chess files and built up the Core Data model it used excessive amounts of memory. It could also be really slow: some files (specifically The Week in Chess) hold around a thousand games, and each game has an average of ten headers, so you end up with around ten thousand headers to store. That alone makes the parsing slow and memory-hungry, but if you have several of these files (and TWIC is updated weekly) it can cause real problems. This was mostly an issue on the iPhone 3G, but it would be silly to refuse to support it.

In the end I decided to use sqlite directly without going via Apple’s Core Data model. Perhaps there was a way to optimise the existing code, but I couldn’t find it in the days I spent on it. Considering the search function was supposed to be a half-day task, I really needed to move on! Previously I’ve tried using the sqlite C functions directly and they’re a real pain to work with, so it’s fortunate that I came across FMDB quite recently. This is a library that wraps the C functions of sqlite in a nice Objective-C interface, making things much easier to work with. Again I managed to port the code over without too much difficulty… I still had to make various optimisations to ensure the big imports don’t take too long and that searching is as fast as possible. It’s “funny” spending so much time on something like this: it will only really affect people upgrading to the latest version who have big files, as that’s when the database is built initially. Downloading files in the future shouldn’t be so bad, since there’ll generally only be one file being processed at a time.

In case anyone else has issues I thought I’d paste in some snippets (uninteresting bits of code replaced with comments) to show how I ended up doing things:

// OPEN THE DATABASE
    
// All the insert statements are the same so this makes sure
// sqlite doesn't have to parse them repeatedly
[db setShouldCacheStatements:YES];

// This stops sqlite from writing to disk so much, increases the
// chance of db corruption but if we die half-way through an
// update we'll need to start from scratch anyway
sqlite3_exec([db sqliteHandle],
    "PRAGMA synchronous=OFF", NULL, NULL, NULL);
sqlite3_exec([db sqliteHandle],
    "PRAGMA journal_mode=MEMORY", NULL, NULL, NULL);
sqlite3_exec([db sqliteHandle],
    "PRAGMA temp_store=MEMORY", NULL, NULL, NULL);
// Use memory as much as possible; more on sqlite pragmas
// at http://www.sqlite.org/pragma.html

// I actually tried to make sure I didn't use any autoreleased
// objects but there was one I couldn't avoid and could be
// some in code I didn't control anyway.
// Having my own pool means I can release these periodically
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

// This monitors how many inserts we've done
NSUInteger numUpdates = 0;

// Tell the database not to commit the inserts straight away
[db beginTransaction];

// FOR EACH FILE

    // FOR EACH GAME IN THE FILE

        // INSERT ROW IN DATABASE FOR FILE

        // FOR EACH HEADER IN THE GAME

            // INSERT ROW IN DATABASE FOR HEADER

            ++numUpdates;
            if (memoryWarning || numUpdates > 5000) {
                [db commit];
                [db beginTransaction];
                memoryWarning = NO;
                numUpdates = 0;
                [pool drain];
                pool = [[NSAutoreleasePool alloc] init];
            }

// AFTER ALL FILES: commit any remaining inserts
// and release the final pool
[db commit];
[pool drain];

So what’s happening there is that every 5000 inserts we commit the transaction to the database, start a new one, and drain the autorelease pool. This way we shouldn’t run out of memory but will still run as fast as possible. You’ll notice there’s a memoryWarning flag too; this gets set separately by a notification, so if we get a memory warning before we’ve done 5000 inserts we still commit the transaction and drain the pool to free up as much memory as possible.

And, finally, more progress on my Arduino control system: I’ve got all the bits working now so that I can control my central heating and hall light from a website. The UI still needs a lot of work, but it essentially matches the abilities of the original central heating controller, and that’s the minimum I need in the short term. I’ll be putting up some blog posts with build logs soon.