2011-11-27

All those SQLite databases...

Many current application use the SQLite database for tracking information; this includes F-Spot, Banshee, Hamster, Evolution, and others.  Even the WebKit component uses SQLite [you might be surprised to discover ~/.local/share/webkit/databases].  It is wonderfully efficient that there is one common local data storage technique all these applications can use,  especially since it is one that is managable using a universally known dialect [SQL]. But there is a dark-side to SQLite.  Much like old Dbase databases it needs to be vacuumed.  And how reliably are all those applications providing their little databases with the required affection?  Also, do you trust those lazy developers to have dealt with the condition of a corrupted database?   If an application hangs, or is slow, or doesn't open... maybe that little database is corrupted?
Aside: As a system administrator for almost two decades I do not trust developers. They still put error messages in applications like "File not found!".  Argh!
On the other hand SQLite provides a handy means of performing an integrity check on databases - the "PRAGMA integrity_check" command.  I've watched a few of these little databases and discovered that (a) they aren't often all that little, and (b) manually performing a VACUUM may dramatically reduce their on-disk size.  Both these facts indicate that developers are lazy and should not be trusted.
Note: in at least one of these cases the application has subsequently been improved. Developers do respond rather quickly when offered a blend of compliments spiced with bug reports.  No, I'm not going to name offending applications as that is too easily used as fodder by nattering nabobs.  And even the laziest Open Source developer is working harder than their proprietary brothers.
In light of this situation my solution is a hack - a Python script [download] that crawls around looking for SQLite databases.  First the script attempts to open the database in exclusive mode, then it performs an integrity check, and if that succeeds it performs a vacuum operation.  Currently it looks for databases in "~/.local/share" [where it will find databases managed by application appropriately following the XDG specification], "~/.cache", "~/.pki", "~/.rcc", and "~/.config".
Download the script and run it. Worst thing that happens is that it accomplishes nothing.  On the other hand it might recover some disk space, improve application performance, or reveal a busted database.