Foxmarks: Synchronization Heuristics
From Foxcloud Wiki
This page documents some thinking about how to improve the timing of automatic synchronization.
Current Status
We've decided to implement the synchronization heuristics described here, but we're waiting for the Cosmo team to provide etag support for If-Match and If-None-Match. They expect this to be available in the next maintenance release, the week of 11/7/2005. Until then, this is on hold.
- This appears to be working in Cosmo 0.2.2, so we're now preparing to deploy it. As part of this change, we're going to clean up the top-level sync algorithm first.
Current Proposal
The overarching goal described here is to make working with Foxmarks feel easier. For users with many bookmarks (our ideal customer) in particular, the sync operation feels heavy, and the way we've implemented automatic sync is especially inconvenient.
The approach described here is a two-pronged attack: offer better choices than "at startup and at shutdown" for automated sync, and make the sync operation itself feel much more lightweight.
Automatic Sync Options
In early versions of Foxmarks, we had a timer option in the UI that was never implemented. I propose bringing it back. The UI would simply add a field in which the user enters the auto-sync interval, probably in minutes. While Firefox is running, Foxmarks will attempt to auto-sync every X minutes according to this setting.
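As a rough illustration of the timer option, the sketch below calls a sync callback every X minutes until it is stopped. It is written in Python purely for illustration: `PeriodicAutoSync` and `sync_fn` are hypothetical names, and inside the extension this would presumably be an `nsITimer` (or similar) rather than a Python thread.

```python
import threading
from typing import Callable


class PeriodicAutoSync:
    """Hypothetical helper: call sync_fn every interval_minutes until stopped."""

    def __init__(self, interval_minutes: float, sync_fn: Callable[[], None]) -> None:
        self.interval_seconds = interval_minutes * 60.0
        self.sync_fn = sync_fn
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to sync) and True once stop() is called.
        while not self._stop.wait(self.interval_seconds):
            self.sync_fn()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Wake the waiting thread and wait for it to exit; no syncs run after this returns.
        self._stop.set()
        self._thread.join()
```

For example, `PeriodicAutoSync(30, do_sync).start()` would attempt a sync every 30 minutes, where `do_sync` stands in for whatever the real sync entry point is.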
The "On Startup" and "On Shutdown" options could be revised somewhat in this scheme. On Shutdown, for instance, we could sync only if the local datastore is dirty. Since you're shutting down, you don't really care about reading changes from the server (if there are any); you only care that any changes you made locally get uploaded. If the user hasn't changed anything locally, this option would be entirely silent.
I'd modify On Startup slightly to skip the sync unless more than X minutes have passed since the last recorded sync. This avoids a sync entirely if you've synced within the last X minutes and turns into a lightweight sync in most cases (see below), while still guaranteeing that on startup you've got the latest and greatest from the server.
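The revised On Startup and On Shutdown rules can be sketched as two small guards. This assumes the last recorded sync time is persisted as seconds since the epoch and that the caller already knows whether the local datastore is dirty; `on_startup`, `on_shutdown`, and `sync_fn` are hypothetical names.

```python
import time
from typing import Callable, Optional


def on_startup(last_sync: Optional[float], interval_minutes: float,
               sync_fn: Callable[[], None], now: Optional[float] = None) -> bool:
    """Sync at startup only if the last recorded sync is more than X minutes old."""
    now = time.time() if now is None else now
    # No recorded sync at all counts as overdue.
    overdue = last_sync is None or (now - last_sync) > interval_minutes * 60
    if overdue:
        sync_fn()
    return overdue


def on_shutdown(local_dirty: bool, sync_fn: Callable[[], None]) -> bool:
    """At shutdown only local changes matter, so stay silent when nothing changed."""
    if local_dirty:
        sync_fn()
    return local_dirty
```

Both functions return whether they actually synced, which makes the "entirely silent" case easy to check.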
Make sync more lightweight
The big win here is to be smarter about our use of network resources, since that is the big performance drain. Before we do a sync, we can determine whether the server file has changed since we last read or wrote it by storing the file's etag. We then retrieve the file only if the etag no longer matches.
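In HTTP terms this is a conditional GET: send the stored etag in an If-None-Match header, and the server replies 304 Not Modified with no body if the file is unchanged. A minimal sketch using Python's `urllib`, assuming the server honors If-None-Match (the Cosmo dependency noted under Current Status); `fetch_if_changed` is a hypothetical helper, and the `opener` parameter exists only so the network call can be swapped out.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional, Tuple

Opener = Callable[[urllib.request.Request], Any]


def fetch_if_changed(url: str, stored_etag: Optional[str],
                     opener: Opener = urllib.request.urlopen,
                     ) -> Tuple[bool, Optional[str], Optional[bytes]]:
    """Conditional GET: return (changed, etag, body).

    body is None when the server answers 304 Not Modified.
    """
    headers: Dict[str, str] = {}
    if stored_etag is not None:
        # The server should reply 304 with no body if its etag still equals ours.
        headers["If-None-Match"] = stored_etag
    req = urllib.request.Request(url, headers=headers, method="GET")
    try:
        resp = opener(req)
    except urllib.error.HTTPError as err:
        # urlopen reports any non-2xx status, including 304, as an HTTPError.
        if err.code == 304:
            return False, stored_etag, None
        raise
    try:
        return True, resp.headers.get("ETag"), resp.read()
    finally:
        resp.close()
```

On a 304 the caller keeps its stored etag; on a 200 it records the new etag along with the downloaded file.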
If the etag matches, what we do next depends on the dirty state of the local datastore. If the local datastore is clean, we're done. If it is dirty, we write a copy of the local datastore to the server (updating the last-modified dates of each item, of course).
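For the write in the dirty case, an If-Match header turns the etag check and the upload into a single step: the server applies the PUT only if the file still has the etag we recorded, and otherwise answers 412 Precondition Failed. That is presumably why If-Match support matters here. The sketch assumes a server that supports If-Match on PUT; `upload_if_unchanged` and `EtagMismatch` are hypothetical names, and falling back to a full sync on a mismatch is an assumption rather than something this page specifies.

```python
import urllib.error
import urllib.request
from typing import Any, Callable, Optional

Opener = Callable[[urllib.request.Request], Any]


class EtagMismatch(Exception):
    """Someone else changed the server file since we recorded its etag."""


def upload_if_unchanged(url: str, data: bytes, stored_etag: str,
                        opener: Opener = urllib.request.urlopen) -> Optional[str]:
    """PUT the local datastore only if the server file still has stored_etag.

    Returns the new etag if the server sends one; not every server does on PUT.
    """
    req = urllib.request.Request(url, data=data, method="PUT",
                                 headers={"If-Match": stored_etag})
    try:
        resp = opener(req)
    except urllib.error.HTTPError as err:
        if err.code == 412:  # Precondition Failed: the file changed under us
            raise EtagMismatch(url) from err
        raise
    try:
        return resp.headers.get("ETag")
    finally:
        resp.close()
```

A caller catching `EtagMismatch` would then fall back to the full sync described next.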
If the etag doesn't match, we have to retrieve the entire file. At this point, we might as well do a standard sync. Although we could probably save a few cycles by simply copying the server file into the local datastore when the local datastore is clean, that hardly seems worth it to me. One thing does matter, though: if the local datastore is clean, we don't want to write the server file after we're done with the sync. Not only is there no reason to write the file (by definition, it will be unchanged), but writing it would change the etag, which would force every other client to read the file and then write it back themselves, and so on.
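The write-back rule can be sketched with a deliberately simplified merge. Assumptions: each item carries a last-modified timestamp, the caller supplies the ids of locally added or edited items (the dirty list), the newer timestamp wins when both sides changed an item, and deletions are ignored. `Item` and `full_sync` are hypothetical names. The point of the sketch is the second return value: the server file is rewritten only when the local datastore was dirty.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Item:
    url: str
    title: str
    last_modified: float  # assumed: seconds since the epoch


def full_sync(server: Dict[str, Item], local: Dict[str, Item],
              dirty_ids: Set[str]) -> Tuple[Dict[str, Item], bool]:
    """Merge the downloaded server copy with local edits.

    Returns (merged items, whether to write the server file).
    """
    merged = dict(server)  # start from the server's view of the world
    for item_id in dirty_ids:
        mine = local[item_id]
        theirs = server.get(item_id)
        # If both sides touched the item, the newer last-modified date wins.
        if theirs is None or mine.last_modified >= theirs.last_modified:
            merged[item_id] = mine
    # Writing an unchanged file would bump its etag and make every other client
    # download it and write it back, so only write when we had local changes.
    should_write = bool(dirty_ids)
    return merged, should_write
```

With an empty dirty list the merged result is exactly the server copy and nothing is written back.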
Internal notes
The goal is to make synchronization as transparent as possible, while being reasonably responsible with use of network resources.
In an ideal world, every time a user made a change to a bookmark, that change would be instantly propagated from that user's machine up to the server and then down to each of the synchronized clients. There are at least three reasons why this is impractical. First, we don't have any kind of notification system to use, so clients need to poll to determine whether there's something new to pull down.
Second, we don't yet have any way to detect the absence of changes in the server file without downloading the whole thing. (This, presumably, could be addressed for WebDAV servers via etags or If-Modified-Since.)
Finally, while we can monitor the local datastore to watch for changes, we have to be careful about how we treat Livemarks: they tend to change frequently on their own, but those changes aren't ones we need to propagate.
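One way to keep Livemark churn out of the picture is to skip livemark-managed items when building the list of local changes. In this sketch, `Bookmark`, its `in_livemark` flag, and `build_dirty_list` are hypothetical; how the real datastore identifies livemark children is not specified on this page.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Bookmark:
    item_id: str
    last_modified: float  # assumed: seconds since the epoch
    in_livemark: bool     # hypothetical flag: item is managed by a livemark feed


def build_dirty_list(bookmarks: Iterable[Bookmark], last_sync: float) -> List[str]:
    """Return ids of items changed since last_sync, ignoring livemark churn."""
    return [
        b.item_id
        for b in bookmarks
        if b.last_modified > last_sync and not b.in_livemark
    ]
```

An empty result means the local datastore is clean.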
So, what about the following:
1) After a "meaningful" local change happens, start a timer. If another meaningful local change happens, restart the timer from scratch. If the timer fires, perform a sync. If the browser begins shutting down with the timer still pending, perform the sync during shutdown.
2) Periodically poll the server for changes. This should simply mean performing a sync, but one that is aborted if the server reports that the file has not been modified since the last sync. (This can only mean an etag check.) If at startup we determine that we've gone longer than our desired poll interval without syncing, invoke a sync (after a suitable delay to avoid startup issues).
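Option 1 is essentially a debounce. A minimal sketch with `threading.Timer`, assuming a `sync_fn` callback; `DebouncedSync` is a hypothetical name. The generation counter keeps a timer that fires just as a newer change arrives from triggering an extra sync.

```python
import threading
from typing import Callable, Optional


class DebouncedSync:
    """Hypothetical helper: run sync_fn once changes have been quiet for delay_seconds."""

    def __init__(self, delay_seconds: float, sync_fn: Callable[[], None]) -> None:
        self.delay_seconds = delay_seconds
        self.sync_fn = sync_fn
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._generation = 0

    def _fire(self, generation: int) -> None:
        with self._lock:
            # Superseded by a newer change, or already flushed at shutdown.
            if generation != self._generation or self._timer is None:
                return
            self._timer = None
        self.sync_fn()

    def on_meaningful_change(self) -> None:
        # Each new change cancels the pending timer and starts the countdown from scratch.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._generation += 1
            self._timer = threading.Timer(self.delay_seconds, self._fire,
                                          args=(self._generation,))
            self._timer.daemon = True
            self._timer.start()

    def on_shutdown(self) -> None:
        # If a sync is still pending when the browser shuts down, run it now.
        with self._lock:
            pending = self._timer
            self._timer = None
        if pending is not None:
            pending.cancel()
            self.sync_fn()
```

A burst of edits therefore produces a single sync once the user pauses, and nothing is lost if the browser closes mid-countdown.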
- Okay, after thinking about this a bit, I've got a slightly different take on the implementation. First, forget about observing changes: just make the sync operation itself ultra-lightweight for most cases, and perform it regularly. The sync starts by checking for local changes, building the dirty list. If the list is empty (meaning that the local datastore is clean), do a conditional (etag) GET of the server file. If the server reports it unchanged, we're done. If the local datastore is dirty or the server file's etag no longer matches, we proceed with a complete sync.
- Special case processing for shutdown: if the local datastore is dirty on shutdown, perform a full sync.
- Special case processing on startup: if the server file has changed (its etag no longer matches) on startup, perform a full sync.
- More optimization: if, in doing a sync, we see that the etag matches what we've currently got, don't bother downloading the file; just write the local file (with updated last-mod dates) to the server.
- This seems to require a bit of restructuring of the top-level sync code: sync needs to be more involved in network operations, which it is currently blissfully unaware of. That's okay; we just need to layer it properly.
--Todd 17:46, 31 October 2005 (EST)
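The revised top-level flow, together with the etag optimization from the last bullets, might look roughly like this. It is a sketch under assumptions: `server_etag` stands in for the etag request, `full_sync` and `upload_local` stand in for the existing sync and write code, and `SyncState` and `auto_sync` are hypothetical names. Because the optimization applies the etag check even when the datastore is dirty, the sketch checks the etag once up front.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SyncState:
    stored_etag: Optional[str]  # etag recorded after our last read or write
    dirty_ids: List[str] = field(default_factory=list)  # local changes since last sync


def auto_sync(state: SyncState,
              server_etag: Callable[[], Optional[str]],
              full_sync: Callable[[], None],
              upload_local: Callable[[], None]) -> str:
    """Do as little network work as possible; return a label for the path taken."""
    # One cheap etag check tells us whether the server file changed.
    # With no stored etag (first sync) we can't tell, so treat it as changed.
    server_unchanged = (state.stored_etag is not None
                        and server_etag() == state.stored_etag)
    if server_unchanged and not state.dirty_ids:
        return "up-to-date"  # clean on both sides: nothing to transfer
    if server_unchanged:
        # Dirty locally but unchanged on the server: skip the download and
        # just write our copy (with updated last-modified dates).
        upload_local()
        return "upload-only"
    # Server file changed: download it and do a complete sync.
    full_sync()
    return "full-sync"
```

Under this sketch, the shutdown special case amounts to calling `auto_sync` only when `dirty_ids` is non-empty, and the startup case to calling it unconditionally, since a clean datastore with a changed server etag already leads to a full sync. A real upload would also carry If-Match, so that a change landing between the etag check and the write isn't overwritten.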