Sunday, September 10, 2006

Common Ways of Data Synchronization

The problem of data synchronization is critical when it comes to develop disconnected systems. Although there are several ways to solve it, there are no good decisions. They can be only worse and the worst.
The article Making your smart client application work in a disconnected environment By Steve Joubert considers the problem from the smart application architecture angle.
In the article the application can work both on-line and off-line. The second way is much more problematic.
The user starts creating new data and modifying or even deleting existing data. When the application connects back to the network, how do you get the new and modified data from the client data store to the server? How do you know what has changed? How do you deal with identity field assignments on the client that may not match those on the server?
First, be sure to synchronize only the data that has changed. A few approaches you can take include database delta, change logging, or data flagging.
Obviously in all three approaches you must consider identity fields because the ID assigned to a record on the client will probably not match the one assigned on the server. In fact, in a busy system, the ID will likely have been used before you synchronize. Another consideration is the order in which the tables are synced. If a hierarchy or table exists, you must respect that structure and start at the appropriate place.
In the database delta approach, the system must look at each database table that can be synchronized and evaluate if a record has been changed, inserted or deleted. Changed records are easiest to identify and update. New records need to take into account identity fields, particularly if the ID field is referenced in other places. It then becomes the responsibility of the data synchronization controller to ensure that the new ID, assigned by the server, cascades to the referencing tables before the data is synchronized.
Deleted records are more problematic. You can’t assume that if a record exists on the server and not on the client a delete should be performed. If someone added a record while you were disconnected, it would be wrongfully deleted.
One way to implement database delta synchronization is to use a time-stamped field on each record and note the time of the last synchronization. Then any record whose time stamp is greater than the last synchronization is either updated or inserted. Any record that exists on the server with a time stamp that is less than the last synchronization time and does not exist on the client can be deleted. While this approach is workable, it isn’t very elegant and can be dangerous if not implemented carefully.

On the other hand, change logging is a safer method. Think of it as implementing your own SQL Server transaction log. Every action you take against the local database is logged as a synchronization record. When you are connected to the server again, the synchronization records are played against the server and the data is synched.

Data flagging is another option. It is safer than the database delta approach, and while less elegant than change logging, it is simpler to implement. With data flagging, each table that can be updated is given an extra column—a flag that tells the status of the record, whether it is unchanged, updated, inserted or deleted. In addition, any table with an identity field has a GUID field added to it.
When the application inserts, updates or deletes a row, the status of the record changes appropriately. When a record is inserted into a table with an identity field, however, a GUID is used instead. The GUID relates to other tables rather than to the identity key until synchronization occurs. Then, during synchronization, the identity field is assigned on the server and can be matched to related records via the GUID. Synchronization is simplified because each record that needs to participate is clearly marked and indicates the operation that needs to be performed.

Another approach is data locking. You create a flag alerting the server that a record or set of records is locked by a user; all other users have read-only access to the information. This approach is restrictive, but it enables you to identify the data sets that need to be loaded onto the client and eliminates concurrency issues.
When synchronizing data back to the server, you must have a set of rules by which the data is transferred. Problems can occur when the record you are updating or deleting has already been altered. Do you overwrite the data? Discard new changes? Let the client decide?
The “last-in” method is the easiest to implement but also the best way to lose valuable changes. In this method, the last person to synchronize or save his or her data controls the values in the record. Your updates overwrite previous updates and your delete removes records—even those that have been modified.
“First-in” disallows changes to records that have been modified since the start of the disconnected work. The idea is to flag records that need to be reviewed and enable the user to view the new version of the record and re-apply his or her changes. You may also want to build in more intelligence and enable the user to do an on-the-spot, side-by-side comparison of the two records and select needed values from each record before updating.

Thus, connected and disconnected data access mechanism is not a small task anymore...

No comments: