Monday, June 05, 2006

DataMover tackles an INNER JOIN across the Internet

Data synchronization has always been a challenge; DataMover helps address this in Microsoft Access97. But one of the problems I have been facing is: how do you do an INNER JOIN or even a LEFT JOIN when one table is on your local machine, and the other table is thousands of km away, connected only by an internet connection?
The table is visible, using a VPN, but it contains half a million records, and you only want to update, say, 10 of the 20000 records you have locally. Until now I was just using an INNER JOIN, hoping the ODBC connection would be efficient enough to manage. It didn't. It was a bad idea to begin with, and so I spent most of Thursday night and Friday trying to make it work better.
The first task was to extract just the minimum data from the remote server using a stored procedure. The problem with this approach is that you can't simply create the stored procedure on demand, and then drop it when done, because another DataMover user could be busy with it. Once I had that problem out of the way, the next problem was to determine what the minimum data would look like. As it turns out, it's just the Primary Key and the DateTimeStamp field.
The next problem was getting just a single record from the remote server's table of half a million records. Again, a stored procedure works just fine, provided you send the primary key values as parameters to the query. The next thing that went haywire was working out when to restrict the queries even further with user-supplied filters, and when to ignore them.
I first tackled the SYNS keyword in DataMover, which synchronizes the local (master) table with the remote (slave) table. Eventually it worked, and nice and fast as predicted. The reverse direction proved to be bit more tricky. But it works, and the 14 remote clinics who are using it have already noticed the difference. For now.

No comments: