Sunday, July 02, 2006

Many-to-many Relationships in Database Design

The problem of representing many-to-many relationships in a database is not a trivial one. And it can require several ingenious approaches to prevent the complexity from getting completely out of hand.
Take, for example, the simple question of the many properties of an object. Traditional object design says there should be a field for each property. But this isn't always practical, or even desirable. If there are hundreds of properties it becomes a nightmare. If a lot of the properties have a boolean or near-boolean state, the practical answer may be simpler.
In a call centre program I started off noting various events that took place in a call, so that the calls could be categorised by various criteria. I used an existing "StatusCode" field that was a text length of 8, because all I wanted to record was whether the caller spoke to the client or not, whether the client made any bookings, and whether they had any complaints. I used a different letter of the alphabet for each event type, and since thre wereless than 8 events it all worked out fine.
Now the program is in use and the number of categories have grown, so the field has been extended to 20 letters, and the number of categories in use has grown to around 18. Now the wheels are starting to fall off because the letters in use don't match up to the category description that well.
When the many-to-many relationship can be represented by a simple combination, and the number of elements in one side is not too great, the "is a member of" idea can be represented by letters, or even lists with check boxes next to them. But what does one do when the number of elements in the relationship is seemingly large? An example would be the use of key words. Since there are tens of thousands of works that are possible, how does one deal with them in a way that is simple, yet effective?
Several key words could all be stored in a single field, separated by something like a comma or tilde: "quick,brown,dog" or "quick~brown~dog", although for searching purposes it would be better to write "~quick~brown~dog~" so that a search for "~dog~" would work even if the word was at the beginning or end of the list.
The next question arises: does one sort the list alphabetically, or does it even matter? Also, how does one validate the words? A spell check, perhaps? Or a table with a list of valid words? What about invalid words, such as spelling mistakes?
The more traditional approach to the many-to-many combination is to have a table that contains the primary keys of both tables. That's fine if the table also contains other fields, such as quantity or date. But if it's a simple combination without too many possibilities, it seems clumsy to have a header-details screen (the kind you get with invoices and invoice items) when a simple list with tick boxes will suffice.
I guess the point of this ramble is that a one-size-fits-all approach doesn't work, and that's one of the biggest problems with Mustang. It's one I intend to solve with Open Access. The question is: how many options should be catered for, and which options to leave out?

No comments: