Wednesday, March 21, 2012

Making "incremental" updates to a data warerhouse

We have a legacy database whose data needs to be included in our yet-to-be-built sql 2005 data warehouse. Some of the tables don’t have any natural candidates for a primary key. (Of course, we also need to add other data to the mix.)

Suppose we load the empty warehouse initially. In following loads we don’t want to include those records that haven’t changed from the first load (“duplicates”) but we also don’t want to delete the contents of the entire warehouse because of the load time.Any ideas/best practices how to handle “incremental updates” to a warehouse would be appreciated.

TIA,

Bill

<<tables don’t have any natural candidates for a primary key>>

The above situation is likely a fundamental design flaw for which there is no easy fix. What is the usefullness and business meaning of an entity for which there is no natural key? It is likely nonsense.

However, from your definition of the problem, it sounds more like the key would be the combination of all columns in the row. How else would you define what is a "new" row vs. a "changed" row? Isn't that what you are describing? If so, there is your key.

Ken

|||

You can use the Slowly Changing Dimension Wizard, an advanced transformation component, to create this type of process quite easily.

|||

Can you explain how "slowly changing dimensions" can be used to create this type of process? (I'm new to this stuff and I can't make the connection on my own.)

TIA,

Bill

|||

Here's a few references:

http://blogs.conchango.com/jamiethomson/archive/2005/06/06/1543.aspx (includes a demo)

http://sqljunkies.com/WebLog/tpagel/archive/2005/07/24/16195.aspx

Probably the best available reference is BOL.

-Jamie

No comments:

Post a Comment