I'm trying to normalize some data by creating rows in a dimension table as I find new values in the original table.
My first thought is to use a Lookup to replace string values with id's to the Dimension table, and when the Lookup fails, insert the value into the Dimension table with a new key.
Does that make sense? This would be relatively easy to do with TSQL, but surely there is a way to do this in SSIS.
Thoughts?
Greg,
Why you do not, first hand, create a default(error, unknown, etc) member in every dimension and then if the lookup fails assign the id of the default. Anyway, I think that has been disscussed in previous threads; here is one.
http://forums.microsoft.com/msdn/showpost.aspx?postid=75211&siteid=1
Rafael Salas
|||Thanks! I will read that thread...Responding to your suggestion, however, I can't use a default value because if the lookup doesn't work then a new record needs to be inserted into the Dimension table.
For example, if my row has "XXX" as a value and that value isn't in the Dimension table, I want to insert the row with the KEY rather than the VALUE("XXX") and insert the KEY/VALUE pair into the dimension table
Basically, I want to do an insert into dimension (select distinct value from facts) before I do a lookup to ensure that all lookups will be successful.|||
I read that thread, and the example that Thomas Pagel blogs on is what I'm looking for...
however, this really seems like an unnecessarily complicated approach.
Since I'm new to this, I'm inclined to think that I'm making it more difficult that it needs to be, but can't think of another approach.
If I was scripting this in TSQL, or SQL, I'd do a select of the distinct values from the FACTs table, insert those into the dimension, and then for each row in the FACTS table replace the value with the key from the new Dimension table.
Is this approach inconsistent with how SSIS is intended to be used?
|||Greg Akins wrote: however, this really seems like an unnecessarily complicated approach.
Since I'm new to this, I'm inclined to think that I'm making it more difficult that it needs to be, but can't think of another approach.
I really think this approach over complicates things. Not matches between a fact and its dimensions should be an exceptional case rather than the driver of your dimension load process.
Greg Akins wrote: If I was scripting this in TSQL, or SQL, I'd do a select of the distinct values from the FACTs table, insert those into the dimension, and then for each row in the FACTS table replace the value with the key from the new Dimension table.
I would to exactly the same in SSIS; in general dimensions should be processed before the facts, even if like in your case, they come from the same source.
Greg Akins wrote: Is this approach inconsistent with how SSIS is intended to be used?
Not sure if there is an answer for that...
Rafael Salas
|||OK. Thanks alot. I appreciate your advice.
No comments:
Post a Comment