Wednesday, March 7, 2012
Inserting/Updating and locking
We are inserting and updating large amounts of rows and find that users are
complaining that their SELECT queries on the same tables are blocking (not
finishing) until our INSERT or UPDATE finishes.
Is there any way to tell an INSERT or UPDATE to not lock the whole table so
that SELECT's can still be run on the table?
Mike W wrote:
> We are inserting and updating large amounts of rows and find that
> users are complaining that their SELECT queries on the same tables
> are blocking (not finishing) until our INSERT or UPDATE finishes.
> Is there any way to tell an INSERT or UPDATE to not lock the whole
> table so that SELECT's can still be run on the table?
That's probably not the problem. It's likely that SQL Server is
escalating row locks to page locks, and possibly even to a table lock if
enough rows are affected by the insert/update; while that transaction is
running, there are exclusive locks on those rows/pages/table.
When a page has an exclusive lock, no other readers or writers can touch
the page. They are blocked until the update/insert transaction
completes. That is, unless they use the READ UNCOMMITTED isolation level or
the NOLOCK table hint on the tables in the SELECT statements. But since you
are updating information, is it OK for your users to read dirty data? I don't
know. That's up to you to determine. Dirty data arises when data is
updated/inserted in a transaction and another user reads the data using
the READ UNCOMMITTED isolation level. If the update transaction then rolls
back the changes, the user that selected the data is staring at data that
no longer exists in the database.
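For illustration, here is a minimal sketch of both options; the table and
column names below are placeholders, not from the original post:

-- Option 1: table hint on a single query
SELECT OrderId, Status
FROM dbo.Orders WITH (NOLOCK)
WHERE CustomerId = 42;

-- Option 2: change the isolation level for the whole session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT OrderId, Status
FROM dbo.Orders
WHERE CustomerId = 42;

Either way the reader may see rows that are later rolled back, so this is only
appropriate where dirty reads are acceptable.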
The other option is to keep the insert/update transactions as short as
possible. Use batches if you need to. That will keep the outstanding
locks to a minimum.
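As a rough sketch of the batching idea on SQL 2000 (the table and column names
here are hypothetical, not from the thread):

-- Process the change in 1,000-row batches so locks are short-lived
SET ROWCOUNT 1000;
WHILE 1 = 1
BEGIN
    UPDATE dbo.CustomerLeads
    SET Processed = 1
    WHERE Processed = 0;

    IF @@ROWCOUNT = 0 BREAK;
END
SET ROWCOUNT 0;

Each iteration is its own small transaction, so readers are blocked for at
most one batch rather than for the entire update.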
SQL Server 2005 offers a method for readers to see the original data
even if it's being updated by another user, but this option will likely
introduce overhead in the database because the data is temporarily
written to tempdb so it's available for other users to read. And it's a
database-wide setting.
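This is presumably the READ_COMMITTED_SNAPSHOT (row versioning) option. A
minimal sketch of turning it on, with a placeholder database name:

ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;

With it enabled, readers under the default READ COMMITTED level get the last
committed version of each row (kept in tempdb) instead of blocking on writers.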
For SQL 2000, it's either dirty reads or blocking. But short
transactions mitigate most of these problems.
David Gugick
Imceda Software
www.imceda.com
|||Mike W wrote:
> We are inserting and updating large amounts of rows and find that
> users are complaining that their SELECT queries on the same tables
> are blocking (not finishing) until our INSERT or UPDATE finishes.
> Is there any way to tell an INSERT or UPDATE to not lock the whole
> table so that SELECT's can still be run on the table?
Another thing to consider is where the newly inserted rows are going.
For example, if you have a clustered index on an IDENTITY column, then
inserting new rows will have less of an effect on existing data because
most of the new rows are inserted on new pages. Updated rows will
still cause problems.
What is your clustered index on? What data are you inserting? Can you
insert and update in different transactions? Can you also update and
insert in small amounts, say 1,000 rows at a time, rather than all at
once?
David Gugick
Imceda Software
www.imceda.com
|||Thanks for your responses. I believe the table that is being updated does
have a Clustered index on the Primary key field, but it's not an Identity
field.
The data we are inserting is customer lead information.
I will ask about the last 2 questions since it is not me that is doing the
updates.
Thanks again.
"David Gugick" <davidg-nospam@.imceda.com> wrote in message
news:On30MJv4EHA.1596@.tk2msftngp13.phx.gbl...
> Mike W wrote:
>> We are inserting and updating large amounts of rows and find that
>> users are complaining that their SELECT queries on the same tables
>> are blocking (not finishing) until our INSERT or UPDATE finishes.
>> Is there any way to tell an INSERT or UPDATE to not lock the whole
>> table so that SELECT's can still be run on the table?
> Another thing to consider is where the newly inserted rows are going. For
> example, if you have a clustered index on an IDENTITY column, then
> inserting new rows will have less of an effect on existing data because
> most of the new rows are inserted on new pages. Updated rows will
> still cause problems.
> What is your clustered index on? What data are you inserting? Can you
> insert and update in different transactions? Can you also update and
> insert in small amounts, say 1,000 rows at a time, rather than all at
> once?
>
> --
> David Gugick
> Imceda Software
> www.imceda.com
|||Mike W wrote:
> Thanks for your responses. I believe the table that is being updated
> does have a Clustered index on the Primary key field, but it's not an
> Identity field.
> The data we are inserting is customer lead information.
> I will ask about the last 2 questions since it is not me that is
> doing the updates.
> Thanks again.
>
If the clustered index is on something other than a date or an identity,
you have a few potential issues:
1 - The clustered index is probably causing page splits as new rows are
inserted. This is a very expensive operation because it requires that a page
be split, a new one created, and rows moved around. While this operation
is going on, both pages are locked by SQL Server, adding to the locking
overhead of this operation.
2 - The clustered index is likely requiring the disk heads to move all
around the physical disk to locate the page to update. There is nothing
slower than random disk access for a database, further slowing down the
operation.
You can mitigate some of the problems here (not the physical disk heads
moving around) by leaving space in your clustered index using a fill
factor. However, this requires that you rebuild the clustered index as needed
to maintain the free space before a bulk update of data. Using a fill
factor will leave a percentage of space available on each page and help
prevent page splitting. However, it will make the table correspondingly
larger, but this may not be a problem if the free space is going to be
filled with new data anyway.
And the rebuilding of the clustered index will likely affect concurrency
for that table while the rebuild occurs.
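As an illustration of that kind of rebuild on SQL 2000 (the table, index name,
and fill factor are placeholders, not from the thread):

-- Rebuild the clustered index leaving 20% free space on each page
DBCC DBREINDEX ('dbo.CustomerLeads', 'PK_CustomerLeads', 80);

On SQL 2005 the equivalent would be ALTER INDEX ... REBUILD WITH
(FILLFACTOR = 80), which can also be run online on Enterprise Edition.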
If your clustered index is, in fact, on a column or set of columns that
are causing these problems, you may need to consider changing the index
to something that keeps all new data at the end of the table. The other
option is to temporarily remove the clustered index during the load and
rebuild when complete, but this will cause an automatic rebuild of all
non-clustered indexes as well and may be more expensive than what you
want.
A clustered index on an identity can prevent a host of problems and
speed the load process significantly in this case. It's worth a test to
see if it helps the load and if it affects other queries run on that
table.
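A minimal sketch of the kind of change being suggested, with placeholder table
and column names (it assumes the existing clustered primary key is recreated
as nonclustered first):

-- Add a surrogate identity key and cluster on it so new rows go to the end
ALTER TABLE dbo.CustomerLeads ADD LeadId INT IDENTITY(1,1) NOT NULL;
CREATE UNIQUE CLUSTERED INDEX CIX_CustomerLeads_LeadId
    ON dbo.CustomerLeads (LeadId);

New rows then land on the last page of the table, which avoids page splits in
the middle of the data during the load.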
David Gugick
Imceda Software
www.imceda.com
Friday, February 24, 2012
Inserting to Dimension on Failed Lookup
I'm trying to normalize some data by creating rows in a dimension table as I find new values in the original table.
My first thought is to use a Lookup to replace string values with id's to the Dimension table, and when the Lookup fails, insert the value into the Dimension table with a new key.
Does that make sense? This would be relatively easy to do with TSQL, but surely there is a way to do this in SSIS.
Thoughts?
Greg,
Why not first create a default (error, unknown, etc.) member in every dimension and then, if the lookup fails, assign the ID of the default? Anyway, I think that has been discussed in previous threads; here is one.
http://forums.microsoft.com/msdn/showpost.aspx?postid=75211&siteid=1
Rafael Salas
|||Thanks! I will read that thread... Responding to your suggestion, however, I can't use a default value, because if the lookup doesn't work then a new record needs to be inserted into the Dimension table.
For example, if my row has "XXX" as a value and that value isn't in the Dimension table, I want to insert the row with the KEY rather than the VALUE ("XXX") and insert the KEY/VALUE pair into the dimension table.
Basically, I want to do an insert into dimension (select distinct value from facts) before I do a lookup, to ensure that all lookups will be successful.
|||I read that thread, and the example that Thomas Pagel blogs about is what I'm looking for...
however, this really seems like an unnecessarily complicated approach.
Since I'm new to this, I'm inclined to think that I'm making it more difficult than it needs to be, but can't think of another approach.
If I was scripting this in TSQL, or SQL, I'd do a select of the distinct values from the FACTs table, insert those into the dimension, and then for each row in the FACTS table replace the value with the key from the new Dimension table.
Is this approach inconsistent with how SSIS is intended to be used?
|||Greg Akins wrote:
however, this really seems like an unnecessarily complicated approach.
Since I'm new to this, I'm inclined to think that I'm making it more difficult than it needs to be, but can't think of another approach.
I really think this approach overcomplicates things. Non-matches between a fact and its dimensions should be an exceptional case rather than the driver of your dimension load process.
Greg Akins wrote:
If I was scripting this in TSQL, or SQL, I'd do a select of the distinct values from the FACTs table, insert those into the dimension, and then for each row in the FACTS table replace the value with the key from the new Dimension table.
I would do exactly the same in SSIS; in general, dimensions should be processed before the facts, even if, as in your case, they come from the same source.
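For reference, a T-SQL sketch of that order of operations (all table and column names here are hypothetical, not from the thread):

-- 1. Add any distinct values not yet present in the dimension
INSERT INTO dbo.DimCategory (CategoryName)
SELECT DISTINCT s.CategoryName
FROM dbo.StagingFacts AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCategory AS d
                  WHERE d.CategoryName = s.CategoryName);

-- 2. Load the fact table, resolving the surrogate key with a join
INSERT INTO dbo.FactSales (CategoryKey, Amount)
SELECT d.CategoryKey, s.Amount
FROM dbo.StagingFacts AS s
JOIN dbo.DimCategory AS d ON d.CategoryName = s.CategoryName;

In SSIS the same pattern is typically two data flows (or one flow with a Lookup for step 2) executed in that order.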
Greg Akins wrote:
Is this approach inconsistent with how SSIS is intended to be used?
Not sure if there is an answer for that...
Rafael Salas
|||OK. Thanks a lot. I appreciate your advice.
Inserting the diff rows into a table
Hi,
I have two tables as follows
Table TempTab
{
CatId varchar(20),
lastupdate Datetime
}
Table MainTab
{
CatId varchar(20),
lastupdate Datetime
}
and the data in those tables are as follows
Table TempTab
{
CatId LastUpdate
Cat1 D1
Cat2 D2
Cat3 D3
Cat4 D4
}
Table MainTab
{
CatId LastUpdate
Cat1 D1
Cat3 D3
Cat5 D5
}
I need a query to insert the differences into the MainTab; I mean to say the final MainTab should look as follows
Table MainTab
{
CatId LastUpdate
Cat1 D1
Cat2 D2
Cat3 D3
Cat4 D4
Cat5 D5
}
Can anyone please let me know the query?
Thanks a lot
~Mohan
INSERT INTO MainTab (CatID, LastUpdate)
SELECT CatID, LastUpdate FROM TempTab
WHERE CatID NOT IN (SELECT CatID FROM MainTab WHERE CatID IS NOT NULL)
|||
Here it is,
Code Snippet
Create Table #temptab (
[CatId] Varchar(100) ,
[LastUpdate] Varchar(100)
);
Insert Into #temptab Values('Cat1','D1');
Insert Into #temptab Values('Cat2','D2');
Insert Into #temptab Values('Cat3','D3');
Insert Into #temptab Values('Cat4','D4');
Create Table #maintab (
[CatId] Varchar(100) ,
[LastUpdate] Varchar(100)
);
Insert Into #maintab Values('Cat1','D1');
Insert Into #maintab Values('Cat3','D3');
Insert Into #maintab Values('Cat5','D5');
Code Snippet
Insert Into #maintab
Select * from #temptab t
Where not exists
(
select 1 from #maintab m
where m.catid=t.catid
)
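For what it's worth, on SQL Server 2008 or later the same "insert only the missing rows" step could also be written with MERGE; a sketch against the temp tables from the snippet above:

MERGE #maintab AS m
USING #temptab AS t
    ON m.CatId = t.CatId
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CatId, LastUpdate) VALUES (t.CatId, t.LastUpdate);

The NOT EXISTS version works on any version, though, so this is only an alternative.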
Inserting rows problem
I have a "flat" CSV file that I need to import into a set of 3 relational
tables.
As a "programmer", I thought this would be trivial, so using VB6 and ADO, I
created a recordset of all the import rows and then iterated through the
recordset extracting the data from the fields and putting them into the
relational tables. This worked fine with my test data but rather obviously
takes way too long with real data; for every row, I have a network
round-trip to update the first table and retrieve the primary key's ID, I
then have another round trip to update the second table and get its ID, and
a third network round trip for the third table. All in all, we're talking > 7 hours to complete.
I need to knock this down to minutes...
I'm guessing that I can do this with a single SQL command sent to the
server, but my SQL skills aren't really up to it. I'm hoping that this is
relatively trivial and that someone could provide a few hints.
If it's any help, I've created some example tables (the real ones are
obviously more complicated) - see end of message.
Thanks in advance
Griff
--
--The table holding the import table:
CREATE TABLE [dbo].[inputData] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[product] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[pack] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[price] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
-- Data to fill this:
insert into [inputData] (product,pack,price) values ('diary','T','100')
insert into [inputData] (product,pack,price) values ('diary','R','5')
insert into [inputData] (product,pack,price) values ('envelope','T','1000')
insert into [inputData] (product,pack,price) values ('envelope','R','100')
insert into [inputData] (product,pack,price) values ('hatstand','T','5')
insert into [inputData] (product,pack,price) values ('hatstand','R','1')
insert into [inputData] (product,pack,price) values ('lamp','T','10')
insert into [inputData] (product,pack,price) values ('lamp','R','2')
-- The following relational tables
CREATE TABLE [dbo].[Product] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[catalogueProduct] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Product] ADD
CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Pack] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[Product_ID] [int] NOT NULL ,
[Pack_Details] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Pack] ADD
CONSTRAINT [PK_Pack] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Price] (
[id] [int] IDENTITY (1, 1) NOT NULL ,
[Price_Details] [varchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,
[Pack_ID] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Price] ADD
CONSTRAINT [PK_Price] PRIMARY KEY CLUSTERED
(
[id]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Pack] ADD
CONSTRAINT [FK_Pack_Product] FOREIGN KEY
(
[Product_ID]
) REFERENCES [dbo].[Product] (
[id]
)
GO
ALTER TABLE [dbo].[Price] ADD
CONSTRAINT [FK_Price_Pack] FOREIGN KEY
(
[Pack_ID]
) REFERENCES [dbo].[Pack] (
[id]
)
GO
You could create a DTS package that loads tables - perhaps work tables -
with data via either a data pump, or more likely, a bulk insert.
You didn't post sample data from the flat file, so it's hard to say for
sure.
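Still, as a minimal sketch of loading such a work table straight from the CSV
(the file path and options below are placeholders):

-- Assumes the file's columns line up with the table; otherwise use a format
-- file or a staging table without the IDENTITY column.
BULK INSERT dbo.inputData
FROM 'C:\import\products.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

Once the data is in the work table, the three relational tables can be
populated with set-based INSERT ... SELECT statements rather than row-by-row
round trips.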
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
|||> You didn't post sample data from the flat file, so it's hard to say for
> sure.
The sample data was in the [inputData] table (see insert statements).
I can see the advantage of DTS, but it's going to be difficult to get that
to work in our current application.
I guess what I really want is a way to say, "insert this information into
these three tables, ensuring that the referential integrity is maintained"
Griff
|||Griff wrote:
> I have a "flat" CSV file that I need to import into a set of 3 relational
> tables.
> As a "programmer", I thought this would be trivial, so using VB6 and ADO,
I
> created a recordset of all the import rows and then iterated through the
> recordset extracting the data from the fields and putting them into the
> relational tables. This worked fine with my test data but rather obviousl
y
> takes way too long with real data; for every row, I have a network
> round-trip to update the first table and retrieve the primary key's ID, I
> then have another round trip to update the second table and get it's ID, a
nd
> a third network round trip for the third table. All in all, we're talking
>
> 7 hours to complete.
> I need to knock this down to minutes...
> I'm guessing that I can do this with a single SQL command sent to the
> server, but my SQL skills aren't really up to it. I'm hoping that this is
> relatively trivial and that someone could provide a few hints.
> If it's any help, I've created some example tables (the real ones are
> obviously more complicated) - see end of message.
> Thanks in advance
> Griff
> --
You've made the classic mistake of not declaring any alternate keys.
Leaving them out is not an option if the PK is an IDENTITY - otherwise
redundancy and lack of integrity will rule and your problems will be
much harder to solve. Here's what I suggest:
ALTER TABLE product ADD CONSTRAINT ak1_product UNIQUE
(catalogueproduct);
ALTER TABLE pack ADD CONSTRAINT ak1_pack UNIQUE (product_id,
pack_details);
ALTER TABLE price ADD CONSTRAINT ak1_price UNIQUE (pack_id /* ? */);
Now populate all three tables with three INSERTs:
INSERT INTO product (catalogueproduct)
SELECT DISTINCT product
FROM inputdata ;
INSERT INTO pack (product_id, pack_details)
SELECT DISTINCT P.id, pack
FROM inputdata AS D
JOIN product AS P
ON D.product = P.catalogueproduct ;
INSERT INTO price (price_details, pack_id)
SELECT DISTINCT D.price, A.id
FROM inputdata AS D
JOIN product AS P
ON D.product = P.catalogueproduct
JOIN pack AS A
ON D.pack = A.pack_details
AND A.product_id = P.id ;
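If the whole load needs to be all-or-nothing, those three INSERTs can be
wrapped in a single transaction; a sketch:

SET XACT_ABORT ON;
BEGIN TRANSACTION;
-- the three INSERT ... SELECT statements above go here
COMMIT TRANSACTION;

That keeps either all of the imported rows or none of them if one of the
statements fails.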
David Portas, SQL Server MVP
Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.
SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
|||File layouts can be significant, so posting INSERT statements is not really
helpful. Is what you're doing:
1) Populating inputData
2) Populating the other 3 tables, based on what you've loaded into inputData
If so, then inputData is just a work table. Once loaded - via bulk insert -
you then can kick off 3 SQL queries to insert rows into the other tables.
See David's remarks re FK's.
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
"Griff" <howling@.the.moon> wrote in message
news:O8%232VgmJGHA.668@.TK2MSFTNGP11.phx.gbl...
> You didn't post sample data from the flat file, so it's hard to say for
> sure.
The sample data was in the [inputData] table (see insert statements).
I can see the advantage of DTS, but it's going to be difficult to get that
to work in our current application.
I guess what I really want is a way to say, "insert this information into
these three tables, ensuring that the referential integrity is maintained"
Griff
|||Thank you Dave & Tom very much for your help.
The example tables were admittedly rather poor - the real ones have
composite primary keys that ensure that the relevant fields are unique -
sorry for not including that in the example tables.
Regarding Tom's comment - I was going to load the data from the CSV file
into a DB table rather than work directly on the CSV file. The reason for
this is that there are several intervening steps that were seemingly
irrelevant for this exercise.
The approach you have suggested should work fine - thanks for the idea.
Griff
|||Looks like the DTS package is the route to go:
1) Create 3 SQL Server connections.
2) Load the table via a bulk insert task.
3) Run 2 Execute SQL tasks in parallel - one on each SQL connection - to
load the other tables. You'll be using SELECT DISTINCT in the queries.
Tom
----
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinpub.com
"Griff" <howling@.the.moon> wrote in message
news:uc9RMinJGHA.3856@.TK2MSFTNGP12.phx.gbl...
Thank you Dave & Tom very much for your help.
The example tables were admittedly rather poor - the real ones have
composite primary keys that ensure that the relevant fields are unique -
sorry for not including that in the example tables.
Regarding Tom's comment - I was going to load the data from the CSV file
into a DB table rather than work directly on the CSV file. The reason for
this is that there are several intervening steps that were seemingly
irrelevant for this exercise.
The approach you have suggested should work fine - thanks for the idea.
Griff