Hi,
I've got a real problem inserting approx 1,000,000 rows of data into
some SQL Server 2000 tables. The data is being inserted one row at a
time using a T-SQL cursor. When I view the table size using
sp_spaceused I get the following type of results:
reserved = 27021280 KB
data = 3376216 KB
unused = 23642952 KB
Why is the reserved space so much more than the actual amount of
data in the table?
The table cannot be fragmented as this is the first data which has
been inserted into it.
I cannot use a different insert method (like DTS) as logic needs to be
applied to the data before it is inserted.
Is there any way round this?
Shrinking the tables after the event is also not an option, as I would
run out of disk space way before all of the tables are populated.

|||

Out-of-date space usage info? Have you tried DBCC UPDATEUSAGE?
Also, what indexes do you have on the table? This, along with the distribution of the data you
are inserting, will determine the level of fragmentation you get when you perform your inserts. Did
you check the fragmentation level using DBCC SHOWCONTIG?
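For reference, those checks can be run like this (the database and table names below are placeholders; substitute your own):

```sql
-- Refresh the space accounting first, in case sp_spaceused is stale,
-- then look at usage and fragmentation. Names are placeholders.
DBCC UPDATEUSAGE ('MyDatabase', 'MyTable');
EXEC sp_spaceused 'MyTable';
DBCC SHOWCONTIG ('MyTable');
```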
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
|||

Thanks for your advice. I am new to SQL Server, so I had not heard of
these procedures before...
DBCC UPDATEUSAGE - has no effect.
When I do DBCC SHOWCONTIG it shows that the table is MASSIVELY
fragmented (at least I think that's what it's saying) - results:
- Pages Scanned........................: 429774
- Extents Scanned.......................: 429755
- Extent Switches.......................: 429754
- Avg. Pages per Extent..................: 1.0
- Scan Density [Best Count:Actual Count]......: 12.50% [53722:429755]
- Extent Scan Fragmentation ...............: 98.13%
- Avg. Bytes Free per Page................: 7712.6
- Avg. Page Density (full)................: 4.71%
There are no indexes on the table at all (I am trying this routine on
a test database before running it elsewhere and I assumed that leaving
off the indexes would increase the rate of the inserts) - are you
suggesting that if I had the indexes on the table it would reduce the
fragmentation as the data is inserted?
You say "this along with the data distribution of the data you are
inserting", but how can I control where the data is physically written
to (and therefore control the fragmentation)?
|||

The pages seem indeed very empty, on average. To say anything more conclusive, we would need the
table layout and what indexes you have on the table. You say no indexes, but that means you
didn't define a primary key (or unique constraint). This is not recommended! sp_helpindex will list
the indexes you have on the table.
In general every table should have a clustered index. Which column(s) you define in the clustered
index is based on both the data distribution and your queries. It is likely that you will have less
"emptiness" in the pages with a proper clustered index.
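As a sketch of that advice, one way to get a clustered index is to make the primary key clustered. The table and column names below are illustrative only, taken from the table discussed later in the thread; which columns actually belong in the clustered key depends on your data and queries:

```sql
-- Illustrative only: a clustered primary key keeps the data pages
-- ordered and packed. Choose key columns to suit your own queries.
ALTER TABLE BookingPayment
    ADD CONSTRAINT PK_BookingPayment
    PRIMARY KEY CLUSTERED (BooRefNo, PayID, BpyDate);
```

If a non-clustered primary key already exists on the table, you would need to drop that constraint (ALTER TABLE ... DROP CONSTRAINT) before adding the clustered one.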
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
|||

Well, I don't understand at all!!
It was true that the tables did not have indexes or primary keys on -
this was simply because I was running a test and I thought it would
run more quickly - again due to my not understanding SQL Server properly
(not because I think it's a good idea in terms of design!).
After your latest message, here's what I did (sorry if this is
long-winded).
I loaded the indexes / PKs onto the relevant tables - although I
notice there are no clustered indexes on any of the 5 tables which I
am inserting data into - please bear in mind, though, that I have no
influence over the design of the database...
I re-ran my routine against THE SAME SET OF DATA - this time the data
loaded in a 100th of the time, and took up approx 350 MB per table!!
(prior to this the tables had 27 GB of reserved space, with maybe 3.5 GB
of data) - HOW CAN THIS BE SO DIFFERENT? (total size of the database was
61 GB and now is 3 GB!!)
So, then I tried to run my same routine loading data into a different
database on a different server which already had the indexes on (same
data model). This server only had 12 GB of free space (the total data
in my first db took up 1.9 GB). After a very short space of time the
database ran out of disk space, with only a fraction of the data
inserted. Looking at sp_spaceused, the reserved space was way out
of sync with the data figure, as before.
I ran DBCC UPDATEUSAGE, truncated the 5 tables and re-ran my routine;
now the data is inserting happily and is taking up 353 MB per table
again!
I have no more servers to play with!!
Is the data "behaving properly" the 2nd time around simply because it
is the 2nd time the same routine has been run?
Or is the fact that each time I have run DBCC UPDATEUSAGE relevant?
Surely there must be a way of achieving the proper data figures the
first time this routine is run (I clearly can't keep running the
same routines and running out of disk space when I come to run this
against production databases).
For reference - here is an example of one of the tables I am inserting
data into (results from sp_help) -
BookingPayment  dbo  user table  created 2004-08-24 11:48:44.210

Column           Type      Length  Prec  Scale  Nullable  Collation
BooRefNo         int       4       10    0      no        NULL
PayID            int       4       10    0      no        NULL
BpyDate          datetime  8                    no        NULL
BpyAmount        money     8       19    4      no        NULL
BpyTzoName       varchar   3                    yes       SQL_Latin1_General_CP1_CI_AS
BpyDateUTC       datetime  8                    yes       NULL
BpyPayAmount     money     8       19    4      yes       NULL
BpyExrRate       float     8       53    NULL   yes       NULL
WrkID            varchar   20                   yes       SQL_Latin1_General_CP1_CI_AS
UseID            varchar   20                   yes       SQL_Latin1_General_CP1_CI_AS
BpyCreatedWkgID  varchar   3                    yes       SQL_Latin1_General_CP1_CI_AS
BpyCreatedUgrID  varchar   3                    yes       SQL_Latin1_General_CP1_CI_AS
BpyCreatedProID  varchar   5                    yes       SQL_Latin1_General_CP1_CI_AS
BpyCreatedPrgID  varchar   3                    yes       SQL_Latin1_General_CP1_CI_AS
BpyInvStatus     tinyint   1       3     0      yes       NULL
(no columns are computed; TrimTrailingBlanks / FixedLenNullInSource are "no" for all varchar columns)

Indexes:
PaymentKey  nonclustered, unique, located on PRIMARY  (PayID, BooRefNo)
PK___3__21  nonclustered, unique, primary key, located on PRIMARY  (BooRefNo, PayID, BpyDate)

Constraint: PRIMARY KEY (non-clustered)  PK___3__21  (BooRefNo, PayID, BpyDate)
There is a difference of 27 GB (!!!) in total space used between the
1st and 2nd times I insert data into this table...
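To compare the two runs, it may help to re-check the index layout and force-refresh the space figures in one go. A minimal check against the table from the sp_help output above (no other names assumed):

```sql
-- Confirm which indexes actually exist on the table,
-- then re-read the space figures with a forced usage refresh.
EXEC sp_helpindex 'BookingPayment';
EXEC sp_spaceused 'BookingPayment', @updateusage = 'true';
```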