
Friday, March 23, 2012

many tables vs one big table

Dear experts,

If I have a number of related groups of data (e.g., stock prices for
different stocks) is it better to put them into many tables (e.g., one
table per stock) or into one big table (e.g., a single table where one
column is the stock id other columns are the price and other data)?

The single table approach with a clustered primary key including the
stock id and other information that is different for each stock seems
cleaner to me. But I worry that having a single table could hurt
performance when the database gets very large. The clustered primary
key should make queries on the single table fast, but I'm worried about
inserts. According to my vague understanding of a clustered primary
key, records are physically stored on disk based on the primary key. So
if I insert a record for a given stock, does the database have to
physically "move down" all the records that are below it, or does it do
something smarter?

To summarize:

1. Should I use a single table or many tables?

2. If using a single table, how are inserts handled by the DB to
prevent inserts from getting slower and slower as more items are added
to the DB?

Thanks in advance for your help and advice.

|||Use a single table, not many tables. Use a multi-column key that
includes the stock. When a new stock comes along it should be a user
transaction to add a row to the Stocks master table, not a database
administrator's job to add tables. The database design should be
stable, not in a constant state of change.
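A minimal sketch of that design in T-SQL (the table and column names here are illustrative, not from the thread):

```sql
-- Master table: adding a new stock is an ordinary INSERT, not a schema change.
CREATE TABLE Stocks (
    StockID int         NOT NULL PRIMARY KEY,
    Ticker  varchar(10) NOT NULL UNIQUE
);

-- One price table for all stocks, clustered on (StockID, TradeDate).
CREATE TABLE StockPrices (
    StockID   int           NOT NULL REFERENCES Stocks (StockID),
    TradeDate datetime      NOT NULL,
    Price     decimal(18,4) NOT NULL,
    Volume    bigint        NULL,
    CONSTRAINT PK_StockPrices PRIMARY KEY CLUSTERED (StockID, TradeDate)
);
```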

Performance of INSERTs does not change much between inserting into a
table with 10,000 rows or 10,000,000 rows.

Roy Harvey
Beacon Falls, CT

On 2 Jan 2007 08:57:21 -0800, "Emin" <emin.shopper@.gmail.com> wrote:

Quote:

Originally Posted by

>Dear experts,
>
(snip)
>
>Thanks in advance for your help and advice.

|||Emin (emin.shopper@.gmail.com) writes:

Quote:

Originally Posted by

If I have a number of related groups of data (e.g., stock prices for
different stocks) is it better to put them into many tables (e.g., one
table per stock) or into one big table (e.g., a single table where one
column is the stock id other columns are the price and other data)?


There are two reasons why you would have more than one table:

1) Information is so different for different (groups of) stocks
that you get different sets of columns.

2) The table is really big. Then you might make it into a partitioned
view or table, based on some condition. But that would not be one
table per stock id.

So, in the logical design, that's one table.
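For illustration, a partitioned view of the SQL 2000 kind might look like this (names and the by-year split are made up for the example; the CHECK constraints on the partitioning column are what let the optimizer skip irrelevant member tables):

```sql
-- Each member table constrains the partitioning column, which must be
-- part of the primary key.
CREATE TABLE StockPrices2006 (
    StockID   int           NOT NULL,
    TradeDate datetime      NOT NULL
        CHECK (TradeDate >= '20060101' AND TradeDate < '20070101'),
    Price     decimal(18,4) NOT NULL,
    CONSTRAINT PK_StockPrices2006 PRIMARY KEY (StockID, TradeDate)
);

CREATE TABLE StockPrices2007 (
    StockID   int           NOT NULL,
    TradeDate datetime      NOT NULL
        CHECK (TradeDate >= '20070101' AND TradeDate < '20080101'),
    Price     decimal(18,4) NOT NULL,
    CONSTRAINT PK_StockPrices2007 PRIMARY KEY (StockID, TradeDate)
);
GO

-- The view presents the members as one logical table.
CREATE VIEW StockPricesAll AS
    SELECT StockID, TradeDate, Price FROM StockPrices2006
    UNION ALL
    SELECT StockID, TradeDate, Price FROM StockPrices2007;
```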

Quote:

Originally Posted by

(snip)
if I insert a record for a given stock does the database have to
physically "move down" all the records that are below that or does it do
something smarter?


If the PK is monotonically ascending, new rows are just added at the
end and all continues smoothly. If the PK is not ascending, but for
instance a random GUID, then you will get a lot of page splits, and
that takes extra work. But the answer in this case is that you should
not cluster on the GUID, but on something else. It's perfectly possible
to have the clustered index on some other column(s).
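A sketch of that arrangement (hypothetical names): the primary key is declared as a nonclustered constraint on the GUID, while the clustered index sits on an ascending datetime column, so new rows land at the logical end rather than at random pages.

```sql
CREATE TABLE Trades (
    TradeID   uniqueidentifier NOT NULL DEFAULT NEWID(),
    StockID   int              NOT NULL,
    TradeDate datetime         NOT NULL DEFAULT GETDATE(),
    Price     decimal(18,4)    NOT NULL,
    -- PK enforced with a nonclustered index, leaving the clustering
    -- choice free.
    CONSTRAINT PK_Trades PRIMARY KEY NONCLUSTERED (TradeID)
);

-- Cluster on the ascending column instead of the random GUID.
CREATE CLUSTERED INDEX CIX_Trades_TradeDate ON Trades (TradeDate);
```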

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||On 2 Jan 2007 08:57:21 -0800, Emin wrote:

(snip)

Quote:

Originally Posted by

>if I insert a record for a given stock does the database have to
>physically "move down" all the records that are below that or does it do
>something smarter?


Hi Emin,

Roy answered most of your questions, but was friendly enough to leave
this bit for me to comment on.

Yes, it does something smarter. Rows are *NOT* physically stored in the
order imposed by the clustered index. The clustered index imposes a
*logical* ordering on the data, which is implemented by pointers that
form a logical chain of database pages.

If a row is inserted, SQL Server first navigates the clustered index to
find the correct location (database page). Then, there are two
possibilities:

1. There's still space left on the page. The row is added and the insert
is finished.

2. There's not enough free space on the page. A page split occurs: half
of the data on the page is moved to a newly allocated page and pointers
are updated to reflect the location of the new page in the chain. After
that, the new row is inserted.

If your database has to process *LOTS* of inserts (in the order of
thousands per second or more), it makes sense to define your clustered
index such that new rows are always inserted at the logical end of the
pointer chain. In that case, page splits will never happen.
For lower amounts of inserts, the overhead of a page split is
insignificant.
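A related knob worth knowing about: an index can be built with a FILLFACTOR that leaves free space on each leaf page, so nonsequential inserts have room to land without an immediate page split. A sketch (table and index names are illustrative):

```sql
-- Leave 10% free space on each leaf page when the index is (re)built,
-- reducing page splits from inserts in the middle of the key order.
CREATE CLUSTERED INDEX CIX_StockPrices
    ON StockPrices (StockID, TradeDate)
    WITH FILLFACTOR = 90;
```

Note that FILLFACTOR only applies when the index is created or rebuilt; the pages fill back up as rows arrive.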

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

|||Dear Roy, Erland, and Hugo,

Thank you very much for your excellent responses. It sounds like a
single logical table is the way to go since the frequency of inserts is
not very high. If necessary, where would I get more information about
low level details like page splits (e.g., how to control the page size,
how to determine when and how many page splits are occurring, etc.)?

Thanks again,
-Emin

On Jan 2, 5:53 pm, Hugo Kornelis <h...@.perFact.REMOVETHIS.info.INVALID>
wrote:

Quote:

Originally Posted by

(snip)

|||One more question, is there a process similar to defragmentation that I
should run periodically to collect split pages together?

Thanks again,
-Emin

On Jan 3, 1:42 pm, "Emin" <emin.shop...@.gmail.com> wrote:

Quote:

Originally Posted by

(snip)

|||Emin (emin.shopper@.gmail.com) writes:

Quote:

Originally Posted by

One more question, is there a process similar to defragmentation that I
should run periodically to collect split pages together?


Yes, DBCC DBREINDEX on SQL 2000. The preferred syntax on SQL 2005 is
ALTER INDEX ... REBUILD, or something like that.
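For concreteness, the two forms look like this (the table name is illustrative):

```sql
-- SQL Server 2000: rebuild all indexes on the table.
DBCC DBREINDEX ('StockPrices');

-- SQL Server 2005: the preferred replacement.
ALTER INDEX ALL ON StockPrices REBUILD;

-- Or, for a lighter-weight online defragmentation of the leaf level:
ALTER INDEX ALL ON StockPrices REORGANIZE;
```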

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||On 3 Jan 2007 10:42:46 -0800, Emin wrote:

Quote:

Originally Posted by

If necessary, where would I get more information about
>low level details like page splits (e.g., how to control the page size,
>how to determine when and how many page splits are occuring, etc.)?


Hi Emin,

For that kind of information, the only serious resource is the books
written by Kalen Delaney. Quite advanced stuff, definitely not entry
level, but that's unavoidable if you want to know about page splits.

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis