Managing sql Log File: app

We have an asp.net app with about 200 data entry forms. Customers may
enter data into any number of forms. Each form's data is persisted in
a corresponding sql table. When data entry is complete, it needs to be
processed. Here's where the questions start.

How can we easily determine in which tables a customer has data and how
best to select that data?

We're not opposed to putting all the data in a single table. This
table would wind up having ~15 million records and constantly have CRUD
operations performed against it by up to 5000 users simultaneously.
With sufficient hardware, is this too much to ask of the db?heromull (heromull@.gmail.com) writes:
> We have an asp.net app with about 200 data entry forms. Customers may
> enter data into any number of forms. Each form's data is persisted in
> a corresponding sql table. When data entry is complete, it needs to be
> processed. Here's where the questions start.
> How can we easily determine in which tables a customer has data and how
> best to select that data?
> We're not opposed to putting all the data in a single table. This
> table would wind up having ~15 million records and constantly have CRUD
> operations performed against it by up to 5000 users simultaneously.
> With sufficient hardware, is this too much to ask of the db?

Whether the 200 tables should be 1, 10, 74, or 200 is impossible to tell
from without knowledge about what's in them.

But from a performance point of view, it would not really matter whether
it's one or two hundred tables. Provided, that is, the the single table
has proper indexing.

There are a couple a ways of finding to find data to process:

1) Timestamp column. A timestamp column is automatically updated by SQL
Server with a database-unique value that is monotonically increasing.
(Binary, completely unrelated to date and time). The process that looks
for data would keep track of the most recent timestamp per table, and
retrieve the rows with higher timestamp value. If the process updates the
rows itself, it needs to combine the lookup with a status column. The
drawback with this solution is that the timestamp column must be indexed,
and since it's updated each time the row is updated, there will be a lot
of shuffling around in that index.

2) IDENTITY column. All tables would have an identity column, and then
the process would keep track of the most recently processed value. With
this solution you can only handle inserts, not if users update existing
data.

3) Having triggers on the that enters data about rows to process into a
table. Again, you may need a mechanism to differentiate between user-entered
changes and changes from your processing.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thanks, the data to select for processing is not time based, it's based
on whatever tables a customer has data in at the time of processing,
which is determined by the customer. So, option three would be the
best and is similar to our current process. Also, the data is not
updated by the processing, only selected. Even with a table that tells
us which tables have data, how would we build the select statement(s)
to only include those tables with data?|||heromull (heromull@.gmail.com) writes:
> Thanks, the data to select for processing is not time based, it's based
> on whatever tables a customer has data in at the time of processing,
> which is determined by the customer. So, option three would be the
> best and is similar to our current process.

Not that see what time has to do with it. I did say timestamp, but
the timestamp datatype has nothing to do with time. Then again, if you
already have a process similar to option three, then go with that.

> Also, the data is not updated by the processing, only selected. Even
> with a table that tells us which tables have data, how would we build
> the select statement(s) to only include those tables with data?

I can not say that. I don't know your tables. I don't know in which context
this process runs etc. I would assume that since there are 200 tables,
that you would have a stored procedure or a SELECT statement for that
table, as I would assume that all tables would generally have a different
set of columns. If they all have the same schema, then there is a strong
indication of that you should have one single table.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||"heromull" <heromull@.gmail.com> wrote in message
news:1130029056.070783.59510@.g14g2000cwa.googlegro ups.com...
> We have an asp.net app with about 200 data entry forms. Customers may
> enter data into any number of forms. Each form's data is persisted in
> a corresponding sql table. When data entry is complete, it needs to be
> processed. Here's where the questions start.
> How can we easily determine in which tables a customer has data and how
> best to select that data?
> We're not opposed to putting all the data in a single table. This
> table would wind up having ~15 million records and constantly have CRUD
> operations performed against it by up to 5000 users simultaneously.
> With sufficient hardware, is this too much to ask of the db?

Don't design a database schema around a user interface. Design your database
schema around your DATA and BUSINESS RULES, then build a data access layer
that supports the UI. You say "Each form's data is persisted in a
corresponding sql table". This make little sense to me as a description of a
data model. I suspect (admittedly on the basis of too little information)
that with a better design your perceived problems would disappear. The
stataement "We're not opposed to putting all the data in a single table"
also suggests a very arbitrary approach to database design. Is your current
design a normalized one? If so, I don't understand your confusion about "How
can we easily determine in which tables a customer has data". The answer is
presumably that you use the Customer key - whatever that is in your model.

--
David Portas
SQL Server MVP
--|||Erland Sommarskog wrote:
...
> If they all have the same schema, then there is a strong
> indication of that you should have one single table.
...

Yes, this is the case. They share the same schema.

David Portas wrote:
> Don't design a database schema around a user interface.

I think this is our first mistake.

> Design your database
> schema around your DATA and BUSINESS RULES, then build a data access layer
> that supports the UI. You say "Each form's data is persisted in a
> corresponding sql table". This make little sense to me as a description of a
> data model. I suspect (admittedly on the basis of too little information)
> that with a better design your perceived problems would disappear.

I think you're right. It's that "better design" that we're
researching.

> The
> stataement "We're not opposed to putting all the data in a single table"
> also suggests a very arbitrary approach to database design.

It does and I say that only to mean that it's not too late for us to
implement a better design.

> Is your current
> design a normalized one? If so, I don't understand your confusion about "How
> can we easily determine in which tables a customer has data". The answer is
> presumably that you use the Customer key - whatever that is in your model.

Here's DDL that describes our current "model" (I hear you laughing
already). Notice that all 200 talbes have the same schema. If we
determine a single table approach would be a better design, what would
be some areas of concern? Obviously indexing is on the list.

CREATE TABLE [Customer] (
[CustomerId] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (100),
CONSTRAINT [PK_Customer] PRIMARY KEY NONCLUSTERED
(
[CustomerId]
)
)
--we have ~200 tables with this table's schema
CREATE TABLE [Table1] ( --Table1 thru Table200
[Table1Id] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NOT NULL ,
[Value] [nchar] (1024),
CONSTRAINT [PK_Table1] PRIMARY KEY NONCLUSTERED
(
[Table1Id]
) ,
CONSTRAINT [FK_Table1_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
--each of the 200 tables have a trigger
--similar to this (i'll omit the delete trigger from the DDL)
create trigger
Table1CreateInterview
on
Table1
for insert
as
begin
declare @.CustomerId int
declare @.RecordId int
select
@.RecordId = Table1Id,
@.CustomerId = CustomerId
from
inserted

insert into
interview
(
CustomerId,

TableId,--represents the table caused the trigger eg. 1=Table1
RecId--the value of the primary key of the record causing the trigger
)
values
(
@.CustomerId,
1,
@.RecordId
)
end
GO
--a record is inserted into this table each
--time a record is written to any of the 200 tables
--a record is deleted from this table each time
--a record is deleted from any of the 200 tables
CREATE TABLE [Interview] (
[InterviewId] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NULL ,
[TableId] [int] NOT NULL ,
[RecId] [int] NULL
CONSTRAINT [PK_Interview] PRIMARY KEY NONCLUSTERED
(
[InterviewId]
),
CONSTRAINT [FK_Interview_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
insert into customer(name) values ('some name')
insert into table1(customerid, value) values (1, 'my value in table1')|||heromull (heromull@.gmail.com) writes:
>> Don't design a database schema around a user interface.
> I think this is our first mistake.

An unusually candid confession!

>> If they all have the same schema, then there is a strong
>> indication of that you should have one single table.
> ...
> Yes, this is the case. They share the same schema.

You should make them one table, adding one more column that specifies
the entity that is now hidden in the table name. That table would look
like:

CREATE TABLE [Answers] ( --Table1 thru Table200
[CustomerId] [int] NOT NULL ,
TableNo int NOT NULL,
RowNo smallint NOT NULL,
[Value] [nchar] (1024),
PRIMARY KEY (CustomerID, TableNo, RowNo)

I've added RowNo, beause I don't know if one customer can add more
than one value in the same form. If he can't RowNo should not be
there. I've very deliberate taken out the IDENTITY column, because
this table should have a composite key. My RowNo is indeed a surrogate,
but only in the sub-realm of CustomerId, TableNo. (And TableNo is just
a name I use, in lack of knowledge about the business domain.)

I put the key on CustomerID, TableNo, but depending how you use the
table, you may also have a need for an index on (TableNo, CustomerID).
CustomerID first is good for queries like "what is customer 1233 up to"?,
but not for "What do we have in table 12?".

> It does and I say that only to mean that it's not too late for us to
> implement a better design.

That's great to hear!

> begin
> declare @.CustomerId int
> declare @.RecordId int
> select
> @.RecordId = Table1Id,
> @.CustomerId = CustomerId
> from
> inserted

Uh-uh, classic mistake. A trigger fires once per statement, not once
per row. Yes, as long as data through that form, it will only come
one by one, but then suddenly there is a batch processing loading lots
of data at the same time. So write your trigger as:

INSERT interview (...)
SELECT ...
FROM inserted

> insert into
> interview

And INSERT without a column list is very poor practice in production code.
Someone adds a column to the table, and the statement blows up. That's
bad.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Erland Sommarskog wrote:
> An unusually candid confession!

Acceptance is the first step right?

Anyway, thank you so much for the support! I think the suggestions are
going to help us out a lot. A lightbulb went off over my head when I
read your composit key explaination.|||Thank you so much for the support! I think this will work for us. My
only other question would be about managing the surrogate column
(RowNo). During Inserts, is there a way to manage it within SQL Server
or would my client apps need to select max(RowNo) and increase it by 1?

Again, thanks!|||On 26 Oct 2005 07:55:17 -0700, heromull wrote:

>Thank you so much for the support! I think this will work for us. My
>only other question would be about managing the surrogate column
>(RowNo). During Inserts, is there a way to manage it within SQL Server
>or would my client apps need to select max(RowNo) and increase it by 1?

Hi heromull,

A typical INSERT statement would roughly look like this:

INSERT INTO Answers (CustomerId, TableNo, RowNo, [Value])
SELECT @.CustomerId, @.TableNo, COALESCE(MAX(RowNo),0) + 1, @.Value
FROM Answers
WHERE CustomerId = @.CustomerId
AND TableNo = @.TableNo

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Thanks! Works perfectly.

I have to run a dynamic sql that i save in the database as a TEXT data type(due to a large size of the sql.) from a .NET app. Now i have to run this sql from the stored proc that returns the results back to .net app. I am running this dynamic sql with sp_executesql like this..

EXEC sp_executesql@.Statement,N'@.param1 varchar(3),@.param2 varchar(1)',@.param1,@.param2,
GO

As i can't declare text,ntext etc variables in T-Sql(stored proc), so i am using this method in pulling the text type field "Statement".

DECLARE@.Statement varbinary(16)
SELECT@.Statement = TEXTPTR(Statement)
FROM table1
READTEXT table1.statement@.Statement 0 16566

So far so good, the issue is how to convert @.Statment varbinary to nText to get it passed in sp_executesql.

Note:- i can't use Exec to run the dynamic sql becuase i need to pass the params from the .net app and Exec proc doesn't take param from the stored proc from where it is called.

I would appreciate if any body respond to this.

Yes, the limitation of using NTEXT as variable causes the problem. I know a workaround, but... you still need EXEC to warp the execution of sp_executesql. Read this:

sp_executesql and Long SQL Strings in SQL 2000
There is a limitation with sp_executesql on SQL 2000 and SQL 7, since you cannot use longer SQL strings than 4000 characters. (On SQL 2005, use nvarchar(MAX) to avoid this problem.) If you want to use sp_executesql despite you query string is longer, because you want to make use of parameterised query plans, there is actually a workaround. To wit, you can wrap sp_executesql in EXEC():

DECLARE @.sql1 nvarchar(4000),
@.sql2 nvarchar(4000),
@.state char(2)
SELECT @.state = 'CA'
SELECT @.sql1 = N'SELECT COUNT(*)'
SELECT @.sql2 = N'FROM dbo.authors WHERE state = @.state'
EXEC('EXEC sp_executesql N''' + @.sql1 + @.sql2 + ''',
N''@.state char(2)'',
@.state = ''' + @.state + '''')
This works, because the @.stmt parameter to sp_executesql is ntext, so by itself, it does not have any limitation in size.

You can even use output parameters by using INSERT-EXEC, as in this example:

CREATE TABLE #result (cnt int NOT NULL)
DECLARE @.sql1 nvarchar(4000),
@.sql2 nvarchar(4000),
@.state char(2),
@.mycnt int
SELECT @.state = 'CA'
SELECT @.sql1 = N'SELECT @.cnt = COUNT(*)'
SELECT @.sql2 = N'FROM dbo.authors WHERE state = @.state'
INSERT #result (cnt)
EXEC('DECLARE @.cnt int
EXEC sp_executesql N''' + @.sql1 + @.sql2 + ''',
N''@.state char(2),
@.cnt int OUTPUT'',
@.state = ''' + @.state + ''',
@.cnt = @.cnt OUTPUT
SELECT @.cnt')
SELECT @.mycnt = cnt FROM #result
You have my understanding if you think this is too messy to be worth it.

So you can break the NTEXT statement from your table into several NVARCHAR(4000) strings, then pass the strings as parameter for sp_executesql. The original wonderful artilc can be found here:

http://www.sommarskog.se/dynamic_sql.html

|||Thanks, I had to break the query into pieces to get this working.

Wednesday, March 28, 2012

mapping data types

Wednesday, March 21, 2012

Many data entry tables - which ones hold records?

Monday, March 12, 2012

Manipulating Text,nText data types filed in tsql

Saturday, February 25, 2012

Managing a large row size

Managing sql Log File

Blog Archive

About Me