
Wednesday, March 21, 2012

Many data entry tables - which ones hold records?

We have an asp.net app with about 200 data entry forms. Customers may
enter data into any number of forms. Each form's data is persisted in
a corresponding sql table. When data entry is complete, it needs to be
processed. Here's where the questions start.

How can we easily determine in which tables a customer has data and how
best to select that data?

We're not opposed to putting all the data in a single table. This
table would wind up having ~15 million records and constantly have CRUD
operations performed against it by up to 5000 users simultaneously.
With sufficient hardware, is this too much to ask of the db?

heromull (heromull@gmail.com) writes:
> We have an asp.net app with about 200 data entry forms. Customers may
> enter data into any number of forms. Each form's data is persisted in
> a corresponding sql table. When data entry is complete, it needs to be
> processed. Here's where the questions start.
> How can we easily determine in which tables a customer has data and how
> best to select that data?
> We're not opposed to putting all the data in a single table. This
> table would wind up having ~15 million records and constantly have CRUD
> operations performed against it by up to 5000 users simultaneously.
> With sufficient hardware, is this too much to ask of the db?

Whether the 200 tables should be 1, 10, 74, or 200 is impossible to tell
without knowledge of what's in them.

But from a performance point of view, it would not really matter whether
it's one or two hundred tables. Provided, that is, that the single table
has proper indexing.

There are a couple of ways to find data to process (see the sketches
after this list):

1) Timestamp column. A timestamp column is automatically updated by SQL
Server with a database-unique value that is monotonically increasing.
(Binary, completely unrelated to date and time). The process that looks
for data would keep track of the most recent timestamp per table, and
retrieve the rows with higher timestamp value. If the process updates the
rows itself, it needs to combine the lookup with a status column. The
drawback with this solution is that the timestamp column must be indexed,
and since it's updated each time the row is updated, there will be a lot
of shuffling around in that index.

2) IDENTITY column. All tables would have an identity column, and then
the process would keep track of the most recently processed value. With
this solution you can only handle inserts, not updates to existing
data.

3) Having triggers on the tables that enter data about rows to process into
a separate table. Again, you may need a mechanism to differentiate between
user-entered changes and changes from your processing.
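
As a minimal sketch of option 1, assuming a hypothetical FormData form
table and a hypothetical ProcessAnchor bookkeeping table (neither name is
from the thread):

-- hypothetical bookkeeping table: one row per monitored form table
CREATE TABLE ProcessAnchor (
TableName sysname NOT NULL PRIMARY KEY,
LastTs binary(8) NULL,
LastId int NULL
)
GO
-- hypothetical form table with an indexed timestamp column
CREATE TABLE FormData (
FormDataId int IDENTITY (1, 1) NOT NULL PRIMARY KEY,
CustomerId int NOT NULL,
Value nchar(1024) NULL,
ts timestamp NOT NULL -- binary, database-unique, maintained by SQL Server
)
CREATE INDEX ix_FormData_ts ON FormData (ts)
GO
-- the process retrieves everything changed since its last run
DECLARE @last binary(8)
SELECT @last = LastTs FROM ProcessAnchor WHERE TableName = 'FormData'

SELECT FormDataId, CustomerId, Value
FROM FormData
WHERE ts > COALESCE(@last, 0x00) -- first run: take everything
ORDER BY ts

And a sketch of option 2, keeping a high-water mark of the identity value
(shown against the Table1 shape that appears later in the thread; this
sees inserts only, never updates):

DECLARE @last int
SELECT @last = COALESCE(LastId, 0)
FROM ProcessAnchor WHERE TableName = 'Table1'

SELECT Table1Id, CustomerId, Value
FROM Table1
WHERE Table1Id > @last
ORDER BY Table1Id

-- remember the new high-water mark for the next run
UPDATE ProcessAnchor
SET LastId = (SELECT MAX(Table1Id) FROM Table1)
WHERE TableName = 'Table1'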

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Thanks, the data to select for processing is not time based, it's based
on whatever tables a customer has data in at the time of processing,
which is determined by the customer. So, option three would be the
best and is similar to our current process. Also, the data is not
updated by the processing, only selected. Even with a table that tells
us which tables have data, how would we build the select statement(s)
to only include those tables with data?

heromull (heromull@gmail.com) writes:
> Thanks, the data to select for processing is not time based, it's based
> on whatever tables a customer has data in at the time of processing,
> which is determined by the customer. So, option three would be the
> best and is similar to our current process.

Not that I see what time has to do with it. I did say timestamp, but
the timestamp datatype has nothing to do with time. Then again, if you
already have a process similar to option three, then go with that.

> Also, the data is not updated by the processing, only selected. Even
> with a table that tells us which tables have data, how would we build
> the select statement(s) to only include those tables with data?

I cannot say. I don't know your tables, nor the context this process runs
in. I would assume that since there are 200 tables, you have a stored
procedure or a SELECT statement for each table, as I would assume that the
tables generally have different sets of columns. If they all have the same
schema, that is a strong indication that you should have one single table.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

"heromull" <heromull@gmail.com> wrote in message
news:1130029056.070783.59510@g14g2000cwa.googlegroups.com...
> We have an asp.net app with about 200 data entry forms. Customers may
> enter data into any number of forms. Each form's data is persisted in
> a corresponding sql table. When data entry is complete, it needs to be
> processed. Here's where the questions start.
> How can we easily determine in which tables a customer has data and how
> best to select that data?
> We're not opposed to putting all the data in a single table. This
> table would wind up having ~15 million records and constantly have CRUD
> operations performed against it by up to 5000 users simultaneously.
> With sufficient hardware, is this too much to ask of the db?

Don't design a database schema around a user interface. Design your database
schema around your DATA and BUSINESS RULES, then build a data access layer
that supports the UI. You say "Each form's data is persisted in a
corresponding sql table". This make little sense to me as a description of a
data model. I suspect (admittedly on the basis of too little information)
that with a better design your perceived problems would disappear. The
statement "We're not opposed to putting all the data in a single table"
also suggests a very arbitrary approach to database design. Is your current
design a normalized one? If so, I don't understand your confusion about "How
can we easily determine in which tables a customer has data". The answer is
presumably that you use the Customer key - whatever that is in your model.

--
David Portas
SQL Server MVP
--

Erland Sommarskog wrote:
...
> If they all have the same schema, that is a strong
> indication that you should have one single table.
...

Yes, this is the case. They share the same schema.

David Portas wrote:
> Don't design a database schema around a user interface.

I think this is our first mistake.

> Design your database
> schema around your DATA and BUSINESS RULES, then build a data access layer
> that supports the UI. You say "Each form's data is persisted in a
> corresponding sql table". This makes little sense to me as a description of a
> data model. I suspect (admittedly on the basis of too little information)
> that with a better design your perceived problems would disappear.

I think you're right. It's that "better design" that we're
researching.

> The
> statement "We're not opposed to putting all the data in a single table"
> also suggests a very arbitrary approach to database design.

It does and I say that only to mean that it's not too late for us to
implement a better design.

> Is your current
> design a normalized one? If so, I don't understand your confusion about "How
> can we easily determine in which tables a customer has data". The answer is
> presumably that you use the Customer key - whatever that is in your model.

Here's DDL that describes our current "model" (I hear you laughing
already). Notice that all 200 tables have the same schema. If we
determine a single table approach would be a better design, what would
be some areas of concern? Obviously indexing is on the list.

CREATE TABLE [Customer] (
[CustomerId] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (100),
CONSTRAINT [PK_Customer] PRIMARY KEY NONCLUSTERED
(
[CustomerId]
)
)
--we have ~200 tables with this table's schema
CREATE TABLE [Table1] ( --Table1 thru Table200
[Table1Id] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NOT NULL ,
[Value] [nchar] (1024),
CONSTRAINT [PK_Table1] PRIMARY KEY NONCLUSTERED
(
[Table1Id]
) ,
CONSTRAINT [FK_Table1_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
--each of the 200 tables has a trigger
--similar to this (I'll omit the delete trigger from the DDL)
create trigger
Table1CreateInterview
on
Table1
for insert
as
begin
declare @CustomerId int
declare @RecordId int
select
@RecordId = Table1Id,
@CustomerId = CustomerId
from
inserted

insert into
interview
(
CustomerId,
TableId, --represents the table that caused the trigger, e.g. 1 = Table1
RecId --the value of the primary key of the record causing the trigger
)
values
(
@CustomerId,
1,
@RecordId
)
end
GO
--a record is inserted into this table each
--time a record is written to any of the 200 tables
--a record is deleted from this table each time
--a record is deleted from any of the 200 tables
CREATE TABLE [Interview] (
[InterviewId] [int] IDENTITY (1, 1) NOT NULL ,
[CustomerId] [int] NULL ,
[TableId] [int] NOT NULL ,
[RecId] [int] NULL ,
CONSTRAINT [PK_Interview] PRIMARY KEY NONCLUSTERED
(
[InterviewId]
),
CONSTRAINT [FK_Interview_Customer] FOREIGN KEY
(
[CustomerId]
) REFERENCES [Customer] (
[CustomerId]
)
)
GO
insert into customer(name) values ('some name')
insert into table1(customerid, value) values (1, 'my value in table1')

heromull (heromull@gmail.com) writes:
>> Don't design a database schema around a user interface.
> I think this is our first mistake.

An unusually candid confession!

>> If they all have the same schema, that is a strong
>> indication that you should have one single table.
> ...
> Yes, this is the case. They share the same schema.

You should make them one table, adding one more column that specifies
the entity that is now hidden in the table name. That table would look
like:

CREATE TABLE [Answers] ( --replaces Table1 thru Table200
[CustomerId] [int] NOT NULL ,
TableNo int NOT NULL,
RowNo smallint NOT NULL,
[Value] [nchar] (1024),
PRIMARY KEY (CustomerID, TableNo, RowNo)
)

I've added RowNo, because I don't know if one customer can add more
than one value in the same form. If he can't, RowNo should not be
there. I've very deliberately taken out the IDENTITY column, because
this table should have a composite key. My RowNo is indeed a surrogate,
but only in the sub-realm of CustomerId, TableNo. (And TableNo is just
a name I use, for lack of knowledge about the business domain.)

I put the key on CustomerID, TableNo, but depending on how you use the
table, you may also have a need for an index on (TableNo, CustomerID).
CustomerID first is good for queries like "what is customer 1233 up to?",
but not for "What do we have in table 12?".
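
As a sketch of those two access paths against the Answers table (the
index name here is mine, not from the thread):

CREATE INDEX ix_Answers_TableNo ON Answers (TableNo, CustomerId)

-- "what is customer 1233 up to?": served by the primary key
SELECT TableNo, RowNo, [Value]
FROM Answers
WHERE CustomerId = 1233

-- "what do we have in table 12?": served by the index above
SELECT CustomerId, RowNo, [Value]
FROM Answers
WHERE TableNo = 12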

> It does and I say that only to mean that it's not too late for us to
> implement a better design.

That's great to hear!

> begin
> declare @CustomerId int
> declare @RecordId int
> select
> @RecordId = Table1Id,
> @CustomerId = CustomerId
> from
> inserted

Uh-uh, classic mistake. A trigger fires once per statement, not once
per row. Yes, as long as data comes in through that form, it will arrive
one row at a time, but then suddenly a batch process loads lots of data
at the same time. So write your trigger as:

INSERT interview (...)
SELECT ...
FROM inserted

> insert into
> interview

And INSERT without a column list is very poor practice in production code.
Someone adds a column to the table, and the statement blows up. That's
bad.
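
Putting both points together (set-based logic plus an explicit column
list), a sketch of how the Table1 trigger could be rewritten; it keeps the
hard-coded TableId of 1 from the original DDL:

create trigger Table1CreateInterview on Table1 for insert
as
begin
-- set-based: one INSERT covers however many rows the statement inserted
insert into interview (CustomerId, TableId, RecId)
select CustomerId, 1, Table1Id
from inserted
end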

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Erland Sommarskog wrote:
> An unusually candid confession!

Acceptance is the first step, right?

Anyway, thank you so much for the support! I think the suggestions are
going to help us out a lot. A lightbulb went off over my head when I
read your composite key explanation.

Thank you so much for the support! I think this will work for us. My
only other question would be about managing the surrogate column
(RowNo). During Inserts, is there a way to manage it within SQL Server
or would my client apps need to select max(RowNo) and increase it by 1?

Again, thanks!

On 26 Oct 2005 07:55:17 -0700, heromull wrote:

>Thank you so much for the support! I think this will work for us. My
>only other question would be about managing the surrogate column
>(RowNo). During Inserts, is there a way to manage it within SQL Server
>or would my client apps need to select max(RowNo) and increase it by 1?

Hi heromull,

A typical INSERT statement would roughly look like this:

INSERT INTO Answers (CustomerId, TableNo, RowNo, [Value])
SELECT @CustomerId, @TableNo, COALESCE(MAX(RowNo),0) + 1, @Value
FROM Answers
WHERE CustomerId = @CustomerId
AND TableNo = @TableNo
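
One caveat worth adding, not raised in the thread: with many simultaneous
users (the original post mentions up to 5000), two sessions could read the
same MAX(RowNo) and collide on the primary key. A sketch of one common
guard, taking key-range lock hints inside a transaction:

BEGIN TRAN
INSERT INTO Answers (CustomerId, TableNo, RowNo, [Value])
SELECT @CustomerId, @TableNo, COALESCE(MAX(RowNo),0) + 1, @Value
FROM Answers WITH (UPDLOCK, HOLDLOCK) -- blocks a second concurrent insert
WHERE CustomerId = @CustomerId
AND TableNo = @TableNo
COMMIT TRAN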

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Thanks! Works perfectly.

Wednesday, March 7, 2012

Managing Multiple Excel incoming files?

I have this situation. I have a client with several locations. They work
primarily with Excel forms and they are thinking of doing reports with
them. What they want is to import those Excel files into SQL Server for
later reporting with Crystal Reports or maybe asp.net. Those files will
come every week from different locations, go into SQL Server, and then be
stored in a folder on the server for backup. So this means that I have to
deal with the importing but also with the file naming. This is because
they want the reports to be date based. So I have to either ask them to
include the date as part of the file name, or change the file name after
processing if I want to store them all in the same folder.
So the real question is how I deal with multiple files on a regular basis:
by changing the name of the file, or by enforcing that they send me the
file with the date already included. If the recommendation is the second
one, how can I tell the system to change the files after the processing?
Is there any easy way or tool to achieve this? Is there any way I can
configure DTS to do that?
Eliezer

The easiest way would probably be to use a script task in DTS and use the
File System Object to manipulate the files. It can be done from T-SQL but
it's pretty kludgy. Using FSO from DTS will allow you a lot more
flexibility and control.
HTH
Jasper Smith (SQL Server MVP)
I support PASS - the definitive, global
community for SQL Server professionals -
http://www.sqlpass.org
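
If the T-SQL route is ever needed despite its kludginess, a sketch using
xp_cmdshell to move a processed file to a date-stamped name; the paths and
file name are hypothetical, and xp_cmdshell must be available to the
caller:

DECLARE @cmd varchar(500)
SELECT @cmd = 'move C:\Incoming\locationA.xls C:\Processed\locationA_'
+ CONVERT(char(8), GETDATE(), 112) + '.xls' -- style 112 gives yyyymmdd
EXEC master..xp_cmdshell @cmd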
"Eliezer Figueroa" <efigueroa@.propoint.com.do> wrote in message
news:Xns9479AA48BF55Aefigueroapropointco
m@.216.196.97.132...
quote:

> Managing Multiple Excel incoming files?
> I have this situation. I have a client which have several locations they
> work primary with excel forms and they are thinking in doing reports with
> them. What they want is to import those Excel files into SQL Server for
> later using crystal reports or maybe asp.net for reporting. Those files
> will come every week from diferent locations then go to SQL Server and

then
quote:

> they should be stored in a folder in the server for backup. So this means
> that i have to deal with the importing but also with the file naming. This
> is because they want the reports to be date based. So i have to ask them

to
quote:

> include the date as part of the file name or changing the file name after
> procesing the file if I want to store them all in the same folder after
> procesing.
> So the real question is how do I deal with multiple file in regular basis
> bye changing the name of the file or enforcing they send me the file with
> the date already included. If the recomendation is the second one how can

i
quote:

> tell the system to change the files after the procesing?
> Is there any easy way or tool to achive this? is there any way i can
> configure DTS to do that?
>
> Eliezer
