Managing sql Log File: many DISTINCT queries

Hi,

I have a table that contains log data, usually around a million records. The
table has about 10 columns with various attributes of the logged data,
nothing special. We're using SQL Server 2000.

Some of the columns (for example "category") have duplicate values
throughout the records. We have a web page that queries the table to show
all the unique columns, for example:

select distinct CATEGORY from table TEST

Obviously the server has to scan all rows in order to get all unique columns
which takes quite a while, especially since that web page contains several
of these types of queries. We also have a MAX(DATE) and MIN(DATE) query that
also add to the load.

I already created indexes on the CATEGORY (actually on all categories)
column which might help a little but I'm pretty sure that there has got to
be a better way.

I also create a view (select distinct CATEGORY from table TEST) and tried to
index it, but it won't let me index a query that contains a DISTINCT
statement.

Isn't there a way to create an index that contains only the distinct values?
Is there another way to speed this up?

Thanks for any hints!"Florian" <REMOVEUPPERCASEwizard_oz@.gmx.net> wrote in message
news:_cOUb.13574$F23.3296@.newsread2.news.pas.earth link.net...
> Hi,
> I have a table that contains log data, usually around a million records.
The
> table has about 10 columns with various attributes of the logged data,
> nothing special. We're using SQL Server 2000.
> Some of the columns (for example "category") have duplicate values
> throughout the records. We have a web page that queries the table to show
> all the unique columns, for example:
> select distinct CATEGORY from table TEST
> Obviously the server has to scan all rows in order to get all unique
columns
> which takes quite a while, especially since that web page contains several
> of these types of queries. We also have a MAX(DATE) and MIN(DATE) query
that
> also add to the load.
> I already created indexes on the CATEGORY (actually on all categories)
> column which might help a little but I'm pretty sure that there has got to
> be a better way.
> I also create a view (select distinct CATEGORY from table TEST) and tried
to
> index it, but it won't let me index a query that contains a DISTINCT
> statement.
> Isn't there a way to create an index that contains only the distinct
values?
> Is there another way to speed this up?
>
> Thanks for any hints!

If only you load/update the data relatively infrequently, then you could
create a 'lookup' table for each attribute, and populate them from the main
table after loading it (rather like the dimensions in a star schema):

insert into dbo.Categories (Category)
select distinct Category
from dbo.Test

Your client code could then query the lookup tables instead of the log
table. If you want to use indexed views, then you could create a view like
this:

create view dbo.Categories
with schemabinding
as
select Category, count_big(*) as 'Occurrences'
from dbo.Test
group by Category

That should be indexable, although there are quite a few other restrictions,
so you would need to check them out. But having multiple indexed views on
the table would make data modifications much slower, so if the data changes
frequently you might have to use some sort of lookup table approach anyway.

Simon|||> select distinct CATEGORY from table TEST
Try experimenting with group by instead of distinct. Also in query
analyzer, enable view execution plan, to get an idea what mssql is
doing. "select category from mytable group by category"|||"Simon Hayes" <sql@.hayes.ch> wrote in message
news:4023d837$1_1@.news.bluewin.ch...
> "Florian" <REMOVEUPPERCASEwizard_oz@.gmx.net> wrote in message
> news:_cOUb.13574$F23.3296@.newsread2.news.pas.earth link.net...
> > Hi,
> > I have a table that contains log data, usually around a million records.
> The
> > table has about 10 columns with various attributes of the logged data,
> > nothing special. We're using SQL Server 2000.
> > Some of the columns (for example "category") have duplicate values
> > throughout the records. We have a web page that queries the table to
show
> > all the unique columns, for example:
> > select distinct CATEGORY from table TEST
> > Obviously the server has to scan all rows in order to get all unique
> columns
> > which takes quite a while, especially since that web page contains
several
> > of these types of queries. We also have a MAX(DATE) and MIN(DATE) query
> that
> > also add to the load.
> > I already created indexes on the CATEGORY (actually on all categories)
> > column which might help a little but I'm pretty sure that there has got
to
> > be a better way.
> > I also create a view (select distinct CATEGORY from table TEST) and
tried
> to
> > index it, but it won't let me index a query that contains a DISTINCT
> > statement.
> > Isn't there a way to create an index that contains only the distinct
> values?
> > Is there another way to speed this up?
> > Thanks for any hints!
> If only you load/update the data relatively infrequently, then you could
> create a 'lookup' table for each attribute, and populate them from the
main
> table after loading it (rather like the dimensions in a star schema):
> insert into dbo.Categories (Category)
> select distinct Category
> from dbo.Test
> Your client code could then query the lookup tables instead of the log
> table. If you want to use indexed views, then you could create a view like
> this:
> create view dbo.Categories
> with schemabinding
> as
> select Category, count_big(*) as 'Occurrences'
> from dbo.Test
> group by Category
> That should be indexable, although there are quite a few other
restrictions,
> so you would need to check them out. But having multiple indexed views on
> the table would make data modifications much slower, so if the data
changes
> frequently you might have to use some sort of lookup table approach
anyway.

Thanks, I tried the indexed view and that seems to work OK now, pretty
fast - can't complain. Data shouldn't be updated that often so that should
be OK. Otherwise I might have to go with a lookup table - but it's not a
real good solution for our scenario for reasons I'm not going to bore
anybody with :)

Thanks!|||"louis nguyen" <louisducnguyen@.hotmail.com> wrote in message
news:b0e9d53.0402061216.5d4f6e56@.posting.google.co m...
> > select distinct CATEGORY from table TEST
> Try experimenting with group by instead of distinct. Also in query
> analyzer, enable view execution plan, to get an idea what mssql is
> doing. "select category from mytable group by category"

Yes, the group thing worked great - I also analyzed the execution plan and
now it's only returning the actual number of rows I'm getting - not the
whole table anymore.

Thanks.

Wednesday, March 21, 2012

many DISTINCT queries

No comments:

Post a Comment

Managing sql Log File

Blog Archive

About Me