
Friday, March 30, 2012

Mapping two measure groups with different dimension usage

Hello.

I am looking for a solution for the following situation:

I have a measure group for my business facts and a measure group for tolerances and targets.

The first measure group uses a date dimension, a site dimension, a project dimension and an employee dimension.

The second group shares all dimensions except the employee dimension.

Let's say I have the business fact "sales per hour".
The target value for this fact is stored in the second measure group without the employee info. The targets are valid for all employees at that site, project and day.

When I browse the cube I want to see the targets next to the actual data - even at the employee level.

Therefore I have to set IgnoreUnrelatedDimensions in the properties of the second measure group to True.
But when I do this I get an entry for every employee present in the employee dimension, showing the tolerance and target.

I am lost at the moment and believe that there must be an elegant solution. Hopefully someone has an idea?

Thanks,
Thomas

|||From your description, it's not clear in which cases the IgnoreUnrelatedDimensions doesn't meet your needs - could you describe some scenarios? You could selectively use the MDX ValidMeasure() function in calculations, instead of applying IgnoreUnrelatedDimensions.

|||Hello Deepak.

Thanks for your reply.

I'll try to explain my problem better with an example.

Data in my Tolerance FactTable:

Date      Project  Site     Tolerance_Sales_per_Hour
20070320  Upsell1  Hamburg  2
20070321  Upsell1  Hamburg  2
20070322  Upsell1  Hamburg  3

Data in my Sales FactTable:

Date      Project  Site     Employee  Sales_Per_Hour
20070320  Upsell1  Hamburg  Bob       1.5
20070320  Upsell1  Hamburg  Mike      2.1
20070321  Upsell1  Hamburg  Bob       2.7
20070321  Upsell1  Hamburg  Mike      3.3
20070322  Upsell1  Hamburg  Bob       1.2

When I select 20070322 as the Date in the browser, I get a row for Mike, although he didn't work on the 22nd.

Date      Project  Site     Employee  Sales_Per_Hour  Tolerance_Sales_per_Hour
20070322  Upsell1  Hamburg  Bob       1.2             3
20070322  Upsell1  Hamburg  Mike                      3

Thanks to a tip from Markus I am using SCOPE statements at the moment:

SCOPE([Measures].[Tolerance_Sales_Per_Hour]);
    THIS = IIF(ISEMPTY([Measures].[Sales_Per_Hour]), NULL, [Measures].[Tolerance_Sales_Per_Hour]);
END SCOPE;


But we are not sure if this is the best solution.
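
For comparison, the ValidMeasure() approach Deepak mentioned might look like this in the cube's MDX script - only a sketch, assuming IgnoreUnrelatedDimensions is set back to False on the tolerance measure group:

SCOPE([Measures].[Tolerance_Sales_Per_Hour]);
    -- ValidMeasure() takes the tolerance from the All level of the unrelated
    -- Employee dimension; the IIF keeps it NULL for employees without sales.
    THIS = IIF(ISEMPTY([Measures].[Sales_Per_Hour]),
               NULL,
               ValidMeasure([Measures].[Tolerance_Sales_Per_Hour]));
END SCOPE;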

Best regards,
Thomas

Monday, March 26, 2012

Many-to-Many MDX

I have a measure called MyMeasure.

There is a dimension called Account

There is a dimension called AccountGroup that groups many accounts, so it references the cube through a Many-to-Many relationship. The account group has a field called Group Level. If, for example, I want to know the number of accounts for a region, I would do something like this:

SELECT {[Account Group].[Account Group].[New South Wales] } ON ROWS,

{[Measures].[MyMeasure]} ON COLUMNS

FROM [MyCube]

WHERE ([Account Group].[Group Level].[Region])

This will give me:

-------------------------------
| Account Group   | MyMeasure |
-------------------------------
| New South Wales | 35        |
-------------------------------

I need the MDX query to get MyMeasure by grade (another Account Group with Group Level 'Grade') for a particular region (another Account Group with Group Level 'Region'). The result I am looking for will look something like this:

-----------------------------------------------
| Account Group   | Account Group | MyMeasure |
-----------------------------------------------
| New South Wales | Grade A       | 15        |
| New South Wales | Grade B       | 4         |
| New South Wales | Grade C       | 6         |
| New South Wales | Grade D       | 10        |
-----------------------------------------------

It is like getting the measures for all accounts that belong to both group levels (one with a known name) grouping by accounts. I appreciate any help ...

Any ideas?...|||

Hi Mario,

Could you explain the AccountGroup and the intermediate measure group source data a bit more - I still can't understand the purpose of this many-many dimension, from a data modelling perspective?

|||

In the DW: every fact table has an AccountKey:

DimAccount (AccountKey, Name,...)

BridgeAccountToAccountGroup (AccountKey, AccountGroupKey)

DimAccountGroup (AccountGroupKey, Name,...)

In the cube, I have DimAccountGroup as a dimension but "behind the scenes" it is referencing DimAccount as a Many-to-Many dimension through the BridgeAccountToAccountGroup "Measure Group".
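
For context, a minimal relational sketch of those three tables - column names as listed above, data types assumed for the example:

CREATE TABLE DimAccount (
    AccountKey int NOT NULL PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);

CREATE TABLE DimAccountGroup (
    AccountGroupKey int NOT NULL PRIMARY KEY,
    Name nvarchar(100) NOT NULL,
    [Level] nvarchar(50) NOT NULL    -- e.g. 'Region' or 'Grade'
);

CREATE TABLE BridgeAccountToAccountGroup (
    AccountKey int NOT NULL REFERENCES DimAccount (AccountKey),
    AccountGroupKey int NOT NULL REFERENCES DimAccountGroup (AccountGroupKey),
    PRIMARY KEY (AccountKey, AccountGroupKey)
);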

|||

That's the technical detail, but from your example, it seemed like the members of AccountGroup were so diverse (eg: New South Wales, Grade A) that they normally would belong in different dimensions. Any insight or reason for that?

An easier option might be to translate your requirement into Adventure Works - for example, using the SalesReason dimension, which has many-many relations. You did mention this in your new post, but can you give a more detailed scenario, using actual data from Adventure Works?

|||

Deepak,

I know attributes like region and all that should be part of the Account dimension; the thing is that some accounts can belong to many region groups or grades. Groups are created by users in the OLTP system, so we don't have any control over these; this is why I had to create a Many-to-Many relationship.

For example the DimAccountGroup has the following attributes.

DimAccountGroup (AccountGroupKey, Name, Level)

This dimension could have the following rows:

1, New South Wales, Region

2, Queensland, Region

3, Grade A, Grade

4, Grade B, Grade

The bridge table will tell me the relationships between Accounts and Account Groups. So what I am trying to do is to get some measures for a set of accounts that belong to two or more (intersection?) predefined account groups but I don't want to see the measures by account, just by account group.
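
For illustration, the bridge rows for one hypothetical account (AccountKey 100, invented for the example) that sits in both a Region group and a Grade group could look like this:

-- Account 100 belongs to 'New South Wales' (Region) and 'Grade A' (Grade)
INSERT INTO BridgeAccountToAccountGroup (AccountKey, AccountGroupKey)
VALUES (100, 1),   -- 1 = New South Wales
       (100, 3);   -- 3 = Grade A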

If I wanted to do the same thing with different dimensions it would be very easy:

SELECT
NON EMPTY {
[Product Group].[Product Group].[All].Children
} ON ROWS,
{
[Measures].[Sales]
} ON COLUMNS
FROM [MyCube]
WHERE (
[Account Group].[Group Level Identifier].[Region],
[Account Group].[Account Group].[New South Wales],
[Product Group].[Group Level Identifier].[Brand]
)


The problem is I am using the same dimension twice, and it doesn't have a direct relation to the fact table; it is indirectly referenced through the bridge table and the account table (which is another dimension). It is like trying to do something like this (I know this query is wrong):

SELECT
NON EMPTY {
[Account Group].[Account Group].[All].Children //Account group "2"
} ON ROWS,
{
[Measures].[Sales]
} ON COLUMNS
FROM [MyCube]
WHERE (
[Account Group].[Group Level Identifier].[Region], //Account Group "1"
[Account Group].[Account Group].[New South Wales], //Account Group "1"
[Account Group].[Group Level Identifier].[Grade] //Account Group "2"
)

|||

Mario,

Here are sample SQL and MDX queries from Adventure Works, using the SalesReason many-many dimension. These compute Internet Sales Amount for all orders which have both a [Quality] and a [Manufacturer] reason. Hopefully, it is similar to what you want to do with [AccountGroup]:

SQL:

>>

select count(*) as SalesOrders,
       sum(fs.SalesAmount) as SalesAmount
from dbo.FactInternetSalesReason fr1
join dbo.FactInternetSalesReason fr2
  on fr1.SalesOrderNumber = fr2.SalesOrderNumber
 and fr1.SalesOrderLineNumber = fr2.SalesOrderLineNumber
join dbo.FactInternetSales fs
  on fr1.SalesOrderNumber = fs.SalesOrderNumber
 and fr1.SalesOrderLineNumber = fs.SalesOrderLineNumber
where fr1.SalesReasonKey = 5
  and fr2.SalesReasonKey = 9

SalesOrders  SalesAmount
-----------  -----------
1551         5549896.77

>>

MDX:

>>

select {[Measures].[Internet Sales Amount],
[Measures].[Internet Order Quantity]} on 0
from [Adventure Works]
where Intersect(Exists(
[Internet Sales Order Details].[Internet Sales Orders].[Internet Sales Order],
{[Sales Reason].[Sales Reason].[Quality]}, "Internet Sales"),
Exists(
[Internet Sales Order Details].[Internet Sales Orders].[Internet Sales Order],
{[Sales Reason].[Sales Reason].[Manufacturer]}, "Internet Sales"))

Internet Sales Amount Internet Order Quantity
$5,549,896.77 1,551

>>

|||

Thank you very much Deepak,

I knew I had to use Intersect but I had no idea where to start. Here is the query I needed:

SELECT
NON EMPTY {
[Account Group].[Account Group].[All].Children
} ON ROWS,
{
[Measures].[MyMeasure]
} ON COLUMNS
FROM [MyCube]
WHERE
(
[Date].[Calendar Month Year].[Jul 2005],
Intersect
(
Exists
(
[Account].[Account].[Account],
{
(
[Account Group].[Account Group].[New South Wales],
[Account Group].[Group Level Identifier].[Region]
)
},
"Account To Account Group"
)
,
Exists
(
[Account].[Account].[Account],
{
(
[Account Group].[Account Group].[All],
[Account Group].[Group Level Identifier].[Grade]
)
},
"Account To Account Group"
)
),
[Account Group].[Group Level Identifier].[Grade]
)

In the DW, the bridge table is called BridgeAccountToAccountGroup, which is exposed in the cube as a measure group called "Account to Account Group".

|||

Mario, I'm glad you found a solution - just wanted to add a nested approach, which might be more efficient:

SELECT
NON EMPTY {
[Account Group].[Account Group].[All].Children
} ON ROWS,
{
[Measures].[MyMeasure]
} ON COLUMNS
FROM [MyCube]
WHERE
(
[Date].[Calendar Month Year].[Jul 2005],
Exists
(
Exists
(
[Account].[Account].[Account],
{
(
[Account Group].[Account Group].[New South Wales],
[Account Group].[Group Level Identifier].[Region]
)
},
"Account To Account Group"
),
{
(
[Account Group].[Account Group].[All],
[Account Group].[Group Level Identifier].[Grade]
)
},
"Account To Account Group"
),
[Account Group].[Group Level Identifier].[Grade]
)

Friday, March 23, 2012

many tables vs one big table

Dear experts,

If I have a number of related groups of data (e.g., stock prices for
different stocks) is it better to put them into many tables (e.g., one
table per stock) or into one big table (e.g., a single table where one
column is the stock id and other columns are the price and other data)?

The single table approach with a clustered primary key including the
stock id and other information that is different for each stock seems
cleaner to me. But I worry that having a single table could hurt
performance when the database gets very large. The clustered primary
key should make queries on the single table fast, but I'm worried about
inserts. According to my vague understanding of a clustered primary
key, records are physically stored on disk based on the primary key. So
if I insert a record for a given stock does the database have to
physicall "move down" all the records that are below that or does it do
something smarter?

To summarize:

1. Should I use a single table or many tables?

2. If using a single table, how are inserts handled by the DB to
prevent inserts from getting slower and slower as more items are added
to the DB?

Thanks in advance for your help and advice.

|||Use a single table, not many tables. Use a multi-column key that
includes the stock. When a new stock comes along it should be a user
transaction to add a row to the Stocks master table, not a database
administrator's job to add tables. The database design should be
stable, not in a constant state of change.
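
For illustration, a sketch of that design - all table and column names invented for the example:

CREATE TABLE Stocks (
    StockID int NOT NULL PRIMARY KEY,
    Symbol varchar(10) NOT NULL,
    Name nvarchar(100) NOT NULL
);

CREATE TABLE StockPrices (
    StockID int NOT NULL REFERENCES Stocks (StockID),
    PriceDate datetime NOT NULL,
    Price money NOT NULL,
    Volume bigint NULL,
    CONSTRAINT PK_StockPrices PRIMARY KEY CLUSTERED (StockID, PriceDate)
);

Adding a new stock is then just an INSERT into Stocks, not a schema change.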

Performance of INSERTs does not change much between inserting into a
table with 10,000 rows and one with 10,000,000 rows.

Roy Harvey
Beacon Falls, CT

On 2 Jan 2007 08:57:21 -0800, "Emin" <emin.shopper@.gmail.com> wrote:

(snip - original question quoted in full)

|||Emin (emin.shopper@.gmail.com) writes:

Quote:

Originally Posted by

If I have a number of related groups of data (e.g., stock prices for
different stocks) is it better to put them into many tables (e.g., one
table per stock) or into one big table (e.g., a single table where one
>column is the stock id and other columns are the price and other data)?


There are two reasons why you would have more than one table:

1) Information is so different for different (groups of) stocks,
that you get different sets of columns.

2) The table is really big. Then you might make it into a partitioned
view or table, based on a condition. But that would not be one table
per stock id.

So, in the logical design, that's one table.

Quote:

Originally Posted by

The single table approach with a clustered primary key including the
stock id and other information that is different for each stock seems
cleaner to me. But I worry that having a single table could hurt
performance when the database gets very large. The clustered primary
key should make queries on the single table fast, but I'm worried about
inserts. According to my vague understanding of a clustered primary
key, records are physically stored on disk based on the primary key. So
if I insert a record for a given stock does the database have to
physicall "move down" all the records that are below that or does it do
something smarter?


If the PK is monotonically ascending, new rows are just added at the
end, and all continues smoothly. If the PK is not ascending, but for instance
a random GUID, then you will get lots of page splits, and that
takes extra power. But the answer in this case is that you should
not cluster on the GUID, but on something else. It's perfectly possible
to have the clustered index on some other column(s).
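
As a sketch of that last point (all names invented): the primary key and the clustered index can be separated.

-- Nonclustered PK on the GUID, clustered index on an ascending key instead
CREATE TABLE Trades (
    TradeGuid uniqueidentifier NOT NULL
        CONSTRAINT PK_Trades PRIMARY KEY NONCLUSTERED,
    StockID int NOT NULL,
    TradeTime datetime NOT NULL
);

CREATE CLUSTERED INDEX IX_Trades_TradeTime ON Trades (TradeTime, StockID);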

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||On 2 Jan 2007 08:57:21 -0800, Emin wrote:

(snip)

Quote:

Originally Posted by

>if I insert a record for a given stock does the database have to
>physicall "move down" all the records that are below that or does it do
>something smarter?


Hi Emin,

Roy answered most of your questions, but was friendly enough to leave
this bit for me to comment on.

Yes, it does something smarter. Rows are *NOT* physically stored in the
order imposed by the clustered index. The clustered index imposes a
*logical* ordering on the data, which is implemented by pointers that
form a logical chain of database pages.

If a row is inserted, SQL Server first navigates the clustered index to
find the correct location (database page). Then, there are two
possibilities:

1. There's still space left on the page. The row is added and the insert
is finished.

2. There's not enough free space on the page. A page split occurs: half
of the data on the page is moved to a newly allocated page and pointers
are updated to reflect the location of the new page in the chain. After
that, the new row is inserted.

If your database has to process *LOTS* of inserts (in the order of
thousands per second or more), it makes sense to define your clustered
index such that new rows are always inserted at the logical end of the
pointer chain. In that case, page splits will never happen.
For lower amounts of inserts, the overhead of a page split is
insignificant.
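
A minimal sketch of such an ever-increasing clustered key (names invented):

-- New rows always land on the last page, so page splits are avoided
CREATE TABLE PriceTicks (
    TickID bigint IDENTITY(1,1) NOT NULL,
    StockID int NOT NULL,
    TickTime datetime NOT NULL,
    Price money NOT NULL,
    CONSTRAINT PK_PriceTicks PRIMARY KEY CLUSTERED (TickID)
);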

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

|||Dear Roy, Erland, and Hugo,

Thank you very much for your excellent responses. It sounds like a
single logical table is the way to go since the frequency of inserts is
not very high. If necessary, where would I get more information about
low-level details like page splits (e.g., how to control the page size,
how to determine when and how many page splits are occurring, etc.)?

Thanks again,
-Emin

On Jan 2, 5:53 pm, Hugo Kornelis <h...@.perFact.REMOVETHIS.info.INVALID>
wrote:

(snip - Hugo's reply quoted in full)

|||One more question, is there a process similar to defragmentation that I
should run periodically to collect split pages together?

Thanks again,
-Emin

On Jan 3, 1:42 pm, "Emin" <emin.shop...@.gmail.comwrote:

Quote:

Originally Posted by

Dear Roy, Erland, and Hugo,
>
Thank you very much for your excellent responses. It sounds like a
single logical table is the way to go since the frequency of inserts is
not very high. If necessary, where would I get more information about
low level details like page splits (e.g., how to control the page size,
how to determine when and how many page splits are occuring, etc.)?
>
Thanks again,
-Emin
>
On Jan 2, 5:53 pm, Hugo Kornelis <h...@.perFact.REMOVETHIS.info.INVALID>
wrote:
>

Quote:

Originally Posted by

On 2 Jan 2007 08:57:21 -0800, Emin wrote:


>

Quote:

Originally Posted by

(snip)


>

Quote:

Originally Posted by

Quote:

Originally Posted by

>if I insert a record for a given stock does the database have to
>physicall "move down" all the records that are below that or does it do
>something smarter?Hi Emin,


>

Quote:

Originally Posted by

Roy answered most of your questions, but was friendly enough to leave
this bit for me to comment on.


>

Quote:

Originally Posted by

Yes, it does something smarter. Rows are *NOT* physically stored in the
order imposed by the clustered index. The clustered index imposes a
*logical* ordering on the data, which is implemented by pointers that
form a logical chain of database pages.


>

Quote:

Originally Posted by

If a row in inserted, SQL Server first navigates the clustered index to
find the correct location (database page). Then, there are two
possibilities:


>

Quote:

Originally Posted by

1. There's still space left on the page. The row is added and the insert
is finished.


>

Quote:

Originally Posted by

2. There's not enough free space on the page. A page split occurs: half
of the data on the page is moved to a newly allocated page and pointers
are updated to reflect the location of the new page in the chain. After
that, the new row is inserted.


>

Quote:

Originally Posted by

If your database has to process *LOTS* of inserts (in the order of
thousands per second or more), it makes sense to define your clustered
index such that new rows are always inserted at the logical end of the
pointer chain. In that case, page splits will never happen.
For lower amounts of inserts, the overhead of a page split is
insignificant.


>

Quote:

Originally Posted by

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog:http://sqlblog.com/blogs/hugo_kornelis

|||Emin (emin.shopper@.gmail.com) writes:

Quote:

Originally Posted by

One more question, is there a process similar to defragmentation that I
should run periodically to collect split pages together?


Yes, DBCC DBREINDEX on SQL 2000. The preferred syntax on SQL 2005 is
ALTER INDEX WITH REBUILD or something like that.
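
Roughly, the two forms look like this - index and table names are placeholders, check Books Online for the exact options:

-- SQL Server 2000
DBCC DBREINDEX ('dbo.StockPrices', 'PK_StockPrices');

-- SQL Server 2005
ALTER INDEX PK_StockPrices ON dbo.StockPrices REBUILD;
-- or rebuild everything on the table:
ALTER INDEX ALL ON dbo.StockPrices REBUILD;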

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

|||On 3 Jan 2007 10:42:46 -0800, Emin wrote:

Quote:

Originally Posted by

If necessary, where would I get more information about
>low level details like page splits (e.g., how to control the page size,
>how to determine when and how many page splits are occurring, etc.)?


Hi Emin,

For that kind of information, the only serious resources are the books
written by Kalen Delaney. Quite advanced stuff, definitely not entry
level, but that's unavoidable if you want to know about page splits.
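
As a quick pointer for the page-split part of the question: on SQL Server 2005 the Access Methods counters are exposed through a DMV. A sketch (the value is cumulative since the server started, so sample it twice and compare):

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec';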

--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis