MDX: Block Evaluation and Surrogate Attributes

by Admin20. July 2014 19:19

In this article we'll show you one interesting approach which we discovered recently: we call it "surrogate attributes". It speeds up the performance of many calculated measures, especially those which run through dimension members of the lowest granularity level.

Let us look at the example:

SUM(    

FILTER(

NonEmpty(

existing [Customer].[Customer].[Customer].Members

, [Measures].[Internet Sales Amount]    

)

, [Measures].[Internet Order Quantity] >= 2

)

, [Measures].[Internet Order Count]

)

 

You may find that this measure is artificial. Yes, it is: I've done this measure on Adventure Works database just to demonstrate the approach. The goal is to calculate something by running on low-level objects.

In this example we need to find out how many orders made those customers whose quantity of ordered products is more than 2.

If we build a report like this

we'll get 7 seconds execution.

By analyzing the request in MDX Studio and looking at the performance counters, we found this:

  • Time : 7 sec 607 ms
  • Calc covers : 0
  • Cells calculated : 43641
  • Sonar subcubes : 4103
  • NON EMPTYs : 995
  • Autoexists : 6
  • EXISTINGs : 2422
  • SE queries : 2318
  • Flat cache insert : 0
  • Cache hits : 2649
  • Cache misses : 2098
  • Cache inserts : 982
  • Cache lookups : 4747
  • Memory Usage KB : 16884

 

43641 calculations, 4103 sonars, 2318 storage engine queries. If we look at some interesting performance counters in the case if we run a similar MDX request on a real customer's database, we notice this:

 

Unfortunately, there is no way to tell to the MSAS "please, read all the measuregroup into memory because you will need it WHOLE later." There is no such MDX or XMLA command.

 

So, what shall we do in this case? The answer is next: we need to improve our formula AND to add one trick to the dimension.

The construct "SUM(FILTER(set, condition), measure)" is a very ineffective one, and it should be replaced with "SUM(set, IIF(condition, measure, null))" to allow block evaluation working.

So, if we change the formula like this

SUM(    

existing [Customer].[Customer].[Customer].Members,

    IIF(

[Measures].[Internet Sales Amount] > 0

AND

[Measures].[Internet Order Quantity] >= 2,

 

[Measures].[Internet Order Count],

null

)

)

 

we will get much better performance:

  • Time : 179 ms
  • Calc covers : 0
  • Cells calculated : 250
  • Sonar subcubes : 72
  • NON EMPTYs : 1
  • Autoexists : 0
  • EXISTINGs : 71
  • SE queries : 159
  • Flat cache insert : 0
  • Cache hits : 285
  • Cache misses : 0
  • Cache inserts : 0
  • Cache lookups : 285
  • Memory Usage KB : 0

 

We can stop out optimization efforts, of course. But this is not enough: we run 159 storage engine queries instead of one, and I still don't like it.

Why this happens? Just because for every non-empty cell the set

existing [Customer].[Customer].[Customer].Members

 

is different. But how can we make so that this set is the same for every cell?

This can be done with the help of a trick: let us add a new attribute to the Customers dimension with the same key as the key attribute and with no specified name:

 

Process the cube, and rewrite the formula like this:

SUM(    

[Customer].[Customer Surr].[Customer Surr].Members,

IIF(

[Measures].[Internet Sales Amount] > 0

AND

[Measures].[Internet Order Quantity] >= 2,

 

[Measures].[Internet Order Count],

null

)

)

 

Now look at the result:

  • Time : 25 ms
  • Calc covers : 0
  • Cells calculated : 250
  • Sonar subcubes : 1
  • NON EMPTYs : 1
  • Autoexists : 0
  • EXISTINGs : 0
  • SE queries : 3
  • Flat cache insert : 0
  • Cache hits : 3
  • Cache misses : 0
  • Cache inserts : 0
  • Cache lookups : 3
  • Memory Usage KB : 0

 

Isn't it impressive?

We've removed "existing" keyword because we did not need it any longer: the surrogate attribute evaluates to null in all those non-existing coordinates. So, if we look at this cell

 

It will sum to

(British Columbila, Customer1_Surrogate, measure) +

(British Columbila, Customer2_ Surrogate, measure) +

. . . + (enumerated all customers) +

(British Columbila, CustomerN_ Surrogate, measure)

 

But for all those customers who do not belong to British Columbia, such tuples will automatically evaluate to null, because attributes "State-Province" and "Customer Surr" are INDEPENDENT on each other.

If you used the real dimension key (instead of surrogates) in the formula, then IT WOULD NOT EVALUATE TO NULL, because real key "vanishes" the British Columbia (because the "State-Province" is dependent on the real key). That is why real key attribute doesn't suit us.

Conclusion

We optimized everything from 7.6 seconds to 179 milliseconds – in 42.4 times, and we COULD STOP on this, but we did not. Instead, we made one more optimization – from 179 milliseconds to 25 milliseconds, that is in 7.16 more times.

On this example with Adventure Works this may seem as "not important". But believe me, on real huge databases this plays a big role.

From now we try to make surrogate attributes every time in every dimension, and we always consider using them instead of real keys in all calculations where we have enumeration of the values.

P.S.

You can get acquainted with our BI suite "Business Analysis Tool" at our website www.bitimpulse.com

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading

About company

BIT Impulse - a software development company, a vendor of a proprietary BI system called "Business Analysis Tool".

Web site: www.bitimpulse.com

Calendar

<<  April 2024  >>
MoTuWeThFrSaSu
25262728293031
1234567
891011121314
15161718192021
22232425262728
293012345

View posts in large calendar