
Dealing with High Cardinality columns

Are you still struggling with slow reports and a complex data model, even after trying every approach to optimise them? One of the major reasons behind slow reports is high-cardinality fields. So what is cardinality? The simplest definition is the number of distinct values present in a particular column.

Do note that we are talking about distinct values (every different value, counted once), not unique values (values that appear only once). There is a slight difference between the two.
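To make the two counts concrete, here is a minimal Python sketch. The sample column and the function names are made up for illustration and do not come from the Contoso dataset. The first function counts every different value once, which is the figure a cardinality count is based on; the second lists only the values that occur a single time.

```python
from collections import Counter

def count_different_values(values):
    """Number of different values in a column, each counted once."""
    return len(set(values))

def values_appearing_once(values):
    """Values that occur exactly one time in the column."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n == 1]

# Hypothetical sample column.
colour = ["Red", "Blue", "Red", "Green", "Blue", "Red"]

print(count_different_values(colour))  # Red, Blue and Green are counted once each
print(values_appearing_once(colour))   # only Green occurs a single time
```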

Want to learn more about cardinality? Here is a great article from Arno. If you have a background in data analysis, the term cardinality may not be alien to you. When we talk about optimising the data model, it is always advised to reduce cardinality: the more different values a column holds, the more memory it takes and the more it hurts performance.
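Why does cardinality cost memory? One way to picture it is dictionary encoding, where each different value is stored once and every row stores a small index pointing at it. The sketch below is a deliberately simplified model of that idea, not the actual storage engine behind Power BI. The function name and the sample numbers are assumptions for illustration.

```python
def index_bits_per_row(cardinality):
    """Bits needed per row to tell `cardinality` dictionary entries apart."""
    # (n - 1).bit_length() is the integer form of ceil(log2(n)); use at least 1 bit.
    return max(1, (cardinality - 1).bit_length())

# The dictionary itself also grows with every additional different value.
for cardinality in (4, 1_000, 1_000_000):
    print(cardinality, "values ->", index_bits_per_row(cardinality), "bits per row")
```

In this model a column with more different values pays twice: a longer dictionary and a wider index on every row.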

In this article, we will focus on a few easy steps to reduce the cardinality of the data model. The most fundamental and simplest step is to include only the fields that are necessary and to identify the ones that are not used in the report at all. Discard them and your report will get a slight boost.

How do you identify unused columns? You can do it with the help of external tools such as DAX Studio. If you are a beginner with DAX Studio, do visit our most visited article on it. Once you connect DAX Studio to your model, View Metrics will give you this information.


In this case, we have identified that we are not using Product Key. When we got rid of Product Key, the size of the report dropped by more than half, from 12 MB to 5 MB.
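A key-like column is expensive because almost every row carries a different value. The Python sketch below imitates a per-column cardinality ranking on a tiny hypothetical table. The rows, column names and helper function are invented for illustration and are not how DAX Studio computes its metrics.

```python
# Hypothetical rows standing in for a small sales table.
rows = [
    {"ProductKey": 1001, "Colour": "Red", "Quantity": 2},
    {"ProductKey": 1002, "Colour": "Blue", "Quantity": 1},
    {"ProductKey": 1003, "Colour": "Red", "Quantity": 2},
    {"ProductKey": 1004, "Colour": "Green", "Quantity": 1},
    {"ProductKey": 1005, "Colour": "Blue", "Quantity": 3},
    {"ProductKey": 1006, "Colour": "Red", "Quantity": 1},
]

def column_cardinalities(table):
    """Map each column name to the number of different values it holds."""
    columns = table[0].keys()
    return {col: len({row[col] for row in table}) for col in columns}

# Highest cardinality first: these columns are the first candidates to review.
ranking = sorted(column_cardinalities(rows).items(),
                 key=lambda item: item[1], reverse=True)
for name, count in ranking:
    print(f"{name}: {count}")
```

Here ProductKey has as many different values as there are rows, so dropping it when it is unused removes the most expensive column in the table.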


Second, try to reap the benefits of splitting columns such as Date/Time. The fundamental step is to check whether you need the time at all. In most cases you do not, and you can change the data type to date only. If you do need it, make sure the date and the time are split into two columns.
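The effect of the split is easy to check outside Power BI. The Python sketch below uses hypothetical timestamps (one event every 7 hours, 1,000 rows) and compares the number of different values in the combined column with the date part and the time part on their own.

```python
from datetime import datetime, timedelta

# Hypothetical data: one event every 7 hours, 1,000 rows in total.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=7 * i) for i in range(1000)]

combined = len(set(timestamps))                # every timestamp is different
dates = len({ts.date() for ts in timestamps})  # date part only
times = len({ts.time() for ts in timestamps})  # time-of-day part only

print(combined, dates, times)  # 1000 292 24
```

One column with 1,000 different values becomes two columns with 292 and 24 different values. In a real model the gap is usually far larger, because timestamps recorded to the second almost never repeat.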

Isn't this easy? This article mainly deals with the fundamentals of optimising your reports. Another fundamental step is to check whether you are using the correct data type. How do you check this? The longer route is to go over each and every field and look at its data type, but as a reader of Analyst in Action you are a PRO.

How do PROs do it? Once you are connected to DAX Studio, you can see it at a glance in View Metrics. In the Contoso dataset, we have changed the data type of Quantity to decimal number. Let's see how the metrics look with the decimal number data type.


Do notice the Total Size column for Quantity. Now we revert to whole number, and you can see the drop in the total size.


Lastly, try to avoid calculated columns in your reports. Let's take an example from the Contoso data: I have created a calculated column to get the cost. Now let's take a look at what exactly is happening behind the scenes.


Whhaattt!! Suddenly Cost is at the top of your metrics, which is not good for your data model. If you take a close look at the Cardinality column, Cost has the highest value, and that is what slows down your data model. Now imagine having 10-20 of those calculated columns. Avoiding calculated columns is generally considered a best practice in Power BI.
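Why does a derived column like Cost end up with such a high cardinality? Multiplying two columns tends to produce far more different values than either input holds. The Python sketch below shows this on hypothetical numbers (10 quantities and 50 unit costs in cents, none taken from Contoso) and then computes the same total on the fly, which is roughly what a measure (for example one built on SUMX) does instead of storing a value per row.

```python
from itertools import product

quantities = range(1, 11)                             # 10 different quantities
unit_costs_cents = [99 + 137 * k for k in range(50)]  # 50 hypothetical unit costs

# One row per quantity / unit cost combination.
rows = [{"Quantity": q, "UnitCost": c}
        for q, c in product(quantities, unit_costs_cents)]

# What a calculated column does: store one extra value on every row.
for row in rows:
    row["Cost"] = row["Quantity"] * row["UnitCost"]

def cardinality(column):
    """Number of different values stored in the given column."""
    return len({row[column] for row in rows})

for column in ("Quantity", "UnitCost", "Cost"):
    print(column, cardinality(column))

# The alternative: compute the total when it is needed and store nothing extra.
total_cost = sum(row["Quantity"] * row["UnitCost"] for row in rows)
print("Total cost in cents:", total_cost)
```

The stored Cost column has more different values than Quantity and UnitCost, while the on-the-fly total needs no additional column at all.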


"All our articles are intended to address the frequently asked questions related to a topic. Do leave a comment if this one answers any of your questions."


Thanks for reading! Let's connect on LinkedIn. For more such blogs and pro tips, do follow us.




