
Exploratory Data Analysis (EDA)

Have you heard of EDA? Let's deep dive into exploratory data analysis (EDA). It is the analysis a data scientist uses to spot patterns and trends, form hypotheses, and manipulate the data until your questions are answered. It is a vital step for getting the most out of any data, and it is critical because it gives a clearer picture of the relationships between variables. EDA is mainly practiced to understand your data better before making any assumptions. It answers basic questions about the data, such as the standard deviation, the confidence interval, and the relationships that exist between variables. EDA is the soul of any data analysis. According to John Tukey:


Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.
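To make the basic questions from the introduction concrete, here is a minimal sketch in Python using only the standard library. It computes the mean, median, sample standard deviation, and an approximate 95% confidence interval for the mean of one variable. The `sales` values and the `summarize` helper are hypothetical, for illustration only, and the interval uses a normal approximation (z = 1.96). That is an assumption: for a sample this small, a t-distribution would be the more careful choice.

```python
import math
import statistics

def summarize(values, z=1.96):
    """Return basic univariate summary statistics for a list of numbers.

    Assumption: the confidence interval for the mean uses a normal
    approximation (z = 1.96 for roughly 95% coverage).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample standard deviation (divides by n - 1)
    margin = z * sd / math.sqrt(n)      # z times the standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "std": sd,
        "ci95": (mean - margin, mean + margin),
    }

# Hypothetical daily sales figures, for illustration only.
sales = [12.0, 15.0, 11.0, 14.0, 18.0, 13.0, 16.0, 15.0]
stats = summarize(sales)
for key, value in stats.items():
    print(key, value)
```

In practice, libraries such as pandas offer similar summaries out of the box. The hand-written version simply shows what those numbers are.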



There are four main types of EDA:

  • Univariate non-graphical: It deals with only a single variable, so it cannot reveal relationships between variables. It is used to describe the patterns in that one variable, for example through summary statistics such as the mean, median, and standard deviation. Because there are no graphs in this kind of analysis, you need another type of analysis to get a clear picture.
  • Univariate graphical: It resolves that issue by providing graphics such as histograms and box plots.
  • Multivariate non-graphical: Unlike univariate non-graphical analysis, it deals with multiple variables, so it helps explain the relationships that exist between them (for example, through cross-tabulations or correlation coefficients).
  • Multivariate graphical: It uses graphics such as scatter plots, heat maps, and bubble charts to show the relationships between different variables.
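Multivariate non-graphical analysis can be sketched in a few lines of Python with the standard library: a Pearson correlation coefficient for two numeric variables and a cross-tabulation (pair counts) for two categorical variables. The column names and values below, and the `pearson` and `crosstab` helpers, are hypothetical and for illustration only. A correlation close to 1 suggests a strong positive linear relationship, which a scatter plot (multivariate graphical) would then let you inspect visually.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def crosstab(a, b):
    """Count how often each (a, b) pair of categories occurs together."""
    return Counter(zip(a, b))

# Hypothetical data, for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.1, 3.9, 6.2, 8.1, 9.8]
region = ["East", "West", "East", "East", "West"]
segment = ["Retail", "Retail", "Corporate", "Retail", "Corporate"]

print(round(pearson(ad_spend, revenue), 3))
print(crosstab(region, segment))
```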

Let's focus on the tools that help us perform EDA. Most data visualization tools can be used for EDA, including Tableau, R, QlikView, Python, and many others. I have performed EDA in Tableau, and in my experience it is better to do it before building any machine learning model. The main question is: how do we do EDA?

To start, import the right dataset, identify its columns, check the number of observations, and check whether the dataset contains any null values. Next, clean your data: in this step, all waste or unwanted data is removed. After that, focus on your categorical variables. Predictive analysis can then be performed on the clean dataset.
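The first steps above can be sketched in Python with the standard library: load a CSV, list its columns, count the observations, count missing values per column, tabulate a categorical variable, and apply a very simple cleaning rule. The embedded dataset, the column names, and the helpers `profile`, `category_counts`, and `drop_incomplete` are hypothetical. Two further assumptions: an empty cell counts as a null value, and "cleaning" here means dropping incomplete rows. Real cleaning usually needs rules specific to your data.

```python
import csv
import io
from collections import Counter

# Hypothetical dataset embedded as text so the sketch is self-contained;
# in practice you would open a real CSV file instead.
RAW = """order_id,region,category,sales
1,East,Furniture,120.5
2,West,,80.0
3,East,Technology,
4,South,Furniture,45.25
"""

def is_missing(value):
    """Assumption: an empty string (or a missing field) counts as null."""
    return value in ("", None)

def profile(rows):
    """Print the columns, observation count, and missing values per column."""
    columns = list(rows[0].keys()) if rows else []
    nulls = {c: sum(1 for r in rows if is_missing(r[c])) for c in columns}
    print("columns:", columns)
    print("observations:", len(rows))
    print("nulls per column:", nulls)
    return columns, nulls

def category_counts(rows, column):
    """Frequency table for a categorical column, ignoring missing values."""
    return Counter(r[column] for r in rows if not is_missing(r[column]))

def drop_incomplete(rows):
    """A deliberately simple cleaning rule: keep only rows with no missing values."""
    return [r for r in rows if not any(is_missing(v) for v in r.values())]

rows = list(csv.DictReader(io.StringIO(RAW)))
columns, nulls = profile(rows)
print(category_counts(rows, "region"))
clean = drop_incomplete(rows)
print("rows after cleaning:", len(clean))
```

Dropping every incomplete row is the bluntest option. Depending on the analysis, filling in or flagging missing values may preserve more information.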

EDA is a basic step that is often overlooked, but it is necessary for figuring out the best-suited model for your analysis. EDA plays a crucial role in determining the research hypothesis. It is an asset to any data scientist because it is the first step before performing any machine learning operations. The results obtained from EDA should be valid and applicable to the business context.
