
Exploratory Data Analysis (EDA)

Have you heard of EDA? Let's take a deep dive into exploratory data analysis (EDA). It is the analysis a data scientist uses to spot patterns and trends and to form hypotheses, working with the data to get questions answered. It is a vital step for getting the most out of any data. It is also critical because it gives a better picture of the relationships between different variables. EDA is mainly practiced to understand your data better before making any assumptions. It answers basic questions about the data, such as its standard deviation, confidence intervals, and the relationships that exist between variables. EDA is the soul of any data analysis. According to John Tukey:


Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.
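Several of the basic questions mentioned above, such as the standard deviation and a confidence interval for the mean, can be computed directly. The sketch below is a minimal univariate, non-graphical summary in plain Python using only the standard library. The `summarize` helper and the `sales` numbers are hypothetical. The interval assumes a normal approximation with z = 1.96, so it is only roughly 95%; small samples would call for a t critical value instead.

```python
import math
import statistics

def summarize(values, z=1.96):
    """Hypothetical helper: univariate, non-graphical summary of one numeric variable.

    The confidence interval uses a normal approximation (z = 1.96 for ~95%);
    for small samples a t critical value would be more accurate.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)        # sample standard deviation (divides by n - 1)
    margin = z * sd / math.sqrt(n)       # z times the standard error of the mean
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "stdev": sd,
        "ci95": (mean - margin, mean + margin),
    }

# Made-up sample values, for illustration only.
sales = [120.0, 135.5, 99.0, 150.25, 110.0, 142.0, 128.75, 105.5]
print(summarize(sales))
```

In practice, libraries such as pandas offer a similar one-line summary through `DataFrame.describe()`.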



There are four main types of EDA:

  • Univariate non-graphical: It deals with only a single variable, so it cannot reveal any relationship between variables. It is used to describe the patterns in that variable's data, for example through summary statistics. Because this kind of analysis involves no graphs, you need another type of analysis to get a clear picture.
  • Univariate graphical: It resolves this issue by providing graphics such as histograms and box plots.
  • Multivariate non-graphical: Unlike univariate non-graphical analysis, it deals with multiple variables and therefore helps explain the relationships that exist between them.
  • Multivariate graphical: It uses graphics such as scatter plots, heat maps, and bubble charts to show the relationships between different variables.
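For the multivariate non-graphical case, a common starting point is a table of pairwise correlations. The following sketch computes Pearson's r for every pair of numeric columns using only the standard library. The column names and values are invented for illustration. `pearson` and `correlation_table` are hypothetical helpers, not any library's API.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Hypothetical helper: Pearson correlation between two equal-length numeric lists.

    Note that r only measures *linear* association; it ranges from -1 to +1.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_table(columns):
    """Hypothetical helper: r for every pair of columns in a {name: values} dict."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }

# Invented data, for illustration only.
data = {
    "advertising": [10, 20, 30, 40, 50],
    "sales":       [15, 25, 35, 45, 55],
    "returns":     [9, 7, 5, 3, 1],
}
table = correlation_table(data)
for (a, b), r in table.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Drawing the same table as a heat map is one way to move from the non-graphical to the graphical multivariate view.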

Now let's focus on the tools that help us perform EDA. Most data visualization and analysis tools can be used for EDA, including Tableau, R, QlikView, Python, and many others. I have performed EDA in Tableau, and I have found that it is best done before building any machine learning model. The main question is: how do you do EDA?

To start, import the right dataset, identify its columns, check the number of observations, and check whether the dataset contains any null values. Next, clean your data. In this step, all irrelevant or unwanted data is removed. Then focus on your categorical variables. Predictive analysis can then be performed on the clean dataset.
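The first steps of this workflow can be sketched with Python's standard library alone. This is a minimal illustration with several assumptions. A small made-up CSV held in a string (`RAW`) stands in for a real file. Empty fields are treated as null values. "Drop any incomplete row" is used as the cleaning rule, although real cleaning usually needs more judgment.

```python
import csv
import io
from collections import Counter

# Hypothetical inline dataset standing in for a real CSV file.
RAW = """order_id,category,region,sales
1,Furniture,East,250.0
2,Technology,West,
3,Office Supplies,East,45.5
4,Technology,,980.0
5,Furniture,West,310.0
"""

def load(text):
    """Import the dataset: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load(RAW)
columns = list(rows[0].keys())   # identify the columns
n_obs = len(rows)                # number of observations

# Check for null values: count empty fields per column.
nulls = {c: sum(1 for r in rows if r[c] == "") for c in columns}

# Clean: drop rows with any missing value (one simple strategy among many).
clean = [r for r in rows if all(r[c] != "" for c in columns)]

# Focus on a categorical variable: frequency of each category in the clean data.
category_counts = Counter(r["category"] for r in clean)

print("columns:", columns)
print("observations:", n_obs)
print("nulls per column:", nulls)
print("rows after cleaning:", len(clean))
print("category counts:", dict(category_counts))
```

With pandas, the equivalent checks are typically `df.columns`, `len(df)`, `df.isna().sum()`, `df.dropna()`, and `df["category"].value_counts()`.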

EDA is a basic step that is often overlooked, but it is necessary for figuring out the best-suited model for your analysis. EDA plays a crucial role in shaping the research hypothesis. It is an asset to any data scientist because it is the first step before performing any machine learning operations. Results obtained from EDA should be valid and applicable to the business context.
