
Exploratory Data Analysis (EDA)

Have you heard of EDA? Let's take a deep dive into exploratory data analysis (EDA). It is the analysis a data scientist uses to spot patterns and trends and to form hypotheses, so that the data can be shaped to answer your questions. It is a vital step in getting the most out of any data because it gives a clearer picture of the relationships between different variables. EDA is mainly practiced to understand your data better before making any assumptions. It answers basic questions about the data, such as its confidence intervals, its standard deviation, and the relationships that exist between different variables. EDA is the soul of any data analysis. According to John Tukey:


Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there.



There are four main types of EDA:

  • Univariate non-graphical: It deals with a single variable, so it cannot identify relationships between variables. It is used to describe the patterns within that variable, for example through summary statistics. Because there are no graphs in this kind of analysis, you need another type to get a clear picture.
  • Univariate graphical: It fills that gap by presenting a single variable visually, using graphics such as histograms and boxplots.
  • Multivariate non-graphical: Unlike univariate non-graphical analysis, it deals with multiple variables and therefore helps explain the relationships between them.
  • Multivariate graphical: It uses graphics such as scatter plots, heat maps, and bubble charts to show the relationships between different variables.
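To make the types above a little more concrete, here is a minimal sketch in plain Python. The numbers, the variable names (`ad_spend`, `sales`), and the helper functions are all made up for illustration. It covers a univariate summary, a rough text histogram standing in for a univariate graphic, and a Pearson correlation as one possible multivariate non-graphical measure. A real scatter plot or heat map would need a plotting library, so that part is left out.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample data, invented for illustration only.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [100.0, 118.0, 150.0, 175.0, 205.0, 250.0]


def univariate_summary(values):
    """Univariate non-graphical: summary statistics for one variable."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """Univariate graphical: a crude text histogram with fixed-width bins."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return [
        f"{start:>4}-{start + bin_width:<4} {'#' * bins[start]}"
        for start in sorted(bins)
    ]


def pearson_correlation(xs, ys):
    """Multivariate non-graphical: Pearson correlation between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


summary = univariate_summary(ad_spend)
histogram = text_histogram(ad_spend, bin_width=5)
correlation = pearson_correlation(ad_spend, sales)

print(summary)
print("\n".join(histogram))
print(f"correlation(ad_spend, sales) = {correlation:.3f}")
```

The summary and the histogram each look at one variable at a time, while the correlation is the first place where two variables are compared, which is the split the list above describes.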

Let's focus on the tools that help us perform EDA. Most data visualization tools can be used for EDA: it can be performed in Tableau, R, QlikView, Python, and many others. I have performed EDA in Tableau, and in my experience it is better to do it before building any machine learning model. The main question is: how do you do EDA?

To start off, we need to import the right dataset, identify the columns in the dataset, check the number of observations, and check whether the dataset contains any null values. Next, clean your data: in this step, all the waste or unwanted data is removed. After that, focus on your categorical variables. Predictive analysis can then be performed on the clean dataset.
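The steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the CSV content, the column names, and the helper functions are invented, empty cells are treated as null values, and "cleaning" is reduced to dropping incomplete rows. Real cleaning usually involves more judgment than that.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content standing in for a real file; invented for illustration.
RAW_CSV = """order_id,region,category,sales
1,East,Furniture,250.0
2,West,Technology,
3,East,,120.5
4,South,Office Supplies,75.0
"""


def load_rows(text):
    """Import the dataset: read CSV text into a list of dicts, one per observation."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows, columns):
    """Count empty cells per column (an empty string is treated as a null value)."""
    return {col: sum(1 for row in rows if row[col].strip() == "") for col in columns}


def drop_incomplete(rows):
    """Simplest possible cleaning step: keep only rows with no empty cells."""
    return [row for row in rows if all(value.strip() != "" for value in row.values())]


rows = load_rows(RAW_CSV)
columns = list(rows[0].keys())           # identify the columns
missing = count_missing(rows, columns)   # check for null values
clean_rows = drop_incomplete(rows)       # clean the data
region_counts = Counter(row["region"] for row in clean_rows)  # a categorical variable

print("columns:", columns)
print("observations:", len(rows))
print("missing values per column:", missing)
print("observations after cleaning:", len(clean_rows))
print("region frequencies:", dict(region_counts))
```

Dropping rows is the bluntest way to handle missing values. Depending on the dataset, filling them in or keeping them as their own category may be a better choice.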

EDA is a basic step that is often overlooked, but it is necessary for figuring out the best-suited model for your analysis. It plays a crucial role in shaping the research hypothesis. It is an asset to any data scientist, as it is the first step before performing any machine learning operations. Results obtained from EDA should be valid and applicable to the business context.
