How Companies Use Exploratory Data Analysis On Dataset.

How Companies Use Exploratory Data Analysis On Dataset.

Jan 19, 2022, 6:36:40 AM Business



Supermarket Shopping has become a routine nowadays. The wide range of the items available, low pricing, and ease of shopping will result in a large flow of customers and so there will be a significant increase in the sales income.

The sales and location dataset of a supermarkets is analyzed in this blog, which spans the years 2017 through 2020. The aim is to determine the company’s strengths and weaknesses, maximize profits, loss reduction, and make future recommendations.

This is a comprehensive capstone project that necessitates a working grasp of Python as well as basic statistical principles. This project’s environment comprises of:

IDE- JupyterLab

File Type- XLS

Exploratory Data Analysis

Exploratory Data Analysis

Let’s take a closer look at the information in data frame.

Let’s take a closer look at the information in data frame

From the above dataset we can analyze that:

21 Columns

9994 rows/ entries

All are Null- values excluding Postal codes

Single entry for ‘Country’

Filtering Data

Deleting Unnecessary Columns

Postal code

• Country

• Row ID

• Order ID

Adding New Columns

• Year

• Month

• Profit Percentage

Analyzing Data

Visualizing the patterns of sales in the last few months and years

Analyzing Data

Check The Visualizations As Per The Profits

Check The Visualizations As Per The Profits

The organization is most productive, as we can see from the two charts. Second, we see that average earnings have no direct connection with sales volume.

Category Comparison

Comparing the sales and profit by category

Comparing the sales and profit by category

It is observed that a huge number of orders does not equal several sales. ‘Technology’ has the smallest number of units sold but the highest number of sales.

number of sales

Even though the Furniture category has the highest sales volume, it is the least productive of the three.

Comparison By Subcategory- ‘Furniture’

Comparison By Subcategory

It is clear that ‘Tables,’ although having a large volume of sales, are losing a lot of money, which is why the profits are so low.

Comparison By Subcategory- ‘Office Supplies’

Comparison By Subcategory- ‘Office Supplies’

The same pattern can be seen previously. Even though the fact that sales are rising, supplies are losing money.

Comparison By Subcategory- ‘Technology’

Comparison By Subcategory- ‘Technology’

Copiers have proven to be the most successful subcategory so far. Above and beyond the competition. Machines have increased the sales, but too less profit.

By Product

We may utilize the data to determine which products are the most and least expensive.

By Product

The data may be used to determine which goods have the highest and lowest profitability.

highest and lowest profitability

Regional Comparison

Regional Comparison

The average sales in all four areas are nearly the same, but the profit margins differ. The profits in the East, South, and West are comparable and strong, however, the profits in the Central Region are very low.

Central Region

We can observe that sales are decreasing in the Central, East, and West, but increasing in the South. Despite this, the South’s profit average is falling along with Central’s.

South’s profit average

South’s profit average

Above shown are the best and worst-performing states.

Developing A New Data-Frame

On noting the loss of money, we can eliminate the entries, to compare the subsequent profit to the original.

• Deleting the 30 least Profitable products

• Deleting the 30 products with low profit-margins

• Deleting the entries of ‘Tables’ as it shows the sub-category with maximum incurred losses.

Depending on the findings, we can see by eliminating the sales of a few Products and Tables, earnings improved by nearly $50,000 over the course of four years.

To demonstrate the difference, we may plot the findings by year.

earnings improved

It is clear that if there is no selling of the products with the lowest earnings and profit margins, the company would have saved $50,000 over four years and generated an additional yearly income of more than $10,000.


Few products are the result of the loss to the company. The sale of the thirty least successful products must be discontinued immediately to decrease the loss.

Similarly, sales of the thirty goods with the worst profit margins should be discontinued.

Selling of the table should be banned because they are the category with the most losses.

The corporation should focus its efforts on the most profitable products by offering incentives, discounts, and other promotions.

Reduce sales in states with large losses and fix the underlying problem.

Final Words

As shown above, following the proposal based on current datasets might save up to $50,000 in profit over the period of four years. Earnings are increasing every year; hence we can confidently predict that if the firm follows the advice, it will save more than $50,000 over the next four years.

Published by Locationscloud

Reply heres...

Login / Sign up for adding comments.