Join 1,000s of professionals who are building real-world skills to truly harness the power of AI technologies like Microsoft's Copilot in Excel.

Issue #26 - Market Basket Analysis Part 1: Introduction

This Week’s Tutorial

This week's issue is the first in a new tutorial series on one of my favorite analytics techniques you might not have heard about - market basket analysis.

When you hear “market basket analysis,” you probably think of a grocery store tracking which items are bought together, like peanut butter and jelly.

And you wouldn’t be wrong. That’s a classic example.

But here’s the surprising part:

Market basket analysis is valuable to any professional in any domain.

It’s a technique that helps you uncover relationships between items, behaviors, or events, regardless of your industry.

I've successfully used market basket analysis (MBA) in my work supporting marketing, customer service, product management, and finance.

Here’s the idea.

Imagine you’re looking at thousands of transactions and you notice this:

  • People who buy peanut butter often (but not always) buy jelly.

  • But people who buy jelly don't necessarily buy peanut butter.

That “if X, then Y” insight is called an association rule.

Intuitively, you can think of association rules taking this general form:

  • If [X] occurs, then [Y] tends to occur too.

In the retail industry, it might be:

  • If someone buys a printer, they often buy ink.

In healthcare:

  • If a patient has condition A, they’re more likely to develop condition B.

In product management:

  • If a user doesn't engage with Feature X, they’re more likely to churn within 30 days.

These rules help you spot useful patterns, without needing to build complex machine learning models.

But that's not all.

Market basket analysis can mine powerful multi-factor associations from your data. Here's an example from the world of finance:

  • When a transaction is a customer's first, uses a debit card, and is for more than $1,000 it is five times more likely to result in a chargeback.

The above example shows how the interaction of three factors (e.g., first-time transaction) are highly associated with a particular outcome (i.e., a chargeback).

While this hypothetical example uses three factors, you can actually decide how many factors you would like to use (e.g., five factors).

BTW - In case you're unfamiliar, a chargeback is when a customer disputes a transaction and asks their bank for a refund.

Usually, merchants lose these disputes and then are also charged a penalty. This makes chargebacks expensive and something to minimize.

How do we know if a pattern is worth paying attention to?

This is the best part of MBA in my experience. The following metrics are very easy for your stakeholders to understand:

  • Support - How often does this combination happen in your data?

  • Confidence - When X happens, how often does Y also happen?

  • Lift - How much more likely is Y to happen given X, compared to Y happening due to random chance?

Remember - X in the above metrics can be multiple factors happening at the same time.

The first step in applying MBA is to structure your dataset in the right way. I'm going to start with the classic grocery store example because it's very intuitive.

When using MBA with grocery data, each row of the dataset is a transaction (i.e., a single customer's basket of goods). For example:

  • Hamburger, Cheddar Cheese, Mustard, Beer

  • Carrots, Onions, Toilet Tissue, Beer, Ketchup, Bread, Shampoo, Orange Juice

  • Toothpaste, Soy Milk, Butter, Bread, Rice

Notice in the above examples that each transaction can consist of many goods. However, each good is only listed once (e.g., Bread is listed once even if three loaves were purchased).

More technically, MBA works with binary (or True/False) indicators of some behavior (e.g., a customer's basket contained Bread).

Behind the scenes, the MBA algorithm transforms the set of transactions into a table where each row is still a transaction, but the columns become the unique set of all products purchased across all transactions.

Assuming the only transactions were the three above, here would be the columns:

  • Hamburger

  • Cheddar Cheese

  • Mustard

  • Beer

  • Carrots

  • Onions

  • Toilet Tissue

  • Ketchup

  • Bread

  • Shampoo

  • Orange Juice

  • Toothpaste

  • Soy Milk

  • Butter

  • Rice

The contents of each cell in the table would be a 1 (True) or 0 (False) to indicate the presence of a good.

While this representation of the data might seem exceedingly simple, don't underestimate the power of MBA.

Consider the following column examples (i.e., features) from the domain of product management:

  • Purchased With Promo 1

  • Used Feature X

  • Last Subscription Charge Failed

  • Had Session Within Last 30 Days

  • Used Mobile Browser Within Last 30 Days

Hopefully you get the idea here. The possibilities for engineering features is endless and MBA can be used in any domain.

I'm going to be completely honest.

While I can craft cluster analyses and machine learning predictive models, some of my biggest successes with business stakeholders came from applying MBA and communicating the insights.

If you want to stand out at work, you can't go wrong learning MBA!

This Week’s Book

I've recommended this book previously, and here's another reason why it's one of my favorites - it has great coverage of market basket analysis (i.e., association rule mining):

While this is a university textbook, it's accessible to a broad audience. Yes, it has a bit of math, but that shouldn't scare you off if you're serious about learning DIY data science.

Oh, and it covers so many other useful techniques (e.g., cluster analysis) as well!

That's it for this week.

Next week's newsletter will continue the tutorial series on market basket analysis with a focus on MBA metrics that are easily understood by your business stakeholders.

Stay healthy and happy data sleuthing!

Dave Langer

Whenever you're ready, here are 3 ways I can help you:

1 - The future of Microsoft Excel is AI-powered with Copilot. What Microsoft won’t say is that the AI future of Excel is dependent on skills with Python. My Python in Excel Accelerator online course will teach you the fundamentals you need to understand the Python code Copilot generates.

2 - Are you new to data analysis? My Visual Analysis with Python online course will teach you the fundamentals you need - fast. No complex math required, and Copilot in Excel AI prompts are included!

3 - Cluster Analysis with Python: Most of the world's data is unlabeled and can't be used for predictive models. This is where my self-paced online course teaches you how to extract insights from your unlabeled data. Copilot in Excel AI prompts are included!