SitePoint
  • Blog
  • Forum
  • Library
  • Login
Join Premium
  1. Details
  2. Creator
  3. Content
  4. Reviews
  5. FAQ
book cover

Cluster Analysis in Python

Understand your data with cluster analysis

Created by

Shaumik Daityari

Published by

SitePoint

Last Updated

8 November 2022

Details

If you’ve browsed Google News, you’ll have noticed that articles on similar topics are grouped together. Google scans for new articles, and then analyzes the text to decide which articles talk about the same topic. This is an example of cluster analysis, which we'll learn about in this tutorial.

Description

If you’ve browsed through Google News, you’ll most likely have noticed that news articles on similar topics are grouped together. Google scans the Internet for new articles, and then analyzes the text within each of them to decide which articles talk about the same topic! It’s essentially a result of an unsupervised machine learning algorithm. Cluster analysis is an example of an unsupervised learning algorithm. Another example of clustering is trending topics on Twitter. Twitter goes through new tweets in real time to determine words, hashtags, and a combination of them, to determine what’s causing a buzz.

Unsupervised learning involves the use of machine learning techniques to find patterns and make inferences on unlabelled data. Labelled data has some kind of a tag associated with it—such as a name or an attribute. This label can be categorical or numerical data. Unlabelled data contains no tags. Going back to our example on Google News, news articles are examples of unlabelled data. A cluster of news articles has similar words and word associations repeating in it.

Common unsupervised learning algorithms are cluster analysis, anomaly detection, and neural networks. Cluster analysis involves analyzing unlabelled data and grouping similar data points together. The data points are grouped on the basis of certain characteristics. These groups are called clusters. A common example of clustering is customer segmentation. Cluster analysis can group similar customers on the basis of their demographics and spending patterns, and help businesses cater their offerings differently to different clusters.

In this tutorial, we’ll look at the following:

  • what cluster analysis is
  • when cluster analysis should be used
  • how data can be prepared for cluster analysis
  • two techniques for performing cluster analysis
  • how to interpret the results of cluster analysis

Creator

Shaumik Daityari avatar

Shaumik Daityari

Shaumik is an optimist, but one who carries an umbrella. He is currently pursuing his MBA at IIM Lucknow, after completing his M.Tech at IIT Roorkee. A co-founder of The Blog Bowl, he loves writing.
Shaumik Daityari avatar

Content

1
Preview

Reviews

Log in to write a review!

Frequently Asked Questions

book cover

Cluster Analysis in Python

  • Unlimited access to this title and 600+ others in our library

  • New titles added frequently

  • Cancel anytime

Stuff we do

  • Premium
  • Newsletters
  • Forums

About

  • Our Story
  • Terms of use
  • Privacy Policy
  • Corporate Memberships

Contact

  • Contact us
  • FAQ
  • Publish your book with us
  • Write an article for us
  • Advertise

Connect

© 2000 – 2024 SitePoint Pty. Ltd.

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.