Unsupervised Learning

This is still imperfect (assuming that the human classification counts as "perfect"), but it's clearly better than the attempts with K-means. And because you have the original species labels for this data set, you can use some of sklearn's metrics to find how good (or bad) the model is:

from sklearn import metrics
labels_true = iris.target
labels_pred = model.predict(X)
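Before scoring the real model, it may help to see what these two metrics reward. The following is a minimal sketch on toy labels that I made up for illustration (they are not the iris data). Homogeneity asks whether each cluster contains members of only one true class; completeness asks whether all members of a true class land in the same cluster. Neither metric cares what the clusters are called, only how the points are grouped:

```python
from sklearn import metrics

# Toy ground truth, invented for illustration (not the iris data).
truth = [0, 0, 1, 1]

# Same grouping with the cluster names swapped: both scores stay
# perfect, because only the grouping matters, not the label values.
swapped = [1, 1, 0, 0]
h_swapped = metrics.homogeneity_score(truth, swapped)
c_swapped = metrics.completeness_score(truth, swapped)

# Every point in its own cluster: each cluster is pure (homogeneous),
# but each true class is scattered across clusters (incomplete).
split = [0, 1, 2, 3]
h_split = metrics.homogeneity_score(truth, split)
c_split = metrics.completeness_score(truth, split)

# Everything in one cluster: each class stays together (complete),
# but the single cluster mixes both classes (not homogeneous).
lumped = [0, 0, 0, 0]
h_lumped = metrics.homogeneity_score(truth, lumped)
c_lumped = metrics.completeness_score(truth, lumped)

for name, h, c in [("swapped", h_swapped, c_swapped),
                   ("split", h_split, c_split),
                   ("lumped", h_lumped, c_lumped)]:
    print(f"{name:8s} homogeneity={h:.2f} completeness={c:.2f}")
```

A model can score well on one metric and poorly on the other, which is why it is worth looking at both.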

Let's find out how well it did:

metrics.homogeneity_score(labels_true, labels_pred)

metrics.completeness_score(labels_true, labels_pred)

Hey, pretty good! Not perfect (that is, 1.0), but not bad at all. And if you compare this against the K-means model:

labels_pred = k.labels_
metrics.homogeneity_score(labels_true, labels_pred)

metrics.completeness_score(labels_true, labels_pred)

In other words, my intuition was right. The GaussianMixture model was better at clustering the flowers than the K-means model.
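If you want to run the whole comparison in one go, here is a self-contained sketch. It reuses the names from above (k, model, X, labels_true), but the settings are my assumptions rather than anything fixed by the earlier examples: three clusters for each model, and random_state=0 and n_init=10 chosen only so that repeated runs give the same answer. The exact scores you see will depend on your scikit-learn version and on those settings:

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

iris = load_iris()
X = iris.data
labels_true = iris.target

# Fit both models with three clusters, one per iris species.
# random_state and n_init are assumptions, fixed only for repeatability.
k = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
model = GaussianMixture(n_components=3, random_state=0).fit(X)

predictions = {
    "KMeans": k.labels_,
    "GaussianMixture": model.predict(X),
}

# Score each model against the human-assigned species labels.
scores = {}
for name, labels_pred in predictions.items():
    scores[name] = (
        metrics.homogeneity_score(labels_true, labels_pred),
        metrics.completeness_score(labels_true, labels_pred),
    )
    print(f"{name:16s} homogeneity={scores[name][0]:.3f} "
          f"completeness={scores[name][1]:.3f}")
```

Neither model sees labels_true while it is being fit; the labels are used only afterward, as a yardstick.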


In many ways, unsupervised learning is where the true magic and potential of the machine-learning world lie. By using computers to identify patterns and groups in your data, more quickly and accurately than you could yourself, you can start to identify and predict all sorts of things. As with supervised learning, though, unsupervised learning requires that you try a variety of models, compare them against one another and understand that each model has its own advantages, disadvantages and biases.

The world of data science in general, and machine learning in particular, continues to grow at an extremely rapid rate, with new ideas, techniques and tutorials available all of the time. The Resources section here describes several places where you can learn more and start your journey in this set of concepts and technologies.


Resources

I used Python and the many parts of the SciPy stack (NumPy, SciPy, Pandas, Matplotlib and scikit-learn) in this article. All are available from PyPI or from SciPy.org.

I recommend a number of resources for people interested in data science and machine learning.

One long-standing weekly email list is "KDnuggets". You also should consider the "Data Science Weekly" newsletter and "This Week in Data", which describes the latest data sets available to the public.

I am a big fan of podcasts, and I particularly love "Partially Derivative". Other good ones are "Data Stories" and "Linear Digressions". I listen to all three on a regular basis and learn from them all.

If you're looking to get into data science and machine learning, I recommend Kevin Markham's Data School and Jason Brownlee's Machine Learning Mastery, where he sells a number of short, dense but high-quality ebooks on these subjects.