2731: "K-Means Clustering"

ratammer
Gatekeeper
Posts: 736
Joined: Tue Aug 24, 2021 8:22 pm
Location: London

2731: "K-Means Clustering"

Post by ratammer »

Image
Title text: According to my especially unsupervised K-means clustering algorithm, there are currently about 8 billion types of people in the world.

Yeah, this is just one of those where I really don't know enough about the maths involved.
chridd
Scorekeeper-Keeper
Posts: 159
Joined: Tue Aug 24, 2021 8:02 pm
Location: west coast US

Re: 2731: "K-Means Clustering"

Post by chridd »

k-means is an unsupervised machine learning algorithm. Kind of long explanation below (and some images from when I implemented it):
Suppose you're trying to get a computer to recognize the difference between cats and dogs. One way to do it is with supervised machine learning, which means that you tell the computer that these photos are cats, those photos are dogs, and it'll try to pick out what it is that the cat photos have in common that the dog photos don't. Another way is with unsupervised machine learning, which means that you just give the program a bunch of photos, but don't tell it which is which, and ask it to split the photos into two groups. Maybe it'll split them into cats vs. dogs, or maybe it'll split them into big vs. small, or light fur vs. dark, or you'll get two groups where it claims to have found some pattern but you don't really know what it is.

For k-means, in particular, you have to tell the computer how many different kinds of things it should look for; that's what the k is. So if k = 2, it's looking for two kinds of things, if k = 3 it's looking for three kinds of things, and so on. Setting k higher than the number of groups you're actually looking for can make sense, because maybe it'll have an easier time if, for instance, it can treat big dogs and small dogs as different kinds of things.
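
For anyone who wants to see what "look for k kinds of things" means mechanically, here is a minimal sketch of the usual k-means loop (often called Lloyd's algorithm). It is not the implementation mentioned in this post; the function name `kmeans`, its parameters, and the toy 2D points are all made up for illustration, and picking random data points as starting centroids is just one common choice.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Illustrative k-means on a list of equal-length tuples (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (squared Euclidean distance; ties go to the lower cluster index).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # nothing changed, so the centroids are already up to date
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Toy data: two obvious groups of 2D points, and we ask for k = 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Note that the algorithm never learns what the groups mean; it only returns k centroids and a group number per point, and which group gets which number depends on the random start.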

I wrote an implementation of k-means for a class a while back. I gave it a bunch of handwritten digits and told it to look for 10 different digits (k = 10) and 30 different digits (k = 30), and here are the best results from each (this shows what it thinks each digit looks like, on average; bottom row is the final answer, everything above that is intermediate steps):
Image: k = 10 (trial-2.png)
Image: k = 30 (trial-9.png)
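
As a rough illustration of what "what it thinks each digit looks like, on average" means: a cluster's centroid can be drawn as the per-pixel mean of the images assigned to it. The sketch below uses two made-up 3x3 images rather than real handwritten digits, and `mean_image` is a hypothetical helper, not code from the class project.

```python
def mean_image(images):
    """Per-pixel average of equally sized grayscale images (lists of rows)."""
    n = len(images)
    rows = len(images[0])
    cols = len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]

# Two made-up 3x3 "handwritten ones" (1 = ink, 0 = blank).
ones = [
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 0, 1]],
]
print(mean_image(ones))
# → [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
```

Pixels where the two images disagree come out as 0.5, which is why averaged digits tend to look blurry where handwriting varies.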

In this comic, the person told the computer "there are three kinds of people, find what they are", and the computer found that one of those kinds of people is people who tell the computer that there are three kinds of whatever they're looking for. She's not sure what's different about the other two groups, though; maybe the computer found some pattern that we aren't able to see, or maybe it just arbitrarily divided that group.
~ chri d. d. /tʃɹɪ.di.di/ · she · Forum game scores · My website
ratammer wrote: Thu Sep 09, 2021 4:56 pm Hoping to find something quotable on this new forum!
ratammer
Gatekeeper
Posts: 736
Joined: Tue Aug 24, 2021 8:22 pm
Location: London

Re: 2731: "K-Means Clustering"

Post by ratammer »

Right, I see. Thanks!
somitomi
Posts: 274
Joined: Tue Aug 24, 2021 8:04 pm

Re: 2731: "K-Means Clustering"

Post by somitomi »

I wonder if you could make it recognise police dogs by using k=9
heuristically_alone
Posts: 167
Joined: Tue Aug 24, 2021 10:33 pm

Re: 2731: "K-Means Clustering"

Post by heuristically_alone »

somitomi wrote: Tue Jan 31, 2023 1:29 pm I wonder if you could make it recognise police dogs by using k=9
This comment should receive the chuckle it deserves. :lol:

Also thanks Chridd for being so informative. I learned something interesting today.
If you think tough men are dangerous, wait til you see what weak men are capable of.
----
he/him/they