About K-Means Shortest Path S-I-R Model Contact

Curious about what the web app above is doing?

K-means is an unsupervised machine learning method for clustering that separates observations, or datapoints, in k clusters, where each observation will belong to the nearest mean, or centroid. In the web app above, you can edit the value for k to examine the impact on the resulting clusters.

This web app creates random data by drawing random numbers to find a point on the canvas, and then about 100 more random points that are randomly distributed around that point on both the X and Y axis. It repeats this process about ten times and then displays it. Interestingly, JavaScript, the language the web app is written in, does not have a built in function for random normally distributed values, also called gaussian random numbers. To get around this, the web app makes use of a method called the Box-Muller Transformation, which can convert uniformly distributed values to normally distributed values.

The k-Means algorithm implemented here uses the Forgy method of initializing. That means it selects k observations, or data points, and uses their positions as initial centroids, or cluster centers. The algorithm itself can be described plainly as:

  Initialize k Centroids
  Initialize k "Prior Centroids" that are different values
  While the Centroids do not equal the Prior Centroids:
    Set Prior Centroids equal to the current Centroids
    Assign each data point, or observation to a Centroid
    Set each Centroid equal to the mean of the observations assigned to it

The algorithm will stop running when the Centroids become stable. Because this may occur in different places depending on the starting location it is common to run the algorithm several times and use the centroids that emerge as the most frequent cluster centers.

You may have noticed that the algorithm described differs slightly from the web app. To make it easier to follow each step, the web app separates the new centroid calculation and the reassignment of observations to centroids. The web app also has a 10-iteration button as k-means can at times take many iterations to complete.

Additional Resources: