Clustering with the SOM/Kohonen Map
Clustering algorithms group together objects or conditions with similar characteristics. Unlike classification, where the categories are known in advance, the groupings produced by clustering are typically more abstract and not easily defined. Examples of where clustering has been used include identifying shopping patterns among website visitors and grouping web pages or e-mail messages by type of content.
Clustering in NeuroSolutions
The NeuroSolutions Neural Expert uses the two-dimensional SOM (a.k.a. Kohonen map) for clustering. Clustering with the SOM requires some work from the user, but it is also much more powerful than many other clustering methods. The SOM has a few unique properties that make it very effective for clustering, including:
1) density matching: the number of SOM processing elements (PEs) placed in an area of input space roughly follows the density of inputs in that area, and
2) neighborhood relationships: the SOM PEs have an intrinsic neighborhood relationship, so inputs mapped to PEs that are close on the map (e.g. PE (1,1) and PE (1,2)) are also close in input space (i.e. they are similar inputs).
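Both properties come from the way a SOM is trained: for each input, the closest PE (the winner) and its neighbors on the grid are pulled toward that input. The following is a minimal sketch of that generic update rule in plain Python, not the NeuroSolutions implementation. The function names, the linear decay schedules, and the Gaussian neighborhood are assumptions chosen for illustration, and PEs are indexed from 0 here rather than from 1 as in the text.

```python
import math
import random

def find_winner(weights, x):
    """Return the 0-based (row, col) of the PE whose weights are closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def train_som(data, n=4, epochs=50, lr0=0.5, seed=0):
    """Train an n x n SOM on a list of equal-length input vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1); assumes inputs are scaled similarly.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(n)]
               for _ in range(n)]
    sigma0 = n / 2.0  # initial neighborhood radius (assumed schedule)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floor 0.5
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            wr, wc = find_winner(weights, x)
            for r in range(n):
                for c in range(n):
                    # Gaussian neighborhood measured on the grid, not in input space
                    grid_d2 = (r - wr) ** 2 + (c - wc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
    return weights

# Two well-separated groups of 2-D inputs (made-up data)
data = ([[0.1 + 0.01 * i, 0.1] for i in range(10)]
        + [[0.9 - 0.01 * i, 0.9] for i in range(10)])
weights = train_som(data, n=4, epochs=40)
print(find_winner(weights, [0.1, 0.1]), find_winner(weights, [0.9, 0.9]))
```

Because neighbors of the winner move together, nearby PEs end up with similar weights, and areas of input space that contain more data attract more PEs.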
The Basics of Clustering
The first important concept in SOM clustering is that a single PE does not normally define a cluster, and that the clustering is not predefined by the number of PEs. Typically, a SOM of size N x N is created (where N depends on the number of data points and the "resolution" of your desired mapping), and each logical cluster of input data is located in a REGION (subset) of PEs in that N x N map. For instance, a cluster can be in the top left region of the map (say PEs (1,1), (1,2), (1,3), (2,1) and (2,2)). Since clustering is unsupervised, there is no predefined number of clusters in the dataset, and the clustering is left to the interpretation of the user. In one application it may be beneficial to split the top left region of the map into three smaller clusters, whereas in another it may be better to treat it as a single cluster.
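One way to record such an interpretation is a simple lookup from PE coordinates to a cluster name. The sketch below uses the top left region from the example above with 1-based (row, column) coordinates. The cluster names, the helper label_regions, and the particular three-way split are hypothetical; they only illustrate that the same region can be treated as one cluster or as several.

```python
# PEs forming the top left region, as 1-based (row, col) pairs
TOP_LEFT = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)}

def label_regions(regions):
    """Build a PE -> cluster-name lookup from {name: set of PEs}.

    Raises ValueError if a PE is assigned to more than one cluster.
    """
    lookup = {}
    for name, pes in regions.items():
        for pe in pes:
            if pe in lookup:
                raise ValueError(f"PE {pe} is in both {lookup[pe]!r} and {name!r}")
            lookup[pe] = name
    return lookup

# Interpretation 1: the whole region is a single cluster.
coarse = label_regions({"top_left": TOP_LEFT})

# Interpretation 2: the same PEs split into three smaller clusters
# (an arbitrary split, chosen only for illustration).
fine = label_regions({
    "top_left_a": {(1, 1), (1, 2)},
    "top_left_b": {(1, 3)},
    "top_left_c": {(2, 1), (2, 2)},
})

print(coarse[(1, 3)], fine[(1, 3)])
```

Neither lookup changes the trained map; only the interpretation of its regions differs.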
For information on the visual tools available in NeuroSolutions for splitting these clusters, please see part 2 of this article at
http://www.nd.com/newsletter/somarticle.html
Using the Clustering Information
Once you have completed your analysis of the map, new inputs can be mapped by simply running the new data through the map as a "testing" or "production" set and finding the winning processing element for each input. The location of the winning PE determines which natural cluster the input belongs to, and the user can then use this information in any manner they choose. For example, if a new website visitor is clustered with visitors who have purchased certain products, then the new visitor could be shown advertisements for those types of products.
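Outside of NeuroSolutions, the same lookup can be sketched in a few lines of Python. Everything here is assumed for illustration: the 2 x 2 map with hand-set weights, the two made-up visitor features, the cluster names, and the helper functions winning_pe and assign_cluster. A real map would come from training, and its clusters from your own analysis.

```python
def winning_pe(weights, x):
    """Return the 1-based (row, col) of the PE whose weights are closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights, start=1):
        for c, w in enumerate(row, start=1):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

def assign_cluster(weights, pe_to_cluster, x):
    """Map an input to its winning PE and the cluster that PE was assigned to."""
    pe = winning_pe(weights, x)
    # PEs that were never assigned to a region are reported as "unassigned".
    return pe, pe_to_cluster.get(pe, "unassigned")

# Hand-set weights standing in for a trained 2 x 2 map (hypothetical values).
weights = [
    [[0.1, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.9, 0.9]],
]

# Result of the user's own analysis of the map (hypothetical names).
pe_to_cluster = {
    (1, 1): "browsers", (1, 2): "browsers",
    (2, 1): "buyers", (2, 2): "buyers",
}

new_visitor = [0.8, 0.2]  # made-up feature vector for a new visitor
pe, cluster = assign_cluster(weights, pe_to_cluster, new_visitor)
print(pe, cluster)  # prints: (2, 1) buyers
```

In the advertising example above, a visitor landing in the "buyers" region would be the one shown advertisements for the products that region's visitors purchased.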