K-nearest Neighbors

The k-nearest neighbors (KNN) algorithm uses the entire dataset as the training set, rather than splitting the dataset into a trainingset and testset.

When an outcome is required for a new data instance, the KNN algorithm goes through the entire dataset to find the k-nearest instances to the new instance, or the k number of instances most similar to the new record, and then outputs the mean of the outcomes (for a regression problem) or the mode (most frequent class) for a classification problem. The value of k is user-specified.

The similarity between instances is calculated using measures such as Euclidean distance and Hamming distance.