Intro
This article introduces the mathematics of similarity classifiers, specifically Jaccard, Extended Jaccard, and Cosine. We will try to explain the topics in layman's terms and provide examples that readers can experiment with to see how the results change. While this introductory series of articles will go into theory and formulas full of Greek letters, each article will first build on simple explanations and real-world examples. Hopefully, by reading this article you will gain an understanding of both the application and the math involved.
The Example
For the first set of classifiers I'm stealing an example from the great book "Programming Collective Intelligence" by Toby Segaran. But rather than focus on the code, this article will use the example to simplify and explain the math.
The premise is that a store like Amazon.com wants to recommend items a customer might like. To decide what to recommend, we can take two different approaches. We could find buyers with similar movie tastes and recommend the movies those buyers like. This is akin to getting movie recommendations from a friend whose tastes are similar to yours. Alternatively, we could measure how similar movies are to each other and recommend movies similar to the ones you already like. For example, one could argue that Batman is similar to Superman because both are superhero movies, so we could recommend Batman to buyers who purchase Superman.
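To make the two approaches concrete, here is a minimal Python sketch. The viewers, titles, and helper names (`purchases`, `jaccard`, `invert`) are made up for illustration and are not taken from the book. It uses the simplest of the measures named in the intro, Jaccard similarity, computed as the size of the overlap between two sets divided by the size of their union. Comparing movies instead of viewers only requires flipping the data around before calling the same function.

```python
# Hypothetical purchase data: viewer -> set of movies they bought.
purchases = {
    "Alice": {"Batman", "Superman", "Spider-Man"},
    "Bob": {"Batman", "Superman"},
    "Carol": {"Titanic", "Superman"},
}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    # Two empty sets have no basis for comparison; treat them as dissimilar.
    return len(a & b) / len(union) if union else 0.0


def invert(data):
    """Flip viewer -> movies into movie -> viewers."""
    flipped = {}
    for viewer, movies in data.items():
        for movie in movies:
            flipped.setdefault(movie, set()).add(viewer)
    return flipped


# Approach 1: compare viewers by the movies they bought.
alice_bob = jaccard(purchases["Alice"], purchases["Bob"])      # 2 shared of 3 total
alice_carol = jaccard(purchases["Alice"], purchases["Carol"])  # 1 shared of 4 total

# Approach 2: compare movies by the viewers who bought them.
buyers = invert(purchases)
batman_superman = jaccard(buyers["Batman"], buyers["Superman"])  # 2 shared of 3 total
batman_titanic = jaccard(buyers["Batman"], buyers["Titanic"])    # no shared buyers

print(f"Alice vs Bob:       {alice_bob:.2f}")
print(f"Alice vs Carol:     {alice_carol:.2f}")
print(f"Batman vs Superman: {batman_superman:.2f}")
print(f"Batman vs Titanic:  {batman_titanic:.2f}")
```

In this toy data Alice is closer to Bob than to Carol, so under the first approach Bob's purchases would be the better source of recommendations for Alice. Under the second approach, Batman and Superman look similar because mostly the same people bought both.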
For brevity, this article covers only the similarity of movie viewers rather than the similarity of items, but the algorithms are the same. Click the link below to play with a simple app that calculates the similarity between movie viewers using the mathematics we will discuss in detail.