Feb 23, 2013

Intro to Mathematics in Similarity Classifiers Part 2

Intro

     This article introduces the mathematics of three similarity classifiers: Jaccard, Extended Jaccard, and Cosine.  We will introduce the topics in layman's terms and provide examples that readers can experiment with while watching the results.  Although this introductory series will go into theory and formulas full of Greek letters, it first builds on simple explanations and real-world examples.  Hopefully, by reading this article you will gain an understanding of both the application and the math involved.

 

The Example

     For the first set of classifiers I'm stealing an example from the great book "Programming Collective Intelligence" by Toby Segaran.  But rather than focus on the code, this article will use the example to simplify and explain the math.

     The premise is that a store like Amazon.com wants to recommend items a customer might like.  To decide what to recommend, we can take two different approaches.  We could find similar movie buyers and recommend movies that those similar buyers like.  This is akin to getting movie recommendations from a friend whose tastes are similar to yours.  Alternatively, we could try to classify how similar movies are to each other and recommend movies similar to the ones you already like.  For example, one could argue Batman is similar to Superman because both are superhero movies, so we could recommend Batman to buyers who purchase Superman.

     For brevity, this article covers only the similarity of movie viewers rather than the similarity of items, but the algorithms are the same.  Click the link below to play with a simple app that calculates the similarity between movie viewers using the mathematics we will discuss in detail.

button_0
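
As a rough illustration of the three measures named above, here is a minimal Python sketch.  The viewers, titles, and ratings are made up for this example, and treating an unrated movie as a rating of 0 is an assumption of the sketch, not something taken from the app or the book.

```python
from math import sqrt

# Hypothetical ratings (1-5) from two viewers; a missing title means "not rated".
alice = {"Batman": 5.0, "Superman": 4.0, "Cinderella": 1.0}
bob = {"Batman": 4.0, "Superman": 5.0, "Speed 2": 2.0}


def jaccard(a, b):
    """Jaccard similarity of the sets of rated titles: |A n B| / |A u B|."""
    titles_a, titles_b = set(a), set(b)
    union = titles_a | titles_b
    if not union:
        return 0.0
    return len(titles_a & titles_b) / len(union)


def dot(a, b):
    """Dot product of two rating vectors; unrated titles count as 0 (assumption)."""
    return sum(a[title] * b[title] for title in a.keys() & b.keys())


def cosine(a, b):
    """Cosine similarity: a.b / (|a| * |b|)."""
    norm_a, norm_b = sqrt(dot(a, a)), sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def extended_jaccard(a, b):
    """Extended Jaccard (Tanimoto): a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = dot(a, b)
    denominator = dot(a, a) + dot(b, b) - ab
    return ab / denominator if denominator else 0.0


print(jaccard(alice, bob))           # 2 shared titles out of 4 rated overall
print(cosine(alice, bob))
print(extended_jaccard(alice, bob))
```

Jaccard looks only at which movies each viewer rated, while Cosine and Extended Jaccard also take the rating values into account.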

Jan 21, 2013

Introduction to the Math Behind Similarity Classifiers – Part 1

Abstract
     This article provides an introduction to the world of similarity classifiers in a way that is simple and fun.  We will introduce the topics in layman's terms and provide examples that readers can experiment with while watching the results.  Although this introductory series will go into theory and formulas full of Greek letters, it first builds on simple explanations and real-world examples.  Hopefully, by reading this article you will gain an understanding of both the application and the math involved.

 

Introduction to Similarity Classifiers

     Similarity-based classifiers estimate the similarity between *things*.  Once we know things are similar, we can begin to group them and make decisions based on that grouping.  Commercial businesses can make more money knowing that people who buy one type of item are very often interested in "similar" items.  Doctors who can successfully treat a specific pathology with specific medications can often successfully treat "similar" patients with those same medications.  And the list goes on and on.
  • Pandora: "Is this song similar to 'Linkin Park' such that we ought to group it with the 'Linkin Park' radio station?"
  • Google News:  "Is this unknown web article similar to these other top news articles such that we ought to group them together in our news aggregator?"
  • Amazon:  "Is the movie 'Cinderella' similar enough to the movie 'Speed 2' that we should recommend it as a purchase to 'Speed 2' buyers?"
  • Image Organizer: "Is this image similar enough to these other images in a collection that we think it shows the same person?"

Jul 11, 2012

ArrayCollection ‘source’ attribute vs ListCollectionView ‘list’ attribute


     I recently stumbled upon an interesting problem that highlighted the difference between the ArrayCollection ‘source’ attribute and the ListCollectionView ‘list’ attribute. I was working through some code that did not pay close attention to how it re-wrapped collection data. In some cases a collection was created to wrap the ‘source’ of another collection, as in Figure 1. In other cases a new collection was created to wrap the ‘list’ of the original collection, as in Figure 2. The code wrapped the data in new ArrayCollections in order to apply different sorting and filtering, which is an extremely common use for collection views, but it produced some surprising runtime results: not all of the collection views were updating when the collection changed.

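import mx.collections.ArrayCollection;
var ac1:ArrayCollection = new ArrayCollection();
// ac2 wraps the same underlying Array as ac1, but gets its own internal list
var ac2:ArrayCollection = new ArrayCollection(ac1.source);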
Figure 1 – Clone ArrayCollection using the ‘source’ attribute

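import mx.collections.ArrayCollection;
var ac1:ArrayCollection = new ArrayCollection();
var ac2:ArrayCollection = new ArrayCollection();
// ac2 now shares ac1's list, and therefore its source as well
ac2.list = ac1.list;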
Figure 2 – Clone ArrayCollection using the ‘list’ attribute

How ‘source’ and ‘list’ relate and why it matters

     Looking at the Flex SDK source code, the setter for ArrayCollection.source creates a new ArrayList around the original source and uses it as the ‘list’ attribute of the new collection view. This means that, depending on how your new collection view wraps the original data, you may end up with the same source but different lists (see Figure 3 below). Because a collection view listens for change events on its own list, views that share only the source are not notified of changes made through one another.

Blog - ArrayCollection  list vs source

Figure 3 – Possible relationships in related ArrayCollections
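
To make the relationship concrete, here is a toy model written in Python rather than ActionScript.  It is not the Flex SDK code: the class names only mirror the Flex ones, and the cached `view` is a simplified stand-in for a sorted or filtered collection view that refreshes when its list announces a change.

```python
class ArrayList:
    """Toy list wrapper: holds the raw array and notifies listeners on change."""

    def __init__(self, source):
        self.source = source      # the raw array, shared by reference
        self.listeners = []

    def add_item(self, item):
        self.source.append(item)
        for notify in self.listeners:
            notify()


class ArrayCollection:
    """Toy collection view: caches a snapshot that is rebuilt on list events."""

    def __init__(self, source=None):
        self._list = None
        self.view = []
        # As in the setter described above, a given source gets a NEW list wrapper.
        self.list = ArrayList(source if source is not None else [])

    @property
    def source(self):
        return self._list.source

    @property
    def list(self):
        return self._list

    @list.setter
    def list(self, new_list):
        if self._list is not None:
            self._list.listeners.remove(self.refresh)
        self._list = new_list
        new_list.listeners.append(self.refresh)
        self.refresh()

    def refresh(self):
        self.view = list(self._list.source)

    def add_item(self, item):
        self._list.add_item(item)


ac1 = ArrayCollection()
by_source = ArrayCollection(ac1.source)   # Figure 1 style: same source, new list
by_list = ArrayCollection()               # Figure 2 style: shares ac1's list
by_list.list = ac1.list

ac1.add_item("first")
print(by_source.source is ac1.source)  # True: the raw array is shared
print(by_source.view)                  # []: no change event reached this view
print(by_list.view)                    # ['first']: notified through the shared list
```

In this model the item is present in the shared array either way; what differs is which views are told about it.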

Jul 10, 2012

Why doesn't TweetMeme work anymore?

Switching to Twitter's own retweet button because TweetMeme doesn't work with Blogspot anymore.

Feb 1, 2012

Unit Tests as a Measure of Code Cleanliness

 

    I recently attended the 2011 Adobe MAX conference and sat in on the Unit Testing Adobe Flex session by Michael Labriola of Digital Primates.  I expected the session to be a repeat of what I already know and strongly believe.  Unit tests simplify testing, document your code, improve maintainability, blah blah blah.  The aspect of the lecture that I found most fascinating was on writing clean code and using unit tests as a means to verify code cleanliness. 

    My training regarding unit tests has been pretty informal.  Through experience I’ve found that when I write automated tests my bug count goes way down, and when I do have a bug I have a great starting point to isolate and fix the issue.  I’ve always understood the difference between unit, integration, and functional testing but never much cared, because all three have the benefits mentioned above.  The simple discovery, however, that being able to unit test your code in the strictest sense is a very good measure of its cleanliness was pretty eye-opening.  If your code base is only testable at the integration level, your code is probably too tightly coupled and does not have good encapsulation.
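
    As a small illustration of that last point, here is a Python sketch (not taken from the session, which covered Flex).  All class names are hypothetical.  The first portfolio class builds its own dependency and so can only be exercised against a live server, while the second receives its dependency and can be unit tested with a stub.

```python
import unittest


class PriceService:
    """Hypothetical slow dependency; imagine a call to a remote server."""

    def price_of(self, symbol):
        raise RuntimeError("needs a live server")


class CoupledPortfolio:
    """Creates its own PriceService, so it is only testable at integration level."""

    def __init__(self):
        self.prices = PriceService()

    def value(self, holdings):
        return sum(qty * self.prices.price_of(sym) for sym, qty in holdings.items())


class Portfolio:
    """Receives its price source, so a unit test can hand it a stub."""

    def __init__(self, prices):
        self.prices = prices

    def value(self, holdings):
        return sum(qty * self.prices.price_of(sym) for sym, qty in holdings.items())


class StubPrices:
    """Test double that returns fixed prices from a lookup table."""

    def __init__(self, table):
        self.table = table

    def price_of(self, symbol):
        return self.table[symbol]


class PortfolioTest(unittest.TestCase):
    def test_value_uses_injected_prices(self):
        portfolio = Portfolio(StubPrices({"AAA": 10.0, "BBB": 2.5}))
        self.assertEqual(portfolio.value({"AAA": 3, "BBB": 4}), 40.0)
```

    The two `value` methods are identical; the only difference is where the dependency comes from, and that alone decides whether a strict unit test is possible.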

Sep 13, 2011

A Simple and Effective Agile Process

 

Introduction

    Whenever I hear other software leads complain about their waterfall-ish development process, I always ask why they haven’t tried something more agile.  The most common answer I’ve heard back is that they don’t know where to begin.  Entire books are written on the subject of agile development, and it’s a pretty big sell to tell project management that you want to drop everything and try something that seemingly requires chapters to describe.  That type of change tends to make project management cringe.

    The truth is books are written about agile development because people want to make money selling books about agile development.  Agile development is phenomenally simple, and you make modifications as you go until it’s perfect for your team.  I work in defense, where you aren’t supposed to write a line of code until you’ve eliminated the woodland homes of dozens of species through deforestation for all the paperwork you are supposed to generate.  Over the course of several releases, my teams have arrived at a very effective agile process that is tailored to fit our “defense” environment.  This article describes what we do, and how we migrated to it.

Sep 5, 2011

The Tide Client Framework Part II: Tide Subcontexts

 

Introduction

    In the last tutorial I covered contexts in the Tide Client Framework (TCF) provided with GraniteDS.  In this tutorial I want to expand on the Tide Client Framework and introduce the little-known feature “Subcontexts”.  Subcontexts escalate the TCF to a full-blown micro-architecture with capabilities comparable to Parsley or Swiz, with a fraction of the setup.  While the last tutorial covered all the great things about the Tide Context, this tutorial will begin by discussing some of its shortfalls, and how subcontexts fill in those missing pieces.

 

 

Shortfalls of Tide with a Single Context

    Data Injection and Tide Events are fantastic capabilities that can greatly simplify your application design and code.  However, as your application becomes more complex, you will start to hit a wall because all components belong to a single group.

Injection:  Too much shared data requires too much application knowledge

    Using data injection with a single context starts to fall over for a number of reasons.  As your application continues to grow and components inject more and more data into the context, you run into all the problems that make global data a bad practice.

    For example, suppose in my 100KLOC stock portfolio application I want to write a control that will perform analysis based on the user’s total net worth, which has been added to the context.  Which shared object should I inject into my control?  “net”, “assetsValue”, “worth”, “totalVal”, “netV”?  Without delving into all the components to see what each shared object is and how it is changed, it is not possible to know which one to use.  You would also be opening up your project to lots of hard-to-debug side effects caused by developers accidentally misusing the shared global data.  Those types of problems will surely bring back bad memories if you are old enough to have done much development in C.

image

Figure 1 - Example of many Tide components that register and share lots of data

Aug 7, 2011

GraniteDS Tutorial: Intro to The Tide Client Framework

 

Introduction to Tide Client Framework

    The Tide Client Framework is a very simple and lightweight framework that can greatly simplify developing your Flex/GraniteDS application. Tide allows you to declaratively or programmatically auto-wire your software components (UI widgets, controllers, etc) together, making data sharing and event routing a much simpler task.

    While Tide has many features and capabilities, including tight integration with server side capabilities, this tutorial will briefly introduce the Tide Context, and how it enables data injection and simplified event routing.  

 

The Tide Context

    The Tide Context is the manager for everything that is to be handled and auto-wired by the framework, and there is only one per Application (or one per Seam conversation).  Once a component is registered with the Tide Context singleton, it can take part in injection and Tide events.  This means Tide will automatically inject shared data and correctly route events, without requiring the application developer to write a custom mechanism for sharing information or to explicitly register event listeners.

Tide Context - Intro

Figure 1 – Example of several components and views registered with the Tide Context

Jun 13, 2011

Getting Started with Rational Team Concert: A Quick and Simple Tutorial to get you up and running with RTC

 

    Rational Team Concert (RTC) is a very good product lifecycle management tool, especially if your team follows an agile development process.  (NOTE:  Agile does not mean having no process at all.  When done correctly, agile processes actually have more processes and quality gates than traditional waterfall.  But I’ll go on that tangent some other time.)  This tutorial aims to be a crash-course guide to get people up and running with RTC.  Since RTC is free for teams of fewer than 10 people, it is worth downloading and trying, if for no other reason than to play with agile planning.

 

Definitions

    Before I delve further, I want to define some words that will show up throughout this tutorial.

Backlog: A collection of unfinished work.
Iteration: A timeline that is a subdivision of the project timeline.  E.g. a 6-month schedule could be broken down into six 1-month iterations.
Plan: A view of tasks assigned to a timeline.  Could be filtered further to show only the tasks assigned to a particular team or category.
Product Backlog: A collection of all the known tasks to complete the product.
Sprint: See Iteration.
Sprint Backlog Plan: A view of all the tasks assigned to a particular iteration or sprint.
Team: A grouping of people assigned to work related tasks.
Timeline: A period of time in which work will be planned.
Timeline (project): The period of time that runs from the time the project starts until the project ends.
Timeline (current): The timeline or iteration that is currently being executed.
Work Item: A unit of work that is a sub-task of the project.

 

Jun 2, 2011

Drawing on a Google Map in Adobe Flex

    Not too long ago I had cause to add drawing capabilities to a Google map in my Adobe Flex project.  This seems like it would be a pretty common use of the Google map, but I never saw any simple tutorials that covered it.  So in an effort to share the knowledge, I put together this simple tutorial and example explaining how to allow users to draw on your Google map.

The Example

  • Click the “Rectangle” button to be able to draw a rectangle via click-dragging with the mouse.
  • Click the “Normal” button to return the Google map to normal mode, i.e. click-dragging pans the map.

    Source can be downloaded at Source