Maximum Bipartite Matching

Note: This post is one of a series that I'm rewriting since the great blog wipe of 2014.

Recently, I've begun work on a new app called Schedulize. Essentially, the idea is to automatically assign hourly employees to shifts based on their availability. It's going to use mongodb, jade, express, and, most notably, a variation on the Ford-Fulkerson algorithm.

To illustrate my plan for the program, I've devised a relatively simple example. Rather than going through the whole algorithm here, I'll follow this up with a more extensive blog post detailing the solution down the road.

In this example, we are planning a talent show for a group of students. Each student can only perform a specific set of talents, and no two students can perform the same talent. Our goal is to find the optimal pairing of students to talents in order to have the most acts in our talent show.

talents

As you can see, we need to find a way to assign each person to a talent. To facilitate this process, we'll need to add two nodes to our graph, one called a 'source' and another called a 'sink'. In the context of maximum flow, we will pass units of 'water' from the source to the sink, finding the best path through the entire graph. To clarify, see the following picture.

source to sink

In addition to adding the source and the sink, we also add a capacity to each edge(the number to the right of the '/'). This capacity refers to how much flow we can pass through each edge. In our case, each edge has a capacity of one because each person can only be assigned to one talent, and no talent can be performed twice.

With this setup, we can apply the Ford-Fulkerson algorithm to find a maximum path through the graph, thus assigning people to talents. Our final output will look like this, and have a maximum flow of 5:

output

As you can see, Rishi will dance, Peter will do math, Moxi will snowboard, Jose will paint, and Adam will play the piano. The beauty of Ford-Fulkerson is that it will make the maxiumum number of assignments each time.

Circling back to my scheduling app, my graph will look something like this:

schedulize

The Schedulize graph is naturally much more complex than the talent show example, but it follows the same principles. One notable difference, however, is in the capacity of each edge from the sink to a person.

In the case of Schedulize, each edge capacity refers to the number of shifts an employee would ideally work. This will allow the algorithm to then assign them to multiple shifts by passing one unit of flow along multiple paths from users to shifts. Each shift can only be worked by one person, so the capacity of the edges from the shifts to the sink is one.

While the Schedulize example is more complex, it follows the same principles, and demonstrates the application of Ford-Fulkerson to Bipartite Matching.

I plan to write a more comprehensive post here on the actual algorithm I use, so check back soon for a more comprehensive deep dive into Bipartite Matching!