
My attempts at MapReduce using MongoDB

I was sorting a tree in my (Python) webapp instead of having the database do it for me. This is how I moved that work back to the database with a MapReduce job. I had a collection structured like this:

{
  "_id" : ...,
  "feed_oid" : "4fd268d2ab87b2d8927d7eee",
  "title" : "blah blah",
  "updated" : 1339702524,
  "watchers" : [ "4fd276fc66224c1ee8000006" ]
}

The collection is called articles and holds one document for every article published by an RSS feed. feed_oid is an identifier I assign to every RSS feed I'm crawling, and watchers contains the identifiers of the people who voted on that article.

I wanted to find out how many times a particular watcher appears across all the articles and then group that by RSS feed, so that I end up with the number of times a person voted on articles published by the same feed.
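To make the goal concrete, here is a minimal in-memory sketch of that counting in plain JavaScript. The sample documents, the short ids and the countVotesByFeed helper are made up for illustration; only the field names come from the collection above.

```javascript
// Hypothetical sample documents, shaped like the articles collection.
const sampleArticles = [
  { feed_oid: "feedA", title: "one", watchers: ["alice", "bob"] },
  { feed_oid: "feedA", title: "two", watchers: ["alice"] },
  { feed_oid: "feedB", title: "three", watchers: ["alice"] },
  { feed_oid: "feedB", title: "four", watchers: [] },
];

// Count, per feed, the articles a given watcher voted on.
function countVotesByFeed(articles, watcherId) {
  const counts = {};
  for (const article of articles) {
    if (article.watchers.includes(watcherId)) {
      counts[article.feed_oid] = (counts[article.feed_oid] || 0) + 1;
    }
  }
  return counts;
}

const votesByFeed = countVotesByFeed(sampleArticles, "alice");
console.log(votesByFeed);
```

This is roughly the kind of loop the webapp had to run itself; the point of the MapReduce job is to get the same per-feed counts computed inside the database.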

The following are my map and reduce functions:

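// map: runs once per matching document, with `this` bound to that document
m = function () {
  emit(this.feed_oid, 1);
};
// reduce: sums the values emitted for a single feed_oid
r = function (key, vals) {
  var sum = 0;
  for (var i = 0; i < vals.length; i++) {
    sum += vals[i];
  }
  return sum;
};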

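The map function is called once for every document matching the query filter, with 'this' bound to that document. The reduce function receives a key (a feed_oid) and an array of the values emitted for it. My map function only ever emits 1, but MongoDB may call reduce more than once for the same key and feed it earlier results, so the array can also contain partial sums; summing handles both cases. For a key with a single value, reduce is not called at all and the emitted value is stored as is.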
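The flow described above can be imitated outside the database. The following is only a toy stand-in in plain JavaScript, not how MongoDB implements it: runMapReduce, mapFn, sumReduce and the sample documents are hypothetical names for this sketch, and emit is passed to the map function as an argument here, whereas in the mongo shell it is a global.

```javascript
// Toy in-memory imitation of the map/reduce flow (an assumption-laden sketch).
function runMapReduce(docs, matches, map, reduce) {
  const groups = new Map();
  // emit() collects each value under its key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    // Only documents passing the filter are mapped; `this` is the document.
    if (matches(doc)) map.call(doc, emit);
  }
  const out = [];
  for (const [key, values] of groups) {
    // A key with a single value skips reduce entirely.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value: value });
  }
  return out;
}

// Same logic as the map and reduce functions above.
function mapFn(emit) {
  emit(this.feed_oid, 1);
}
function sumReduce(key, vals) {
  let sum = 0;
  for (let i = 0; i < vals.length; i++) sum += vals[i];
  return sum;
}

// Hypothetical sample data with short, made-up ids.
const docs = [
  { feed_oid: "feedA", watchers: ["w1"] },
  { feed_oid: "feedA", watchers: ["w1", "w2"] },
  { feed_oid: "feedB", watchers: ["w1"] },
  { feed_oid: "feedB", watchers: ["w2"] },
];

const results = runMapReduce(docs, (d) => d.watchers.includes("w1"), mapFn, sumReduce);
console.log(results);

// Feeding a partial sum back into reduce must give the same total.
const reReduced = sumReduce("feedA", [sumReduce("feedA", [1, 1]), 1]);
console.log(reReduced === sumReduce("feedA", [1, 1, 1]));
```

The last check is the reason a sum works well as a reduce function: reducing a partial result together with fresh values gives the same total as reducing everything in one go.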

Here is how I spawn the MapReduce job (querying by the watcher id):

res = db.articles.mapReduce(m, r, {
  query: {watchers: "4fd276fc66224c1ee8000006"},
  out: "mapreduceout"
});

The results:

db.mapreduceout.find()
{ "_id" : "4fd268d2ab87b2d8927d7eea", "value" : 1 }
{ "_id" : "4fd268d2ab87b2d8927d7eee", "value" : 4 }

Looks like this guy voted 4 times on articles that appeared in feed 4fd268d2ab87b2d8927d7eee and once on an article from feed 4fd268d2ab87b2d8927d7eea :P
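Since this all started with sorting, here is a small sketch of ordering those results by number of votes. It works on a plain array holding copies of the two output documents above; the variable names are just for illustration. In the shell, db.mapreduceout.find().sort({ value: -1 }) should give the same ordering straight from the output collection.

```javascript
// The two result documents from the output collection, copied into an array.
const mapReduceOut = [
  { _id: "4fd268d2ab87b2d8927d7eea", value: 1 },
  { _id: "4fd268d2ab87b2d8927d7eee", value: 4 },
];

// Sort a copy by vote count, highest first, leaving the original untouched.
const byVotes = [...mapReduceOut].sort((a, b) => b.value - a.value);

for (const doc of byVotes) {
  console.log(doc._id + ": " + doc.value);
}
```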

© Giulio Fidente.