MongoDB is part of the NoSQL generation of databases. The best use case commonly described for it is:

    If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

In short: for most things you would do with MySQL or PostgreSQL, where having predefined columns really holds you back.

I agree with most of what is written above, with one exception: MongoDB and map/reduce.

The use case above was described sometime in Q1-Q2 of this year (2013). At the time, MongoDB had a number of limitations:

  • slow map/reduce
  • a newly introduced JavaScript engine (which should provide at least some parallelism in terms of jobs/operations on the same collection)
  • only a slight improvement in performance

But that was just the first major release with the new JavaScript engine. I tested a map/reduce job on a collection that was growing by around 1 million documents a day. On version 2.2, the version described in the article, the job took about one hour with around 30 million existing records.
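For readers unfamiliar with the model, here is a minimal sketch of what a map/reduce job does, emulated in plain JavaScript. The document fields (`userId`, `clicks`) are hypothetical; a real job would run server-side via `db.collection.mapReduce(map, reduce, { out: ... })`, with MongoDB's JavaScript engine executing the `map` and `reduce` functions over the collection.

```javascript
// Emulation of MongoDB-style map/reduce over an in-memory "collection".
// Field names (userId, clicks) are made up for illustration only.
const docs = [
  { userId: "a", clicks: 3 },
  { userId: "b", clicks: 1 },
  { userId: "a", clicks: 4 },
];

// map: emit one key/value pair per document
function map(doc, emit) {
  emit(doc.userId, doc.clicks);
}

// reduce: fold all values emitted for the same key into one result
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

function mapReduce(collection) {
  // group emitted values by key
  const groups = new Map();
  for (const doc of collection) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // reduce each group to a single value
  const out = {};
  for (const [key, values] of groups) {
    out[key] = reduce(key, values);
  }
  return out;
}

console.log(mapReduce(docs)); // { a: 7, b: 1 }
```

On the server this whole pass is repeated over every document, which is why the cost grows with collection size: at tens of millions of records, even a simple aggregation like this takes a long time.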

A simple migration to MongoDB 2.4.5 earlier this year drastically reduced the running time to around 15 minutes at most. Still, this wasn't enough. One of our requirements was to provide analytics at a fast pace, almost in "real time" (for map/reduce, real time is relative; in our case it can mean anywhere from 5 to 15 minutes).

With the volume of data increasing rapidly, I had to see if there was a better way of achieving this. After two weeks of searching the web like crazy, I finally worked out a solution. I will talk more about it in two future blog posts describing the two mechanisms I had to build for it.

So stay tuned for more info this week 😉