
Experiment: Optimize Node.js + MongoDB in Every Way Possible

This post is technical in nature and aimed at programmers. Casual readers might still get something out of it, but I’m not making any attempt to remove the jargon.

Node.js and MongoDB are great; they’re super-scalable and a joy to work with. However, there were plenty of growing pains as I implemented the Streamified API with these technologies. It took me a LOT of experimentation, and in the process I ended up exploring more optimizations and speed-ups than I ever thought I would need.

Still, I have a StackOverflow question open about the last of my issues, which I suspect might actually be a bug in node 0.8.x. The purpose of this post, then, is to share what I’ve learned, and to ask for help fixing the remaining imperfections.

1. Check your Sockets

This is a rather arbitrary place to start, but it was one of the first bottlenecks I ran into, so it seems as good as any. If your node.js app establishes outbound connections to other services (e.g., Streamified connects to Facebook, Twitter, etc.), you’ll need to make sure your servers are configured to handle them.

There is no “one-size-fits-all” solution here. You’ll need to consider the needs of your server. Here are a few good starting points:

  • Use the request module where possible; it’s more robust than hand-rolling requests with node’s core http client.
  • Consider this post from LinkedIn, which suggests turning off socket pooling (note: take this with a grain of salt; when I contacted the author of the request module, he made it very clear that he considered doing so a horrible idea).
  • Consider keep-alive vs. close connections. What is best for your scenario?
  • Consider raising the maximum number of sockets node.js will open per host (the default agent in node 0.8 allows only 5), e.g.:
    require('http').globalAgent.maxSockets = 9999;
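Rather than raising the limit on the shared globalAgent for every host at once, one option is to give each upstream service its own http.Agent, so a slow provider can only tie up its own sockets. This is a hedged sketch, not what Streamified runs: it assumes a modern Node (the keepAlive agent option and the http.get(url, options, callback) form both postdate 0.8), and createUpstreamAgent, fetchText and the numbers in them are hypothetical placeholders to tune for your own workload.

```javascript
const http = require('http');

// Hypothetical helper: one agent per upstream service instead of
// sharing the process-wide globalAgent pool.
function createUpstreamAgent(options = {}) {
  return new http.Agent({
    keepAlive: options.keepAlive !== false, // reuse sockets unless told otherwise
    maxSockets: options.maxSockets || 50,   // placeholder cap on concurrent sockets per host
  });
}

// Hypothetical helper: GET a URL through `agent` and resolve with the body.
// The timeout keeps a hung upstream from holding a socket indefinitely.
function fetchText(url, agent, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, { agent }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body }));
      res.on('error', reject);
    });
    req.setTimeout(timeoutMs, () => req.destroy(new Error(`timeout after ${timeoutMs} ms`)));
    req.on('error', reject);
  });
}
```

For example, `const socialAgent = createUpstreamAgent({ maxSockets: 20 });` followed by `fetchText('http://example.com/', socialAgent)`. HTTPS upstreams would need an https.Agent instead, and whether keep-alive actually helps depends on whether the upstream honours it, which is the keep-alive vs. close question above.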

 

2. Monitor your Performance

Our servers are on the Amazon cloud and we use RightScale, so I have some built-in monitoring tools:

Our Mongo Server Health (RightScale Dashboard)

 


But I needed something more node- and mongo-specific, so I ended up going with nodetime. If you’re not using nodetime, start using it now, at least in your testing environment. I won’t go into all the features here (readable heap snapshots… mmm), but I will say that their support is excellent. Not only did they help me with some installation issues, they also helped me read the heap snapshots, and even jumped onto a StackOverflow question to help me debug my code. Whoa.
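nodetime is a hosted service, so I won’t guess at its API here. As a bare-bones complement that needs nothing beyond Node’s built-in process.memoryUsage(), here is a hypothetical logger (toMB, memorySnapshot and startMemoryLogger are illustrative names, and the one-minute interval is a placeholder). A heapUsed figure that climbs steadily between readings is usually the first hint that a heap snapshot is worth taking.

```javascript
// Convert bytes to megabytes, rounded to one decimal place.
function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Take one reading of this process's memory footprint.
function memorySnapshot() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return { rssMB: toMB(rss), heapTotalMB: toMB(heapTotal), heapUsedMB: toMB(heapUsed) };
}

// Log a reading every `intervalMs`. unref() means this timer alone
// never keeps the process alive.
function startMemoryLogger(intervalMs = 60000, log = console.log) {
  const timer = setInterval(() => log('memory', memorySnapshot()), intervalMs);
  timer.unref();
  return timer;
}
```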

 

3. NoSQL Doesn’t Mean No Worries

Like many programmers, when I first started coding with NoSQL (MongoDB) I was ecstatic. True to its claims, it handled so many of the headaches I had known when using technologies like MySQL. Gone were the days when I needed to design sophisticated schemas! Hooray!

Not so fast.

Let me draw a comparison, if I may:

Using a NoSQL option is like using a language with automatic garbage collection. True, the compiler/interpreter will take care of most of the headache for you, but if you understand what is going on, you will be much better off.

This is exactly why I recommend that new programmers learn a “harder” or “older” language like C before moving on to Java. I know plenty of programmers who started on Java and are quite good, but they still think of the whole “memory management” thing as a bit of voodoo going on behind the scenes. These are not dumb people; they’re great programmers who have never been forced to think about memory management. Contrast that with someone who started coding in assembly language and remembers the days of using repeated addition instead of multiplication to save processor clock cycles (what can I say, I’m old-school). Both might be equally good at coding, except when a memory leak appears or the app gets slow: who would you rather have debugging the code?

But I digress. The point is that it’s important to understand the MongoDB architecture. Here are a few quick axioms I put into place:

 

  • Never query without an index; compound indexes are your friend
  • Given the choice, more collections with less data each are preferable when you need to query those collections frequently (as opposed to fewer collections with more data in each)
  • Make good use of schema validation, Mongoose middleware, etc.
  • Monitor your slow queries; there’s no “hard cutoff” point, but anything over 100 ms (MongoDB’s default slowms threshold for the profiler) is probably bad
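To act on that last point, MongoDB’s profiler can record slow operations into the system.profile collection (for example, db.setProfilingLevel(1, 100) in the mongo shell logs operations slower than 100 ms). The sketch below is a hypothetical helper that assumes you have already fetched those documents as plain objects; it relies only on their op, ns and millis fields, and the threshold is a placeholder.

```javascript
// Placeholder threshold; 100 ms matches MongoDB's default slowms.
const SLOW_QUERY_MS = 100;

// Return profiler-style entries slower than the threshold, slowest first.
function findSlowQueries(entries, thresholdMs = SLOW_QUERY_MS) {
  return entries
    .filter((e) => typeof e.millis === 'number' && e.millis > thresholdMs)
    .sort((a, b) => b.millis - a.millis);
}

// Group slow entries by namespace ("db.collection") so the worst
// collection, often the one missing an index, stands out.
function summarizeByNamespace(entries, thresholdMs = SLOW_QUERY_MS) {
  const summary = {};
  for (const e of findSlowQueries(entries, thresholdMs)) {
    const s = summary[e.ns] || (summary[e.ns] = { count: 0, maxMillis: 0 });
    s.count += 1;
    s.maxMillis = Math.max(s.maxMillis, e.millis);
  }
  return summary;
}
```

With made-up entries `{ op: 'query', ns: 'app.users', millis: 340 }`, `{ op: 'query', ns: 'app.users', millis: 120 }` and `{ op: 'update', ns: 'app.feeds', millis: 15 }`, summarizeByNamespace returns `{ 'app.users': { count: 2, maxMillis: 340 } }`; the 15 ms update falls under the threshold and is ignored.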

 

4. Get some Hints

I don’t care how good a programmer you are; if you’re writing in a scripting language (node.js => JavaScript) and not using some sort of code validation, you’re nuts. It’s beyond hubris; it’s just dumb.

Let’s look at an example of a code mistake I made:

 

var user = this.getUser(),
params = {'username': user.getUsername()};
cache = this.getCache();

this.updateUser( … );

 

Did you catch it? The second line ends with a semicolon instead of a comma. That semicolon terminates the var statement, so “cache” (and any variables after it, for that matter) was never declared inside my function at all; assigning to it silently created a global. This is exactly the sort of problem that is not a problem at all, right up until it completely kills your code, takes down the servers, and generally starts an apocalypse (and we’re not talking a Terry Pratchett apocalypse here).
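JSHint catches this statically, but strict mode also catches it at runtime: in strict code, assigning to an undeclared name throws a ReferenceError instead of quietly creating a global. A minimal sketch, with hypothetical function names and stand-in values in place of getUser() and getCache():

```javascript
// The same mistake as above, with strict mode enabled for this function.
function buggyDeclarations() {
  'use strict';
  var user = { username: 'zane' },
    params = { username: user.username }; // this semicolon ends the var statement
  cache = new Map(); // undeclared: throws ReferenceError in strict mode
  return { params, cache };
}

// One declaration per statement removes the comma/semicolon trap entirely.
function fixedDeclarations() {
  'use strict';
  const user = { username: 'zane' };
  const params = { username: user.username };
  const cache = new Map();
  return { params, cache };
}
```

In a CommonJS file you can instead put 'use strict'; at the very top to cover the whole file; ES modules are always strict.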

Another bad practice I had unwittingly adopted was creating function closures (callbacks) inside loops.

JSHint (with --show-non-errors) is your friend. Sure, as one guy in the #nodejs IRC channel said, “jshint is a dick.” That’s true; it likes to scream at you for the most banal things (do I seriously need a radix param every time I call parseInt()? I’m pretty sure node knows I’m looking for base 10 unless otherwise specified). Regardless, it took me about 6 hours to go through all my code and get rid of every single warning. If nothing else, it gave me good peace of mind.
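Both of those warnings are easy to demonstrate. The sketch below (all function names are hypothetical) shows why a callback created inside a var-based loop sees only the final value of the counter, with two fixes, and why the radix is less banal than it looks: a "0x" prefix is still honoured without one, and older, pre-ES5 engines could read a leading zero as octal.

```javascript
// Bug: every callback closes over the same function-scoped `i`,
// so by the time they run the loop has finished and i === n.
function callbacksWithVar(n) {
  var callbacks = [];
  for (var i = 0; i < n; i++) {
    callbacks.push(function () { return i; });
  }
  return callbacks;
}

// Fix (ES2015+): `let` gives each iteration its own binding of `i`.
function callbacksWithLet(n) {
  const callbacks = [];
  for (let i = 0; i < n; i++) {
    callbacks.push(() => i);
  }
  return callbacks;
}

// Fix that also works on node 0.8: a factory defined outside the loop,
// so each call captures its own copy of the current value.
function makeCallback(value) {
  return function () { return value; };
}

function callbacksWithFactory(n) {
  var callbacks = [];
  for (var i = 0; i < n; i++) {
    callbacks.push(makeCallback(i));
  }
  return callbacks;
}

// Radix: without one, parseInt still treats "0x" as a hex prefix.
const withoutRadix = parseInt('0x1A');  // 26
const withRadix = parseInt('0x1A', 10); // 0 (parsing stops at the "x")
```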

 

5. Take Out the Garbage

V8, the JavaScript engine underneath node.js, handles garbage collection. Like most garbage collectors, it works perfectly well, except when it doesn’t. I found that V8’s garbage collection was at times blocking the node thread for up to 4 seconds.
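A pause like that shows up as event-loop delay: while V8 is collecting, no callbacks run. One way to measure it, assuming a Node version that ships perf_hooks.monitorEventLoopDelay (added long after 0.8), is sketched below; the helper names and the 20 ms sampling resolution are hypothetical choices, not a recommendation.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Start sampling how late the event loop runs. Long stop-the-world GC
// pauses, like any other synchronous work, show up as large max/p99 values.
function startLoopDelayMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return histogram;
}

// Histogram values are in nanoseconds; report the interesting ones in ms.
function summarizeDelay(histogram) {
  const toMs = (ns) => Math.round(ns / 1e6);
  return {
    p50Ms: toMs(histogram.percentile(50)),
    p99Ms: toMs(histogram.percentile(99)),
    maxMs: toMs(histogram.max),
  };
}

// Log a summary every `intervalMs`, then reset for the next window.
function logLoopDelay(histogram, intervalMs = 60000, log = console.log) {
  const timer = setInterval(() => {
    log('event loop delay', summarizeDelay(histogram));
    histogram.reset();
  }, intervalMs);
  timer.unref();
  return timer;
}
```

Running the process with node --trace-gc is a complementary way to see each collection as it happens, which helps confirm that a spike in maxMs really is the garbage collector and not your own code.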


 

6. Modules are Toys for Node Coders

Here are a few of my favorite modules, just for good measure:

  • Forever or Naught (your choice) to keep the Node process alive in a production environment
  • Mongoose for easy MongoDB schema modeling
  • Cluster for taking advantage of multi-core servers
  • Express for making a nice RESTful API
  • Jade for creating dynamic HTML pages
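For the multi-core point, here is a minimal sketch of the usual one-worker-per-core pattern using node’s built-in cluster module. runClustered, workerCount and serveHello are hypothetical names, port 3000 is a placeholder, and the isPrimary/isMaster check is there because newer Node versions renamed that property.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// How many workers to fork: one per CPU core, keeping `reserve` cores free.
// os.cpus() can be empty in some containers, hence the floor of 1.
function workerCount(reserve = 0) {
  return Math.max(1, os.cpus().length - reserve);
}

// In the primary process, fork workers and replace any that exit;
// in a worker, run `serve()`.
function runClustered(serve, reserve = 0) {
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < workerCount(reserve); i++) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      console.error(`worker ${worker.process.pid} exited (${signal || code}); forking a replacement`);
      cluster.fork();
    });
  } else {
    serve();
  }
}

// Example worker body: a trivial HTTP server on a placeholder port.
function serveHello() {
  http.createServer((req, res) => res.end('ok\n')).listen(3000);
}
```

An entry script would call runClustered(serveHello). Respawning unconditionally can turn a crash at startup into a fork loop, so a real deployment would add some back-off, or leave process restarts to Forever or Naught.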

 

Still Imperfect

As I said at the beginning of this post, I’m still not 100% satisfied with the stability of our servers. We still get Gateway Timeouts from time to time, though each of the items in this list has helped performance. In fact, the actual load on our servers is quite low, considering that we’re serving thousands of users! Based on the resources consumed (RAM, CPU, etc.), I expect we could scale to hundreds of thousands of users without increasing the size of our node cluster or MongoDB replica set, if only we could get this last issue resolved.

  • Albeit the 1

    I definitely see how being a badass assembly programmer is helping you, kudos for nothing

    • http://LifeByExperimentation.com Zane the Experimenter

      “I believe that they who wish to do easy things without trouble and toil must previously have been trained in more difficult things”
      — Rhetorica ad Herennium

  • Anonymous

    Very interesting, cheers!

  • http://twitter.com/ivanbreet Ivan Breet

    Nice article!

  • http://twitter.com/ivanbreet Ivan Breet

    Thank you for this article! Also have a look at Monk as an alternative for Mongoose:
    https://github.com/LearnBoost/monk

    I started to use Monk for some of my little project, and it’s a bit more readable.

  • Christoph Walcher

    A little experiment concerning the statement “I’m pretty sure node knows I’m looking for base10 unless otherwise specified”:

    Try 

    parseInt('0452');

    versus

    parseInt('0452', 10);

    The first parseInt call returns the value 298 while the second statement returns 452 as expected. My advice is to really use the radix argument!

    • http://LifeByExperimentation.com Zane the Experimenter

      Fascinating, thanks for the tip!

  • http://gurjeetguri.blogspot.com/ gurjeet

    The 2nd line of the code has a semicolon at the end.

    • http://LifeByExperimentation.com Zane the Experimenter

      That was exactly the point. I was showing the bug!
