This post is technical in nature, and aimed at programmers. Casual readers might still get something out of this, but I’m not making any attempts to remove jargon in here.
Node.js and MongoDB are great; they’re super-scalable and a joy to work with. However, there were plenty of growth pains as I implemented the Streamified API with these technologies. It took me a LOT of experimentation, and in the process I ended up exploring more optimizations and speed-ups than I ever thought I would have to consider.
Still, I have a StackOverflow question open representing the last of my issues. I wonder if it might actually be a bug in node 0.8.x. The purpose of this post, then, is to share what I’ve learned — and ask for help fixing the imperfections.
1. Check your Sockets
This is a rather arbitrary point to start, but it was one of the first bottlenecks I ran into, so it seems as good as any. If your node.js app is establishing outbound connections to other services (eg, Streamified connects to Facebook, Twitter, etc), you’re going to need to make sure that your servers are configured to handle it.
There is no “one-size-fits-all” solution here. You’ll need to consider the needs of your server. Here are a few good starting points:
- Use the request module, where possible. It’s more robust than coding http.clients
- Consider this post from LinkedIn, which suggests turning off socket pooling (note: I would take this with a grain of salt; in fact, the author of the request module vehemently implied that doing so was a horrible idea when I contacted him).
- Consider keep-alive vs. close connections. What is best for your scenario?
- Consider increasing the number of sockets used by node.js, eg:
require('http').globalAgent.maxSockets = 9999;
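As a rough illustration of the last two bullets, here is a minimal sketch that raises the global cap and also gives one upstream service its own keep-alive pool. The numbers, the `apiAgent` name and the `requestOptions` helper are assumptions for illustration, and the `keepAlive` agent option only exists in node releases newer than the 0.8.x line discussed in this post.

```javascript
var http = require('http');

// Raise the per-host socket cap on the shared global agent.
// 100 is an arbitrary illustrative value; tune it for your own server.
http.globalAgent.maxSockets = 100;

// Alternatively, give one upstream service its own pool, so a slow
// service cannot starve the sockets used for everything else.
// keepAlive reuses idle sockets instead of closing them after each request.
var apiAgent = new http.Agent({ keepAlive: true, maxSockets: 25 });

// Hypothetical helper: builds options for http.request()/http.get()
// that route the call through the dedicated agent.
function requestOptions(hostname, path) {
  return { hostname: hostname, path: path, agent: apiAgent };
}
```

Passing `requestOptions('example.com', '/')` to `http.get()` would then draw from the dedicated pool rather than the global one.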
2. Monitor your Performance
Our servers are on the Amazon Cloud and we use RightScale, so I have some built-in monitoring tools.
But I needed something more node- and mongo-specific, so I ended up going with nodetime. If you’re not using nodetime, start using it now, at least in your testing environment. I won’t go into all the features here (readable heap snapshots… mmm), but I will say that their support is excellent. Not only did they help me with some installation issues, they also assisted in reading the heap snapshots, and even jumped onto a StackOverflow question and helped me to debug my code. Whoa.
3. NoSQL Doesn’t Mean No Worries
Like many programmers, when I first started coding with NoSQL (MongoDB) I was ecstatic. True to claims, it took care of so many of the headaches I had known from technologies like MySQL. Gone were the days when I needed to design sophisticated schemas! Hooray!
Not so fast.
Let me draw a comparison, if I may:
Using a NoSQL option is like using a language with automatic garbage collection. True, the compiler/interpreter will take care of most of the headache for you, but if you understand what is going on, you will be much better off.
This is exactly why I recommend that many new programmers learn a “harder” or “older” language like C before moving on to Java. I know plenty of programmers who started on Java who are quite good, but they still think of the whole “memory management” thing like it’s a bit of voodoo going on behind the scenes. These are not dumb people; they’re great programmers who have never been forced to think about memory management. Contrast that against someone who started coding in assembly language and remembers the days of multiple-addition-instead-of-multiplication in order to optimize processor clock cycles (what can I say, I’m oldschool). Both might be just as good at coding, except when a memory leak appears, or the app gets slow — who would you rather be the one to debug the code?
But I digress. The point is that it is important to understand the MongoDB architecture. Here are a few quick axioms I put into place, though:
- Never query without an index; compound indexes are your friend
- Given the choice, more collections with less data each are preferable when you need to query those collections frequently (as opposed to fewer collections with more data in each)
- Make good use of schema validation, mongoose middleware, etc.
- Monitor your slow queries; there’s no “hard cutoff” point, but > 100 ms is probably bad
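The slow-query axiom can also be checked from the application side. Below is a minimal sketch, assuming callback-style query functions whose last argument is the callback. `timeQueries`, the `users.findByName` label and the stand-in query are hypothetical names, not part of any MongoDB or Mongoose API; the 100 ms threshold is the rule of thumb from the list above.

```javascript
// Hypothetical helper: wraps callback-style query functions and reports
// any call that takes longer than thresholdMs to call back.
// `now` is an optional clock function (defaults to Date.now).
function timeQueries(thresholdMs, onSlow, now) {
  now = now || Date.now;
  return function wrap(label, queryFn) {
    return function () {
      var args = Array.prototype.slice.call(arguments);
      var callback = args.pop();           // assumes the last argument is the callback
      var started = now();
      args.push(function () {
        var elapsed = now() - started;
        if (elapsed > thresholdMs) {
          onSlow(label, elapsed);          // report, then continue as normal
        }
        callback.apply(null, arguments);
      });
      return queryFn.apply(this, args);
    };
  };
}

// Usage with a stand-in query function (no database involved).
var wrapQuery = timeQueries(100, function (label, ms) {
  console.warn('slow query: ' + label + ' took ' + ms + ' ms');
});
var findByName = wrapQuery('users.findByName', function (name, cb) {
  cb(null, { username: name });
});
findByName('zane', function (err, user) {
  console.log(user.username);
});
```

MongoDB's own profiler can record slow operations on the database side as well; this wrapper only shows what the application experiences, network time included.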
4. Get some Hints
I don’t care how good of a programmer you are; if you’re writing in a scripting language (node.js => javascript) and not employing some sort of code validation, you’re nuts. It is beyond hubris; it’s just dumb.
Let’s look at an example of a code mistake I made:
var user = this.getUser(),
    params = {'username': user.getUsername()};
    cache = this.getCache();
this.updateUser( … );
Did you catch it? The 2nd line of the code has a semicolon at the end, instead of a comma. In the process of declaring my variables (presumably inside a function closure), I had accidentally scoped “cache” (and any variables coming after it, for that matter) to globals. This is exactly the sort of problem that is not a problem at all, right until it completely kills your code, takes down the servers, and generally starts an apocalypse (and we’re not talking a Terry Pratchett apocalypse here).
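Besides a linter, strict mode turns this particular mistake into an immediate error instead of a silent global. This is a minimal sketch rather than the original code: `loadUser`, `userCache` and the object literals are stand-ins for illustration.

```javascript
function loadUser() {
  'use strict';
  var user = { name: 'zane' },
      params = { username: user.name };  // stray semicolon ends the var statement
      userCache = {};                     // never declared: strict mode throws here
  return params;
}

try {
  loadUser();
} catch (err) {
  console.log(err.name); // prints "ReferenceError"
}
```

Without the `'use strict'` directive the same function returns normally and leaves `userCache` on the global object.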
Another good example of a bad practice I had unwittingly used was the implementation of function closures (callbacks) within loops.
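The post does not show the offending code, so the following is only a sketch of the classic form of the problem, with hypothetical function names: callbacks created in a `for` loop all share the same `var`, and see its final value by the time they run.

```javascript
// Broken: every callback closes over the same `i`.
function brokenCallbacks() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(function () { return i; });
  }
  // By the time the callbacks run, the loop has finished and i is 3.
  return callbacks.map(function (cb) { return cb(); });
}

// Fixed: a factory function gives each callback its own copy of the value.
function makeCallback(n) {
  return function () { return n; };
}

function fixedCallbacks() {
  var callbacks = [];
  for (var i = 0; i < 3; i++) {
    callbacks.push(makeCallback(i));
  }
  return callbacks.map(function (cb) { return cb(); });
}

console.log(brokenCallbacks()); // prints [ 3, 3, 3 ]
console.log(fixedCallbacks());  // prints [ 0, 1, 2 ]
```

Moving the function out of the loop body also silences JSHint's warning about creating functions inside loops.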
JSHint (with --show-non-errors) is your friend. Sure, as one guy in the #nodejs IRC chat said, “jshint is a dick.” That’s true – it likes to scream at you for the most banal things (do I seriously need a radix param every time I call parseInt()? I’m pretty sure node knows I’m looking for base10 unless otherwise specified). Regardless, it took me about 6 hours to go through all my code and get rid of every single warning. If nothing else, it was good peace of mind.
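On the radix warning specifically, here is a small sketch of the case JSHint is guarding against. Without an explicit radix, `parseInt()` infers the base from the string's prefix, which matters when the input comes from users or from another service.

```javascript
// With no radix, a "0x" prefix switches parseInt to base 16.
var inferred = parseInt('0x1f');

// With radix 10, parsing stops at the first non-decimal character ("x").
var forcedDecimal = parseInt('0x1f', 10);

// Plain decimal strings behave the same either way.
var plain = parseInt('42');

console.log(inferred, forcedDecimal, plain); // prints 31 0 42
```

Some older JavaScript engines also treated a leading zero as octal, which is another reason the warning exists.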
5. Take Out the Garbage
V8 is the JavaScript engine underneath node.js, and it includes the garbage collector. Like most garbage collectors, it works perfectly well, except when it doesn’t. I found that V8 was locking the node thread for up to 4 seconds, at times.
- You can use the --trace-gc command line argument to keep an eye on V8. I discovered that I had times where the garbage collection was taking > 4 seconds. This can be a major problem in a single-threaded environment like node, since it means that the thread is blocked (and not accepting connections).
- These 3 articles do a great job explaining how (and why) to use command line arguments like --nouse-idle-notification. Playing with some of these tweaks seemed to help our performance.
  - Scaling Node to 100k connections
  - Scaling Node to 250k connections
  - Escaping the 1.4Gb Heap Limit
- FWIW, I had to find a solution to passing command line arguments to node when running a forever/naught service to keep the task alive
- If you absolutely need to (I’m not recommending it), you can enable manual garbage collection. To do so, pass the --expose-gc flag to node; you’ll then be able to call the global gc() function.
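A minimal sketch of the manual collection described in the last bullet, written so that it degrades gracefully when the flag is missing. `collectGarbage` is a hypothetical helper name and `app.js` is a placeholder file name.

```javascript
// Run with, for example: node --expose-gc --trace-gc app.js

// Calls the collector only if --expose-gc made it available.
// Returns the change in heapUsed (bytes before minus bytes after),
// or null when gc() is not exposed.
function collectGarbage() {
  if (typeof global.gc !== 'function') {
    return null;
  }
  var before = process.memoryUsage().heapUsed;
  global.gc();
  var after = process.memoryUsage().heapUsed;
  return before - after;
}

var freed = collectGarbage();
console.log(freed === null
  ? 'gc() is not exposed; start node with --expose-gc'
  : 'heapUsed changed by ' + freed + ' bytes after gc()');
```

Checking for `global.gc` first means the same code runs in production without the flag, where it simply does nothing.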
6. Modules are Toys for Node Coders
Here are a few of my favorite modules, just for good measure:
- Forever or Naught (your choice) to keep the Node process alive in a production environment
- Mongoose for easy mongoDB schema modeling
- Cluster for taking advantage of multi-core servers
- Express for making a nice RESTful API
- Jade for creating dynamic HTML pages
Still Imperfect
As I said at the beginning of this post, I’m still not 100% satisfied with the stability of our servers. We’re still getting Gateway Timeouts from time to time, though each of the items in this list has helped performance. In fact, the actual load on our servers is quite low, considering that we’re serving thousands of users! Based upon the resources consumed (RAM, CPU, etc), I expect that we could scale to hundreds of thousands of users without increasing the size of our node cluster or MongoDB replica set, if only we could get this last issue resolved.


