Posted 10 months ago
BSON and Symbols and Segfaults... OH MY!!
If you’re like many people using NodeJS, you use it as a supplement to your existing application, to help with concurrency or to stream data to your clients over sockets. For me, NodeJS is part of a large stack, one that includes Rails. A common practice when writing Ruby apps is to use symbols in place of strings as much as possible. This helps conserve memory (each symbol is stored only once, however many times it is used) and keeps the code consistent-looking. MongoDB, per the BSON spec, supports symbols as a datatype for values.
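On the wire, the difference between a string and a symbol is tiny. Per the BSON spec, both values are encoded the same way (an int32 byte length, the UTF-8 bytes, and a trailing NUL); only the element's type byte differs: 0x02 for a string, 0x0E for a symbol. The Python sketch below hand-encodes a one-field document both ways to show this. The field name `status`, the value `active`, and the helper names are made up for illustration; this is not any driver's API.

```python
import struct

# Element type bytes from the BSON spec.
TYPE_STRING = 0x02
TYPE_SYMBOL = 0x0E  # marked deprecated in later versions of the spec


def encode_string_like(type_byte: int, name: str, value: str) -> bytes:
    """Encode one string or symbol element: type byte, cstring name, int32 length, UTF-8 bytes + NUL."""
    raw = value.encode("utf-8") + b"\x00"
    return bytes([type_byte]) + name.encode("utf-8") + b"\x00" + struct.pack("<i", len(raw)) + raw


def encode_document(element: bytes) -> bytes:
    """Wrap one encoded element in a BSON document: int32 total size, elements, trailing NUL."""
    body = element + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


as_string = encode_document(encode_string_like(TYPE_STRING, "status", "active"))
as_symbol = encode_document(encode_string_like(TYPE_SYMBOL, "status", "active"))

# Both documents are the same size; the only differing byte is the type byte at offset 4.
diff = [i for i, (a, b) in enumerate(zip(as_string, as_symbol)) if a != b]
print(len(as_string), len(as_symbol), diff)  # 24 24 [4]
```

Because the two encodings are byte-for-byte identical apart from that one type byte, a tool that just prints the value (like the Mongo shell) has no visible reason to show them differently.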
After deploying about a dozen new features to our stack, we noticed that, soon after, all of our NodeJS instances started failing with no error message other than “Segmentation Fault”: no stack trace, no lines of code, nothing. After hours of troubleshooting our NodeJS servers, we concluded that the problem had to be corrupt data, since the failures happened only on specific documents. So I did what anyone would do: I opened up the Mongo shell and dug into one of those documents to see if I could find what was wrong. It looked clean; the data was fine. Next I wanted to see how our Rails stack handled the suspect documents, so I opened the Rails console, pulled up a bad document, compared it to a good one, and voilà! One of the changes we had made was to start using symbols, and the Ruby driver was storing those symbols in Mongo with the symbol datatype. This wasn’t apparent in the Mongo shell, because the shell displays any value with the symbol datatype as a string. I updated the document so the value was stored as a string, ran it through the NodeJS server again, and it worked! No segfault.
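Since the shell prints symbols as strings, the reliable way to spot them is to look at the element type byte. Here is a minimal Python sketch, assuming you already have a document's raw BSON bytes (for example, read from a mongodump `.bson` file). It walks the document, recurses into embedded documents and arrays, and reports the dotted path of every field with type 0x0E. It only understands the common types listed in the code and raises on anything else; `find_symbols` and the small encoding helpers are hypothetical names used for illustration.

```python
import struct

TYPE_SYMBOL = 0x0E
# Fixed-size value types (type byte -> value size in bytes), per the BSON spec.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0, 0x10: 4, 0x11: 8, 0x12: 8}
LENGTH_PREFIXED = {0x02, 0x0D, 0x0E}  # string, JavaScript code, symbol
EMBEDDED = {0x03, 0x04}  # embedded document, array


def find_symbols(data: bytes, offset: int = 0, path: str = "") -> list[str]:
    """Return the dotted paths of every symbol-typed field in a raw BSON document."""
    (doc_len,) = struct.unpack_from("<i", data, offset)
    end = offset + doc_len - 1  # position of the document's trailing NUL
    pos = offset + 4
    found = []
    while pos < end:
        type_byte = data[pos]
        name_end = data.index(b"\x00", pos + 1)  # element names are NUL-terminated cstrings
        name = data[pos + 1:name_end].decode("utf-8")
        full = f"{path}.{name}" if path else name
        pos = name_end + 1
        if type_byte == TYPE_SYMBOL:
            found.append(full)
        if type_byte in LENGTH_PREFIXED:
            (n,) = struct.unpack_from("<i", data, pos)
            pos += 4 + n  # the length already counts the trailing NUL
        elif type_byte in EMBEDDED:
            found.extend(find_symbols(data, pos, full))
            (n,) = struct.unpack_from("<i", data, pos)
            pos += n  # an embedded document's length covers the whole document
        elif type_byte in FIXED_SIZES:
            pos += FIXED_SIZES[type_byte]
        else:
            raise ValueError(f"unsupported BSON type 0x{type_byte:02x} at {full}")
    return found


# Hypothetical helpers, used only to build a sample document.
def element(type_byte, name, payload):
    return bytes([type_byte]) + name.encode() + b"\x00" + payload


def str_payload(s):
    raw = s.encode() + b"\x00"
    return struct.pack("<i", len(raw)) + raw


def doc(*elements):
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


sample = doc(
    element(0x02, "name", str_payload("ok")),
    element(0x03, "meta", doc(element(TYPE_SYMBOL, "kind", str_payload("admin")))),
    element(0x10, "count", struct.pack("<i", 3)),
)
print(find_symbols(sample))  # ['meta.kind']
```

On the server side, the `$type` query operator with type number 14 (symbol) should find the same documents without touching raw bytes, e.g. `db.users.find({ role: { $type: 14 } })` in the shell, where `users` and `role` are placeholder names.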
So what does this mean? It means that the Node MongoDB Native driver doesn’t support deserializing symbols from BSON, most likely because JavaScript had no symbol datatype at the time of writing. The segfaults only happen when using “native=true” in the driver, which tells the driver to use the C++ BSON extension rather than the JavaScript one. If you use the JS version, there is no segfault, but the symbol field is simply left out of the document retrieved from Mongo.
The solution? Well, there is none. As of this writing, you have to make sure all your symbols are stored as strings whenever a document is going to be used in NodeJS. I have filed a bug report with the driver, so stay tuned and I will post updates if there are any. If you want to see the issue in action, check out this Gist.
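For new writes, the workaround lives on the Ruby side (store `symbol.to_s` instead of the symbol). Documents that were already saved with symbol values still need repairing, like the one I fixed by hand above. Because the BSON spec encodes symbol and string values identically, one way to repair raw documents is to retag the type byte from 0x0E to 0x02 without changing any lengths. The Python sketch below does that in place; it is an illustrative assumption rather than anything the driver provides, it only handles the types listed, and it leaves out reading and writing the data (re-importing with mongorestore, for instance). `symbols_to_strings` and the encoding helpers are hypothetical names.

```python
import struct

TYPE_STRING, TYPE_SYMBOL = 0x02, 0x0E
# Fixed-size value types (type byte -> value size in bytes), per the BSON spec.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0, 0x10: 4, 0x11: 8, 0x12: 8}
LENGTH_PREFIXED = {0x02, 0x0D, 0x0E}  # string, JavaScript code, symbol
EMBEDDED = {0x03, 0x04}  # embedded document, array


def symbols_to_strings(buf: bytearray, offset: int = 0) -> int:
    """Retag every symbol element as a string, in place; return how many fields changed.

    Only the type byte is rewritten: symbol and string values share the same
    encoding, so no document or value lengths need updating.
    """
    (doc_len,) = struct.unpack_from("<i", buf, offset)
    end = offset + doc_len - 1  # position of the document's trailing NUL
    pos = offset + 4
    changed = 0
    while pos < end:
        type_pos = pos
        type_byte = buf[pos]
        pos = buf.index(b"\x00", pos + 1) + 1  # skip the NUL-terminated element name
        if type_byte == TYPE_SYMBOL:
            buf[type_pos] = TYPE_STRING
            changed += 1
        if type_byte in LENGTH_PREFIXED:
            (n,) = struct.unpack_from("<i", buf, pos)
            pos += 4 + n
        elif type_byte in EMBEDDED:
            changed += symbols_to_strings(buf, pos)
            (n,) = struct.unpack_from("<i", buf, pos)
            pos += n
        elif type_byte in FIXED_SIZES:
            pos += FIXED_SIZES[type_byte]
        else:
            raise ValueError(f"unsupported BSON type 0x{type_byte:02x}")
    return changed


# Hypothetical helpers, used only to build a sample document.
def element(type_byte, name, payload):
    return bytes([type_byte]) + name.encode() + b"\x00" + payload


def str_payload(s):
    raw = s.encode() + b"\x00"
    return struct.pack("<i", len(raw)) + raw


def doc(*elements):
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


raw = bytearray(doc(
    element(TYPE_SYMBOL, "role", str_payload("admin")),
    element(0x04, "tags", doc(element(TYPE_SYMBOL, "0", str_payload("beta")))),
))
original = bytes(raw)
print(symbols_to_strings(raw))  # 2
```

Rewriting in place keeps the repair trivially safe with respect to sizes, since the document never grows or shrinks; a real migration would more likely go through a driver and rewrite the field with a string value.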