@DevDazed

BSON and Symbols and Segfaults... OH MY!!

If you’re like many people using NodeJS, you use it as a supplement to your existing application, to help with concurrency or with streaming data to your clients over sockets. For me, NodeJS is part of a large stack, one that includes Rails. A common practice when writing Ruby apps is to use symbols in place of strings as much as possible. This helps conserve memory and keeps code looking consistent. MongoDB, per the BSON spec, supports symbols as a data type for values.
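A quick way to see the memory argument in plain Ruby is to count distinct objects. Every occurrence of a symbol literal refers to one shared object, while each evaluation of a string literal allocates a new String. This sketch assumes the file does not enable `# frozen_string_literal: true`; with that comment, identical string literals are deduplicated and the difference disappears for literals.

```ruby
# Build 1,000 "status" values each way and count the distinct objects.
# Every :active literal refers to the same Symbol object...
symbols = Array.new(1000) { :active }
puts symbols.map(&:object_id).uniq.size   # 1

# ...while (absent `# frozen_string_literal: true`) each evaluation of the
# "active" literal allocates a separate String object.
strings = Array.new(1000) { "active" }
puts strings.map(&:object_id).uniq.size   # 1000
```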

After deploying about a dozen new features to our stack, we noticed that soon afterward all of our NodeJS instances started failing with no error other than “Segmentation Fault”: no stack trace, no line numbers, nothing. After hours of troubleshooting our NodeJS servers, we concluded that the problem had to be corrupt data, since the crashes happened only on specific documents.

So I did what anyone would do: I opened up the Mongo shell and dove into one of those documents to see if I could find what was wrong. It looked clean; the data was fine. I then wanted to see how our Rails stack was handling the suspect documents, so I dove into the Rails console, pulled up a bad document, compared it to a good one and voilà! One of the changes we had made was to start using symbols, and the Ruby driver was storing those symbols in Mongo with the BSON symbol data type. This wasn’t apparent in the Mongo shell, because the shell displays any value of the symbol type as a string. I updated the document, ran it through the NodeJS server again, and it worked! No segfault.

So what does this mean? It means that the Node MongoDB Native driver doesn’t support deserializing BSON symbols, most likely because JavaScript (as of this writing) has no symbol data type. The segfaults only happen when using “native=true” in the driver, which tells it to use the C++ BSON extension rather than the pure JS parser. If you use the JS version, there is no segfault, but the symbol field is left out of the document retrieved from Mongo.
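To see why a symbol looks exactly like a string in the shell but not to a parser, it helps to look at the bytes. Per the BSON spec, a string element (type `0x02`) and a symbol element (type `0x0E`) share the same payload layout: an int32 byte length (counting the trailing NUL), the UTF-8 bytes, then a NUL. Only the type byte differs. The Ruby sketch below hand-encodes both and includes a tiny decoder that simply treats `0x0E` as a plain string, roughly the tolerance the Node driver was missing. It is an illustration only: it handles just these two types, and `bson_element`, `bson_document`, and `decode_strings` are hypothetical helpers, not part of any driver.

```ruby
# BSON type bytes from the spec: 0x02 = UTF-8 string, 0x0E = symbol (deprecated).
STRING_TYPE = 0x02
SYMBOL_TYPE = 0x0E
NUL = "\x00".b

# Encode one element: type byte, field name as a cstring, then an int32 length
# (payload bytes + 1 for the trailing NUL), the UTF-8 bytes, and a NUL.
def bson_element(type, name, value)
  bytes = value.to_s.encode("UTF-8").b
  [type].pack("C") + name.b + NUL + [bytes.bytesize + 1].pack("l<") + bytes + NUL
end

# Wrap elements in a document: int32 total size, the elements, a trailing NUL.
def bson_document(*elements)
  body = elements.join.b
  [body.bytesize + 5].pack("l<") + body + NUL
end

# Decode a document containing only string/symbol elements into a Hash,
# returning symbol values as ordinary strings. Any other type raises,
# because without knowing the type we cannot know how many bytes to skip.
def decode_strings(doc)
  total = doc.unpack1("l<")
  pos = 4
  result = {}
  while pos < total - 1
    type = doc.getbyte(pos)
    name_end = doc.index(NUL, pos + 1)
    name = doc.byteslice(pos + 1, name_end - pos - 1).force_encoding("UTF-8")
    unless type == STRING_TYPE || type == SYMBOL_TYPE
      raise ArgumentError, format("unsupported BSON type 0x%02X", type)
    end
    len = doc.byteslice(name_end + 1, 4).unpack1("l<")
    start = name_end + 5
    result[name] = doc.byteslice(start, len - 1).force_encoding("UTF-8")
    pos = start + len
  end
  result
end

as_string = bson_element(STRING_TYPE, "status", "active")
as_symbol = bson_element(SYMBOL_TYPE, "status", "active")
puts as_string.byteslice(1..) == as_symbol.byteslice(1..)   # true: only byte 0 differs
puts decode_strings(bson_document(as_symbol))["status"]     # active
```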

The solution: well, there isn’t a real one yet. As of this writing, you have to make sure to store all your symbols as strings in any document you are going to use in NodeJS. I have filed a bug report with the driver, so stay tuned and I will post updates if there are any. If you want to see the issue in action, check out this Gist.
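Until the driver handles symbols, one way to apply that workaround on the Rails side is to convert symbol values to strings just before a document is written. The helper below is a hypothetical sketch, not part of the Mongo Ruby driver: it walks nested hashes and arrays and turns every Symbol value into a String, so the driver encodes it as a BSON string instead of a BSON symbol. Keys are left alone, since BSON field names are always encoded as plain C strings anyway.

```ruby
# Recursively replace Symbol values with Strings so they are serialized as
# BSON strings (type 0x02) rather than BSON symbols (type 0x0E).
def stringify_symbol_values(value)
  case value
  when Symbol then value.to_s
  when Hash   then value.each_with_object({}) { |(k, v), out| out[k] = stringify_symbol_values(v) }
  when Array  then value.map { |v| stringify_symbol_values(v) }
  else value # numbers, strings, times, nil, etc. pass through unchanged
  end
end

doc = { status: :active, tags: [:new, "featured"], meta: { kind: :promo, views: 3 } }
clean = stringify_symbol_values(doc)
puts clean[:status].class   # String
puts clean[:meta][:kind]    # promo
```

You would then hand `clean` to whatever insert or update call your app already makes.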

How to Add Google +1 to Your Tumblr Theme

You can easily add Google +1 for sites to your Tumblr theme by following these steps:

At the top of your theme, in the <head> block, add this tag:

<meta name="if:Show Google Plus One Buttons" content="0" />

Now find the spot in the {block:Posts} section of your theme where you want the button (probably near the tweet or like buttons) and add this code:

{block:IfShowGooglePlusOneButtons}
  <g:plusone href="{Permalink}"></g:plusone>
  <script type="text/javascript" src="http://apis.google.com/js/plusone.js"></script>
{/block:IfShowGooglePlusOneButtons}

Now go to the Appearance section, check the box labeled ‘Show Google Plus One Buttons’, and click Save + Close. You’ll be good to go with Google +1 buttons on all of your posts.

Mongo vs. Redis: The Increment Battle

I was having a friendly dispute with a co-worker the other day about which technology to use to count the number of impressions viewed in a given time period. I was of the opinion that MongoDB could handle the load just fine; he thought Redis was the way to go. I decided to put it to the test, and here are my results.

I wrote the following quick Ruby class and ran it on my MacBook Pro:

Here are the results:

REDIS: 10723.352221581272 increments per second
MONGO: 14809.955155994303 increments per second

WOW, MongoDB was about 38% faster at incrementing than Redis, almost a full 40%. I’m curious to hear other people’s results: @devdazed
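For anyone who wants to reproduce the comparison, here is a minimal Ruby sketch of the timing side; it is not the original class. `increments_per_second` and `relative_speedup` are hypothetical helpers, and the block is where the real increment goes. The commented-out client calls assume the `redis` gem and the 2.x `mongo` gem with already-connected `redis` and `coll` objects. The runnable part uses an in-process Hash counter as a stand-in, so the rate it prints says nothing about either database.

```ruby
# Run the block `iterations` times and return increments per second.
# A monotonic clock avoids skew from system clock adjustments mid-run.
def increments_per_second(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  iterations / elapsed
end

# Relative speedup of `fast` over `slow` as a fraction (0.25 means 25% faster).
def relative_speedup(fast, slow)
  (fast - slow) / slow
end

# Stand-in workload: an in-process counter, NOT a database.
counter = Hash.new(0)
rate = increments_per_second(100_000) { counter["impressions"] += 1 }
puts format("in-memory stand-in: %.0f increments per second", rate)

# Assumed real workloads (require connected clients, not run here):
#   increments_per_second(100_000) { redis.incr("impressions") }
#   increments_per_second(100_000) do
#     coll.update_one({ _id: "impressions" }, { "$inc" => { n: 1 } }, upsert: true)
#   end

# The two figures reported in this post:
puts format("%.1f%%", 100 * relative_speedup(14809.955155994303, 10723.352221581272))  # 38.1%
```

One thing to keep in mind when comparing numbers like these: older MongoDB Ruby drivers sent writes without waiting for the server’s acknowledgment by default, while a plain redis-rb `incr` call waits for each reply, so it is worth making sure both sides wait (or don’t wait) the same way.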

Using Redis to Manage Surrogate Keys

In the ad-tech industry, we get a lot of traffic. One company I work with receives upwards of 3 billion events per month. The big guys do around 20 billion per day. To manage this amount of traffic, people employ a lot of different data warehouse techniques to get every last bit of speed out of their apps while preserving as much precious disk space as possible. While natural keys carry an automatic sense of referential integrity, storing them is not very efficient. Surrogate keys, on the other hand, can be expensive to create and manage, and a lot of data warehouse solutions don’t offer auto-increment or even constraints. Such is the case with InfoBright.
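The rest of this post isn’t included here, so the following is only a sketch of one common way to hand out surrogate keys from Redis, not necessarily the approach the post goes on to describe. Each dimension (say, publishers) gets a Redis hash mapping natural keys to integer ids, plus a counter that `INCR` bumps to mint new ids; `HSETNX` makes the write a no-op if another process already claimed the same natural key. `FakeRedis` is an in-memory stand-in for those three commands so the example runs without a server; `SurrogateKeys` and the `sk:*` key names are hypothetical. With the `redis` gem, a `Redis.new` client should be able to replace `FakeRedis.new`, since it exposes `incr`, `hget`, and `hsetnx` with the same meaning.

```ruby
# In-memory stand-in for the three Redis commands used below.
# Hash values are stored as strings, as Redis does.
class FakeRedis
  def initialize
    @counters = Hash.new(0)
    @hashes = Hash.new { |h, k| h[k] = {} }
  end

  def incr(key)
    @counters[key] += 1
  end

  def hget(key, field)
    @hashes[key][field]
  end

  def hsetnx(key, field, value)
    return false if @hashes[key].key?(field)
    @hashes[key][field] = value.to_s
    true
  end
end

# Hypothetical helper: maps natural keys to integer surrogate keys,
# one Redis hash and one counter per dimension.
class SurrogateKeys
  def initialize(redis, dimension)
    @redis = redis
    @map_key = "sk:#{dimension}:map" # hash: natural key -> surrogate id
    @seq_key = "sk:#{dimension}:seq" # counter used to mint new ids
  end

  def lookup(natural_key)
    existing = @redis.hget(@map_key, natural_key)
    return existing.to_i if existing

    candidate = @redis.incr(@seq_key)
    # HSETNX only writes if nobody claimed this natural key in the meantime;
    # if we lost that race, read back the id the winner stored.
    if @redis.hsetnx(@map_key, natural_key, candidate)
      candidate
    else
      @redis.hget(@map_key, natural_key).to_i
    end
  end
end

keys = SurrogateKeys.new(FakeRedis.new, "publisher")
puts keys.lookup("example.com") # 1
puts keys.lookup("example.org") # 2
puts keys.lookup("example.com") # 1
```

A lost race burns one counter value and leaves a gap in the sequence, which is usually harmless for surrogate keys since they only need to be unique.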
