Why is it important to always consider how you escape or encode newlines, carriage returns, apostrophes, quotes and other special characters from text input by users? It is important as we do not necessarily know what they are entering. Sure, you can validate or restrict what a user types. However, you may want to allow some freedom with what is entered by your users.
Malformed JSON and database changes
I had a problem today that wreaked havoc on our systems. This problem resulted in thousands of items backing up one of our queues. What was the cause? It was a malformed JSON message in the queue.
If we were trying to add this string to a database, we might encounter the same issue, although the parameterized queries typically used in ASP.NET Core applications mitigate some of these problems.
We recently introduced a feature to our customer portal where a customer could send us private messages.
The customer submits a message through an online form in the portal by typing it into a text box. The message is added to a queue (Amazon SQS on AWS) and processed by an internal service, which saves the message as a note in the system and raises a task for our team to action (respond).
This process was straightforward, but our big mistake was not escaping the content of the message entered by the user, content we have little control over. Users were entering newlines, carriage returns, apostrophes, quotes and other special characters.
Big problems from something so simple
How did we miss such a critical point that allowed us to break an entire service? The problem was not that the message stored in a string was invalid; it was that the JSON became malformed once we added the message to it.
It was at the point we were adding the message to the JSON object that we needed to escape or encode the special characters and quotes.
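To make the failure concrete, here is a minimal sketch of how pasting user text straight into a JSON template breaks the payload, while letting a serializer do the escaping keeps it valid. The class and method names are hypothetical, and the sketch uses System.Text.Json purely for illustration; it is not the code from our service.

```csharp
using System;
using System.Text.Json;

public static class QueueMessageSketch
{
    // Hypothetical naive builder: pastes the user's text straight into a JSON template.
    public static string BuildNaive(string userText) =>
        "{\"message\": \"" + userText + "\"}";

    // Lets the serializer escape quotes, newlines and other special characters.
    public static string BuildSafe(string userText) =>
        JsonSerializer.Serialize(new { message = userText });

    // Returns true when the payload parses as JSON, false when it is malformed.
    public static bool IsValidJson(string payload)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(payload);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

With an input such as `He said "hello"` followed by a line break, `BuildNaive` produces a payload that `IsValidJson` rejects, whereas the output of `BuildSafe` parses and returns the original text unchanged.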
Escaping and encoding strategies
So, how do we escape or encode newlines, carriage returns, apostrophes, quotes and other special characters? There are a number of tools you can use to escape and encode strings, but which tool you use will depend on what you are trying to escape or encode.
The System.Web namespace contains a utility class called HttpUtility. Within this class, there are a number of encoders for encoding a string.
This was perfect for my scenario above as we were using it to ensure our JSON was valid.
If you want to include quotation marks around the encoded string, you can use the overloaded method.
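As a minimal sketch, assuming the encoder in question is `HttpUtility.JavaScriptStringEncode` (the HttpUtility method aimed at JavaScript string literals), the two overloads could be used as shown here. The wrapper class, its method names and the `message` property are hypothetical and exist only for illustration.

```csharp
using System;
using System.Web;

public static class JsEncodeSketch
{
    // Escapes the text only; the caller supplies any surrounding quotes.
    public static string EncodeBare(string userText) =>
        HttpUtility.JavaScriptStringEncode(userText);

    // The second argument (addDoubleQuotes) wraps the escaped text in double quotes.
    public static string EncodeQuoted(string userText) =>
        HttpUtility.JavaScriptStringEncode(userText, true);

    // Hypothetical helper: drops the quoted value into a small JSON envelope.
    public static string BuildPayload(string userText) =>
        "{\"message\": " + EncodeQuoted(userText) + "}";
}
```

Because the quoted overload already adds the double quotes, the helper does not add its own around the value. For building whole objects rather than single values, a JSON serializer is usually the less error-prone choice.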
If you need to encode HTML then you need to use the HTML encoder as some characters have a different meaning in the HTML parsers and will be handled differently.
using System.Web; ... string html = HttpUtility.HtmlEncode(input);
If you need to encode a URL, then you need to use the URL encoder as any special characters passed to an HTTP stream without encoding could be misinterpreted at the receiving end.
using System.Web; ... string url = HttpUtility.UrlEncode(input);
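The following sketch shows why the encoder has to match the destination: the same user text is transformed differently for HTML and for a URL. The class, the method names and the `q` query parameter are hypothetical, chosen only for this example.

```csharp
using System;
using System.Web;

public static class TargetEncodingSketch
{
    // For text placed inside an HTML page: <, >, & and quotes become entities.
    public static string ForHtml(string userText) => HttpUtility.HtmlEncode(userText);

    // For text placed inside a URL: reserved characters are escaped.
    public static string ForUrl(string userText) => HttpUtility.UrlEncode(userText);

    // Hypothetical helper: appends the user's text as a query string value.
    public static string BuildSearchUrl(string baseUrl, string term) =>
        baseUrl + "?q=" + ForUrl(term);
}
```

Passing `Tom & Jerry` to `ForHtml` yields `Tom &amp; Jerry`, while `ForUrl` removes the space and the ampersand from the raw value so that neither can be mistaken for a query string separator.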
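We use Newtonsoft to convert our objects to and from JSON, so if you are using Newtonsoft, or would like to, there is a built-in method for making sure a string is escaped and encoded accordingly. Note that serializing a plain string this way returns it wrapped in double quotes, ready to be used as a JSON value.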
using Newtonsoft.Json; ... message = JsonConvert.SerializeObject(message);
You can also use some of the built-in regex methods provided by .NET. The regex escape method replaces a small number of characters with their escape codes.
using System.Text.RegularExpressions; ... string message = Regex.Escape(input);
This method is suitable if you are looking to store or display a regex, or if you just want to escape white space. It does not handle quotes or apostrophes, and it throws an ArgumentNullException if you pass it a null value.
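A short sketch of those limits is shown here. The wrapper class and its method are hypothetical; returning an empty string for null is just one possible choice, not a recommendation from the library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeSketch
{
    // Hypothetical wrapper: treats null as empty instead of letting
    // Regex.Escape throw an ArgumentNullException.
    public static string EscapeOrEmpty(string input) =>
        input is null ? string.Empty : Regex.Escape(input);
}
```

A newline in the input comes back as the two characters `\n`, and regex metacharacters such as `+` and `?` gain a leading backslash, but quotes and apostrophes pass through untouched. That is why this method alone would not have protected our JSON.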