Improving software security through input validation

In today's world, software is everywhere. It is becoming more and more a part of our everyday lives, whether in services we rely on, such as online banking or social media, or in our personal devices, such as mobile phones, home assistants, and smart light bulbs. Cybersecurity is therefore playing an ever larger role in the work done to support and secure our lives, and in this article I'd like to talk about one aspect of that work.

When I speak about security as a solutions architect, I am speaking of a pervasive property of systems. As such, it is part of our thought process from the very beginning, when we define the business needs which lead to a project being started, and it goes on to form a non-trivial part of the ongoing maintenance burden of any software system.

Just as security is a pervasive property of a system, it must also be kept in mind by every member of the team if it is to be considered effectively. From secure design through to the continuous monitoring of deployed systems, the whole team on a project is responsible for, and should be actively working toward, securing the system as an ongoing whole.

Security can mean many things in the context of a system which includes software, but for the purpose of this article we'll only consider the aspects of it which pertain to defending the software from attack by a malicious third party. We shall not concern ourselves with what might happen should an attack succeed.

When we're thinking about attacks on a software system by a third party, there are two main terms we will come across. The first is the attack surface, which is simply the sum of all the possible ways an attacker might attempt to exploit security issues to gain access to, or maliciously affect, a system. The second is the attack vectors themselves, which are the individual ways that make up that surface.

The particular vectors which are applicable will vary from system to system. However, one thing is fairly universally part of such opportunities for attack: in order for a system to be of use, it must process some user-supplied data in some way. Depending on the source of that data, and on the mechanisms and layers in place to control who can submit such data to a system and when, it can be necessary to treat inputs with great care.

The primary mechanism by which we ensure that input to a software system is not malicious is input validation. As software engineers, we often validate inputs as part of loading them into data structures, or by asserting properties of the inputs to functions. By validating our inputs, and by asserting invariant properties of the data we are processing, we seek to ensure that the inputs we are given cannot cause our program to behave in unexpected or undefined ways. This is an essential part of ensuring that the intent of any particular piece of code cannot be subverted by a malicious actor presenting carefully crafted inputs to any exposed surface of the software system.
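To make that concrete, here is a minimal Python sketch of validating an input while loading it into a data structure. The TransferRequest type, its field names, and the limits chosen are all hypothetical, invented purely for illustration; a real system would derive them from its own requirements.

```python
import re
from dataclasses import dataclass

# Hypothetical limits, chosen only for this illustration.
ACCOUNT_ID_RE = re.compile(r"[0-9]{8}")
MAX_AMOUNT_PENCE = 1_000_000


@dataclass(frozen=True)
class TransferRequest:
    """A request which has already passed validation."""
    account_id: str
    amount_pence: int


def load_transfer_request(raw):
    """Validate untrusted input while loading it into a data structure."""
    if not isinstance(raw, dict):
        raise ValueError("request must be a mapping")
    # Reject anything we do not explicitly expect, rather than ignoring it.
    if set(raw) != {"account_id", "amount_pence"}:
        raise ValueError("request must contain exactly account_id and amount_pence")

    account_id = raw["account_id"]
    if not isinstance(account_id, str) or ACCOUNT_ID_RE.fullmatch(account_id) is None:
        raise ValueError("account_id must be exactly eight ASCII digits")

    amount = raw["amount_pence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        raise ValueError("amount_pence must be an integer")
    if not 0 < amount <= MAX_AMOUNT_PENCE:
        raise ValueError("amount_pence is out of range")

    return TransferRequest(account_id=account_id, amount_pence=amount)
```

The rest of the program then only ever sees TransferRequest values, so the invariants checked here can be relied upon everywhere else.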

When you write software in this way, it is sometimes referred to as defensive programming because you are spending effort during the programming (or, frankly, the design) phase of a project to ensure that the attack surface of the system is smaller and better defended against malicious actors.

The Perl programming language, popular for a very long time though slowly dropping out of fashion, builds a concept called taint checking into the language itself as a fundamental security feature. In brief, when taint mode is enabled (with the -T switch), data which originates from outside of the program is marked as tainted. Any expression which uses or relies on tainted data in any way is also marked as tainted, except for some very restricted mechanisms which can be used to validate and then clean (or un-taint) the data. Using this, Perl programmers can be sure that they are never acting on inputs which have not been verified as good in some fashion. Naturally, if a programmer applies an incorrect or insufficient test then this might not entirely prevent attacks, but it is a useful tool in the arsenal of protections a programmer might deploy.
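Python has no built-in taint mode, but the idea can be imitated. The following is only a rough sketch of the concept, not a reproduction of how Perl implements it: the Tainted class and its untaint method are hypothetical names of my own, and unlike Perl, nothing in Python stops a determined programmer from reaching into the wrapper directly.

```python
import re


def _raw(value):
    """Return the underlying text of a value, tainted or not."""
    return value._value if isinstance(value, Tainted) else str(value)


class Tainted:
    """Wraps a string which originated outside the program."""

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        # Anything built from tainted data is itself tainted.
        return Tainted(self._value + _raw(other))

    def __radd__(self, other):
        return Tainted(_raw(other) + self._value)

    def untaint(self, pattern):
        """The only intended way out: the whole value must match the pattern."""
        if re.fullmatch(pattern, self._value) is None:
            raise ValueError("input failed validation")
        return self._value


# Taint propagates through concatenation...
greeting = "hello " + Tainted("alice")
assert isinstance(greeting, Tainted)
# ...and is only removed by an explicit, deliberate check.
safe_greeting = greeting.untaint(r"[a-z ]+")
```

As in Perl, the protection is only as good as the pattern supplied: un-tainting with a pattern which matches everything defeats the purpose.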

Another mechanism for protecting software against operating on un-validated inputs is to take advantage of the type system of your language. The idea is to give raw inputs their own types, so that the only way to transform them into the types your program normally operates on is to pass them through validation functions. Strongly typed languages can make this quite easy, but even without strong or strict typing it is possible to structure software so that data is always validated before use.
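As a sketch of what that might look like in Python, the validated form of an input can be given its own type whose constructor performs the validation. The ElementName type, its naming rule, and the artifact_path function are hypothetical examples. Python will not enforce the annotation at runtime, so this relies on a static type checker (for example mypy) to reject callers which pass a bare string.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: 1 to 64 ASCII letters, digits, underscores or hyphens.
_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


@dataclass(frozen=True)
class ElementName:
    """A name which is known to be valid, because construction validates it."""
    value: str

    def __post_init__(self):
        if not isinstance(self.value, str) or _NAME_RE.fullmatch(self.value) is None:
            raise ValueError(f"invalid element name: {self.value!r}")


def artifact_path(name: ElementName) -> str:
    # Accepting ElementName rather than str means the validation cannot be
    # skipped by accident: there is no other way to obtain an ElementName.
    return f"artifacts/{name.value}.tar"
```

A name such as "../../etc/passwd" fails at the point where the ElementName is constructed, long before it could reach any code which builds a file path from it.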

As an example, the BuildStream project deals with a significant amount of input data in the form of YAML documents. YAML has, in the past, been involved in a large number of security issues, leading to an opinion that YAML is often insecure by default. However, it is possible, by defensive design and programming, to use YAML quite safely, and BuildStream does this by applying a number of careful processing and validation limits to its inputs. For example, BuildStream limits which features of YAML are considered supported in its inputs (no anchors and references, no use of tags to try to invoke internal object types, mappings with only plain string keys, etc.) and then it processes that YAML explicitly and carefully, transforming it into the primary internal data structures only after ensuring the coherence and consistency of its inputs.
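To give a flavour of this kind of restriction, here is a small Python sketch which walks an already-parsed document and rejects anything outside a narrow subset. It is emphatically not BuildStream's actual loader: the check_node function is a hypothetical helper, it assumes some YAML library has already produced plain Python objects, and accepting only string scalars is a choice made for this sketch. Restrictions such as forbidding anchors would need to be applied during parsing itself, which is not shown here.

```python
def check_node(node, path="$"):
    """Reject any parsed node outside a small supported subset.

    Supported: mappings with plain string keys, sequences, string scalars.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if not isinstance(key, str):
                raise ValueError(f"{path}: mapping key {key!r} is not a plain string")
            check_node(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            check_node(item, f"{path}[{index}]")
    elif not isinstance(node, str):
        # Anything else, including objects constructed via tags, is refused.
        raise ValueError(f"{path}: unsupported value of type {type(node).__name__}")


# A document within the subset passes silently.
check_node({"kind": "manual", "depends": ["base.bst", "tools.bst"]})
```

Including the path to the offending node in the error helps non-malicious users too, since the same check which blocks a crafted input also produces a usable diagnostic for an honest mistake.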

Sadly, as you might expect, every time you implement an explicit check or transformation your software will by necessity get slower in that particular spot. BuildStream's loader is an expensive part of the tool's overall runtime. There is, therefore, an argument that excessive input validation may end up being more expensive than coping with an attack causing misbehaviour. This argument is often deployed, along with a "there's no way this input could ever be bad", to justify removing validation of inputs or relegating it to debug-only builds. Where inputs are otherwise protected (for example, if they only arrive over authenticated and encrypted connections from other trusted system components) it may be reasonable to consider them pre-validated, but this is not as common as you might expect, nor quite as easy to prove in your design as you might hope.

Where the decision is made to omit validation in favour of performance or code size, or simply where validation was never considered, software can end up with insidious bugs which result in security issues. Often these are in places where no normal programmer would expect a likely attack vector. Zephyr is a new RTOS (real-time operating system) which aims at being scalable, is optimised for resource-constrained systems, and claims to be built with security in mind. Even with security at the forefront of the minds of Zephyr's designers and implementors, when the codebase was audited by NCC Group there were a number of security issues identified which boiled down to insufficient validation of untrusted inputs. Several of these were at such low levels of network packet handling that pre-authentication protection would not be possible. With physical access to a system running Zephyr there were even more attack vectors, including over USB. This is not to say that Zephyr as a whole is bad software; indeed, the software quality is high and the project responded to the review from NCC Group rapidly and effectively to fix the high-risk problems.
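To illustrate the general class of problem, and not any specific issue found in Zephyr, consider a packet whose header declares the length of the payload which follows. The wire format below (a one-byte type followed by a two-byte big-endian length) and the 512-byte limit are hypothetical. In C, trusting the declared length can mean reading or writing beyond a buffer; Python will not corrupt memory in that way, but the sketch shows the checks which need to happen before the length is believed.

```python
import struct
from typing import Tuple

# Hypothetical header: 1-byte message type, 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")
MAX_PAYLOAD = 512


def parse_packet(data: bytes) -> Tuple[int, bytes]:
    """Parse one packet, refusing to trust the length field it declares."""
    if len(data) < HEADER.size:
        raise ValueError("packet is shorter than its header")
    msg_type, declared_len = HEADER.unpack_from(data)
    # The attacker controls declared_len, so bound it before using it.
    if declared_len > MAX_PAYLOAD:
        raise ValueError("declared payload length exceeds the maximum")
    # The declared length must agree with the bytes actually received.
    if declared_len != len(data) - HEADER.size:
        raise ValueError("declared payload length does not match the data")
    return msg_type, data[HEADER.size:]
```

Each of the three checks is cheap, and each corresponds to an assumption that the code after it would otherwise be making silently.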

It is, I would hope, therefore evident that validating your inputs is very much a worthwhile investment of your time. This is true not only from the perspective of detecting accidentally incorrect input (the process by which you might give good diagnostics to non-malicious users) but also from that of preventing malicious inputs from causing unexpected or undefined behaviour within your code. I encourage you to think about a project you're currently working on, whether it is in the early design phases or even in long-term "maintenance only" mode, and consider whether it sufficiently guards itself against malicious input and, if not, what malicious input might be able to do to disrupt or even damage the systems that software is deployed into. If, on reflection, your project could be viably attacked through malicious inputs, consider which kinds of input validation it would benefit from the most, and then write some requirements and verification criteria to ensure that you are protected. Don't let anyone tell you that it's not worth it. After all, as Joseph Heller said in Catch-22, “Just because you're paranoid doesn't mean they aren't after you.”

