How to Avoid SQL Injections, XSS Attacks, and File Upload Attacks in your Web Application

In this lesson, we will learn why we should validate and sanitise data sent from an (AJAX) request or frontend forms before persisting (storing) them in our database. This was meant to a small part of a lesson about data validation and sanitisation for this series: Full-Stack Google Contacts Clone with Adonis.js/Node.js & Quasar/Vue.js but I decided to dive deep into this topic as a standalone lesson.

Data Sanitisation covers all techniques and technologies employed towards ensuring that data (including files) uploaded by a user of a website/web app to the server for storage are stripped off of all potentially-dangerous contents. Data sanitisation also covers the techniques of ensuring that data is safely rendered on the frontend for a user after being fetched from the database. And if these data will be further processed on the server, data sanitisation ensures that malicious data won't be executed on the server while the data is being processed server-side.

Data Validation covers all techniques employed to ensure that data uploaded to your server are in the correct format you are expecting them to be. For example: ensuring that email is a valid email, or birthday is in yyyy-MM-dd format, or password is strong enough, or a username does not include some characters (like verified symbol, in the case of Twitter), or username is not longer than a fixed length, etc. In the case of files, Data Validation ensures that files of the appropriate type and size are uploaded and stored.

Data Validation & Sanitisation can be done both on the frontend and backend but the backend implementation is the most important. Validating data on the frontend provides excellent user experience as users do not have wait for the form to be submitted before knowing that their username is too long or password isn't strong enough. However, you should still implement the same validations on the backend because the form can be by-passed by disabling JavaScript. In the case of an API service or single-page applications (SPA) where interactions with the server are made mostly via AJAX requests, backend validation and sanitisation is a must-have.

Why Validate & Sanitise our Data Before Persistence?

Simple answer: it is a wild world out there. 😆 The very moment your website/app goes public, there are bad actors (hackers, cyber criminals, etc.) who will be trying all forms of tricks on your app in order to exploit weaknesses and steal sensitive data or even shut down the entire app/server. Some of those tricks include SQL injections Attacks, XSS - Cross Site Scripting, File upload attacks, etc. Let's discuss these attack vectors with a little more context.

SQL injections Attacks

An SQL injections Attack is usually caused by poor data sanitisation before persistence. Most especially, when data containing SQL codes are uploaded by a malicious user, stored on the database, and latter executed during a database query. Let me break these down with lucid examples.

Now imagine our database table named contacts. It contains the following columns: id, first_name, surname, company, job_title, and email. You have an API server. Your API server provides the ability to store data into the contacts table. For the API route - POST /contacts which is used to create new contacts - the following JSON data is expected in the body of the request so that API server to pickup the data:

{
    "firstName": "Idara",
    "surname": "Akande",
    "company": "Gotedo Church Management Software",
    "job_title": "Software Developer",
    "email": "idara.akande@gotedo.com",
}

When the server receives the request, the SQL INSERT command (for saving the data to the database) will look like this (the id value is autogenerated by the server):

INSERT INTO contacts (id, first_name, surname, company, job_title, email)
VALUES ('ckup6nhc70000lgvoc0hi7484', 'Idara', 'Akande', 'Gotedo Church Management Software', 'Software Developer', 'idara.akande@gotedo.com');

However, there is a bad actor who wants to exploit your API server. So instead of supplying safe values, they provide some an exploited value such as " or ""=" as the value in the company field on your frontend form.

So, because you did not provide data validation/sanitisation of the data input, the malicious value for company is persisted to the database as delivered from the frontend. This alone does not harm you. The harm comes when the data for that particular contact is being queried.

The SELECT command for that contact will end up looking like this:

SELECT *
FROM contacts
WHERE company="" or ""="",

Because of the malicious data in the company field, the clause WHERE company="" or ""="" will resolve to WHERE company = false or true and the entire rows and columns of the contacts table will be returned to the malicious actor. Now, imagine this happened in a real software like Facebook or Instagram on the users table, all account information of everyone using the platforms will be dumped.

Preventing SQL Injections Attacks

Escape user-provided data before storing. One way to prevent such SQL injection attacks is to validate and sanitise your data before storing them in the database. Particularly, you must escape the values of some of the fields before storing them. For the example above, we will escape all the values except the value for the email field. So the company value will be stored like this: " or "=" instead of this " or ""=". This escape totally neutralises the bad intention of the malicious actor.

Additionally, for the email field (for example) which cannot be escaped because escaping it will mangle the characters, you should make sure that you validate the email field on both the frontend and backend before storing. What this means is that you should check and ensure that the value provided for email is a valid email address, especially on the backend. Only and only if the validation succeeds should you proceed to store the email. Validating the email only on the frontend is as good as having no validation especially for an API server because requests can also be sent via direct AJAX requests which by-passes forms.
Make Use of Parameter(Variable) Bindings / Parameterized Queries / Prepared Statements. All database engines (MySQL, PostgreSQL, etc) have implemented in-built mechanisms to prevent SQL Injection Attacks by providing ways to safely bind values to parameters during queries.

In NodeJS for example, the unsafe query discussed earlier could be safely executed like this with bindings (?):
```
const connection = getConnection(); // Database connection
const company = "" or ""=""; // Potentially  unsafe value from database
connection.query("SELECT * FROM contacts WHERE company = ?", 
[
  company
], function(error, results){
// other operations on `error` or `results`
});
```
Read more here: How to Prevent SQL Injection in Node.js.

If you are using the Knex ORM, it could read like this:
```
const company = "" or ""=""; // Potentially  unsafe value from database
knex('contacts')
.select('*')
.where(knex.raw('company = ?', [company]))
```
MySQL has prepared statements which offer an efficient way of making large repetitive statements. Prepared statements offers SQL injection mitigation out of the box.
```
mysql> PREPARE stmt FROM 'SELECT * FROM contacts WHERE company = ?';
mysql> SET @company = "" or ""=""; # Potentially  unsafe value from database
mysql> EXECUTE stmt USING @company;
# Results will displayed here
# Then deallocate the prepared statement
mysql> DEALLOCATE PREPARE stmt;
```
Read more about MySQL prepared statements here: Prepared Statements.

Further study links:
1. For Knex, read more here: Raw Parameter Binding.
2. For Ruby/Rails: Using bind parameters in Active Record for SQL security and performance.
3. For Python/Django: Performing raw SQL queries.
4. For Java: Using Prepared Statements.

Cross Site Scripting (XSS) Attacks

Cross Site Scripting Attacks are also a result of lack of validation and sanitisation of user inputs before storing them in the database. It could also be a result of lack of Web Application Firewall (WAF) which is usually implemented on the web server level (or sometimes, within the application). The two most-common types of XSS attacks are: Reflected XSS attacks and Persisted (Stored) XSS attacks. Let's discuss each of them.

Reflected XSS Attacks

Reflected XSS Attacks are usually delivered via the URL of a website or web app. It involves embedding malicious scripts within the query portion of a URL and sending a request to the website or web app with the malicious URL. If the website/web app is not fortified against (Reflected) XSS Attacks, it will return the query portion while responding to the request. Then the browser will execute the content of the malicious scripts. Let's look at an example.

Now, imagine a malicious actor is sending a request to your website/web app with this URL (This can be done via the browser or terminal): www.mysite.com/about-us/?name=<script>window.onload=function(){//embed malicious script}<script/>.
This is a very basic format for Reflected XSS Attacks which is already mitigated by all modern browsers. If you attempt the above URL with a Chrome or Firefox browsers, the browser will escape the script tags within the URL.
But let's assume that the attacker is using a very old browser or the terminal to send the exploit. If your web server (NGINX or Apache, etc) is not installed with a Web Application Firewall (such as NAXSI for NGINX), the web server will return a 200 response with the query portion containing the malicious script still intact.
As seen in the malicious URL, another script will be embedded in the web page when the URL returns. The embedded script could be a script which will fetch stored cookies or data from the browser local storage or listen to keystroke from users using your web site or web app.

How to Prevent Reflecting XSS Attacks

Use a Web Application Firewall (WAF). The good thing is that most modern browsers will escape any < or > found in URLs thereby rendering them ineffective. However, if you are running a web app with lots of user interactions, a Web Application Firewall should be a must for you. If you are using NGINX web server, you should consider building your NGINX web server from source (instead of using the pre-built versions from your operating system's repository) and installing the NAXSI web application firewall plugin. A WAF will detect malicious URLs and block the requests with 40* response thereby preventing a 200 response to the browser. You can also link the WAF to the server's kernel firewall so that the IP address can be banned for a period of time. On Linux, Fail2Ban can monitor your web server logs, detect blocks from the WAF plugin, and then proceed to ban the offending IP Address for a period of time through the system (kernel) firewall.

Links:

Learn more about NAXSI.
Learn more about Fail2Ban.
Learn how to build NGINX from source with NAXSI.

It is worthy to mention that they are paid WAF services which you can setup to sit in front of your application or to act within your application. Few are: AWS WAF and Barracuda.

Persisted (or Stored) XSS Attacks

As the name implies, Persisted (or Stored) XSS attacks are delivered by persisting (or storing) malicious scripts into the database. These scripts are then embedded into the user's browser when the exploited data is fetched and rendered on the browser. It is usually a result of not validating and sanitising user inputs before storing them into the database and rendering/injecting data fetched from the database as HTML entities instead of text. Let's look at an example:

Now, imagine you have a form accepting user inputs which will be stored in the database. The form contains fields for First Name, Surname, Company, Job Title, Email, and Notes. Your API server provides the ability to store data into the contacts table. If proper inputs were provided in the form, the JSON data from the form for the POST /contacts request will look like this:
```
{
"firstName": "Idara",
"surname": "Akande",
"company": "Gotedo Church Management Software",
"job_title": "Software Developer",
"email": "idara.akande@gotedo.com",
"notes": "I met Idara at a tech conference"
}
```

However, there is a malicious actor who exploited the notes field. So the following JSON was delivered to the server for your application:

{
"firstName": "Idara",
"surname": "Akande",
"company": "Gotedo Church Management Software",
"job_title": "Software Developer",
"email": "idara.akande@gotedo.com",
"notes": "<script>window.onload=function(){//embed malicious script}<script/>"
}

Due to lack of input validation and sanitisation, the server ends up saving the malicious contents in the notes field as delivered into the database.
Now, let's assume that the website or web app was developed without consideration for XSS attacks and the contents from the notes field was fetched from the database and injected into the DOM as an HTML entity instead of escaped text, then the embedded malicious script will be executed.
Imagine, again, that this was executed in the context of an administrator, then it is possible for administrator cookies and data in local storage to be stolen. Even if it was executed in the context of a user, it could lead to loss of sensitive user data such as credit cards, username and passwords, authentication tokens, etc.

How to Prevent Persisted (or Stored) XSS Attacks

Data Validation and Sanitisation. As discussed already in the section for SQL Injection Attacks, one of the most effective ways to avoid Persisted XSS Attacks is to make sure that all user inputs are validated, in order to ensure that they conform to the formats expected by the server (such as an email being a valid email, or a username not including some character or exceeding a particular length). Data must be sanitised in order to remove/neutralise potentially malicious content from/within them.
Hard Limiting of Character Length of Fields. Let's take MySQL as a example. When you are creating the columns in the contacts table, you should set hard limits to the lengths of the characters for each columns. For example:
```
CREATE TABLE `contacts` (
`id` varchar(32) NOT NULL,
`first_name` varchar(32) NOT NULL,
`surname` varchar(32) NOT NULL,
`company` varchar(40) DEFAULT NULL,
`job_title` varchar(32) DEFAULT NULL,
`email` varchar(48) DEFAULT NULL,
`notes` text,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
The above CREATE statement will ensure that there are hard limits to the size of each column so that long malicious contents are not stored in those columns.

Soft Limiting of Field Lengths. Let's assume that you used varchar(255) to create each of the string fields. Something like below (which is common when using ORM such as Knex):

CREATE TABLE `contacts` (
`id` varchar(255) NOT NULL,
`first_name` varchar(255) NOT NULL,
`surname` varchar(255) NOT NULL,
`company` varchar(255) DEFAULT NULL,
`job_title` varchar(255) DEFAULT NULL,
`email1` varchar(255) DEFAULT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

You can limit the length of each field through validation. The validation can be done on both the frontend and backend. However, the backend validation is most important as it cannot be by-passed. Let's take the AdonisJS Framwork for Nodejs as an example, the validation schema below limits the lengths of some of the fields:

public schema = schema.create({
 firstName: schema.string({ escape: true, trim: true }, [rules.maxLength(30)]),
 surname: schema.string({ escape: true, trim: true }, [rules.maxLength(30)]),
 company: schema.string.optional({ escape: true, trim: true }),
 jobTitle: schema.string.optional({ escape: true, trim: true }),
 email1: schema.string({ escape: true, trim: true }, [
   rules.email(),
   rules.unique({
     table: 'contacts',
     column: 'email1',
     caseInsensitive: true,
     whereNot: this.refs?.id ? { id: this.refs.id } : {},
   }),
 ]),
 email2: schema.string.optional({ escape: true, trim: true }, [rules.email()]),
 phoneNumber1: schema.string({ escape: true, trim: true }, [rules.maxLength(20)]),
 phoneNumber2: schema.string.optional({ escape: true, trim: true }, [rules.maxLength(20)]),
})

As seen, the firstName, surname, phoneNumber1 and phoneNumber2 fields are limited to a maxLength. You can also see the escape options in all the fields. This ensures that potentially malicious contents are escaped (neutralised) before being stored in the database.

Render Data As Text. When you fetch data from the database and want to render them on the frontend, you should interpolate (insert) them into the DOM as texts and not HTML entities. Vanilla JavaScript and all frontend frameworks such as Vue, React, Angular, Solid, etc have ways of interpolating data into the DOM safely. Let's take Vue for example, when you do the following:
```
<template>
// firstName is defined in the script section
<div id="first_name">{{firstName}}<div/>
</template>
```
Vue will automatically escape the content of firstName (which was fetched from the database) and insert as text. In most cases, you should never do this in Vue:
```
<template>
// firstName is defined in the script section
<div id="first_name" v-html="firstName"></div>
</template>
```
And, in Vanilla JavaScript, you should never use innerHTML to inject un-sanitised data:
```
<html>
<body>
<div id="notes"></div>
<script>
const notes = `<div><p>This came from the database<p/><div/>`
const notesNode = document.querySelector('#notes')
notesNode.innerHTML = notes
</script>
</body>
</html>
```
As they could lead to injection of malicious HTML into the DOM.
Sanitise HTML Data Before Storage. You might ask: what about cases where we have a rich-text editor which outputs HTML contents that must be saved to the database, retrieved from the database, and subsequently rendered on the frontend. This is what happened when you are using drag-and-drop builders or rich-text editors (such as quill ).

Well, in this case, you must install additional libraries for sanitisation of the HTML output before saving them to the database. You can consider the following libraries:
1. Nodejs: sanitize-html,
2. Ruby: rgrove/sanitize,
3. PHP: tgalopin/html-sanitize,
4. Java: OWASP/java-html-sanitizer.
These libraries perform the same function for different languages:
1. You define allowed HTML tags for your backend. For example, you can define the following as the allowed tags: p, div, ul, ol, li, b, i, img, h1, h2, h3, and h4,
2. When the user submits the HTML text from the frontend for storage, you will make sure that it passes through the sanitiser library which will examine all the HTML tags in the HTML text. If any tag not allowed is found, they will be stripped off. For example, if a HTML text includes a script or table or html or body tag, these offending tags will be removed.
3. After this, you can confidently save the HTML text to the database.
Since you are now confident that the HTML text was sanitised before storage, during rendering you should use the v-html directive (for Vuejs) to inject the HTML text directly into the DOM as a node (using the normal escaped interpolation will spit out the HTML content as text which isn't what you might want):
```
<template>
// notes is defined in the `script` section
<div id="notes" v-html="notes"></div>
</template>
```
With Vanilla JavaScript, you can now do:
```
<html>
<body>
<div id="notes"></div>
<script>
const notes = `<div><p>This came from the database<p/><div/>`
const notesNode = document.querySelector('#notes')
notesNode.innerHTML = notes
</script>
</body>
</html>
```

File Uploads Attacks

This form of exploit can be used to deliver malicious executable files (often called payloads) such as .exe, .php, .js, .sh, etc. to your server and have them execute malicious scripts on your server. It is popular because it provides the most potent attacking power if the uploaded file is successfully uploaded and executed on the server. The malicious script might have full or restricted access to database, file, and network resources. If successful, the script has unrestricted server-to-server communication between the infected host and the server/computer of the malicious actor.

After a successful execution, the malicious script has the ability to download extra payloads (files) into the infected server and propagate these files across the server while being limited only by the system permission of the running process. By permissions, I mean if the payload was delivered via the NGINX web server, it's permission might be restricted to the permission of the www-data user which is the default user for NGINX web servers. However, even when the www-data user might not have permission to write and execute files within system directories, it might be able to read some system configuration files and send their content across to the server of the malicious actor. So, evidently, file upload attacks is highly sought after but could be the most difficult to accomplish.

A typical File Upload Attack might look like this:

Your application provides a functionality to upload and set profile pictures, and a malicious user sets out to exploit your application through that functionality.
If you have JavaScript validation on the frontend which checks for the file extension via mime types, the malicious actor could disable the JavaScript and by-pass the frontend validation.
If you did not implement server-side validation, the file upload will proceed and the file saved to the server's file system.
Now, the malicious actor will refresh the profile page to load the new profile picture (which is now a malicious file). The profile picture will appear broken on the frontend because it isn't a valid image file.
Now, the bad actor will copy the link to the file from the DevTools, paste the link to the browser's address bar and remotely execute the file from the browser. If the file was a .php, .sh file, the server will attempt to execute the file instead of just fetching and streaming it to the frontend.
The server is now compromised.

How to Mitigate File Upload Attacks

Implement File Validation on Both Frontend and Backend. Why I talk about frontend validation a lot is because for most application you can stop most attempts to attack your website/web app at the frontend before it gets to the server, especially if you have a 100% server-side rendered website where the API server is not opened to the outside. However, even if you have a single-page application (SPA) where you have to interact with the API server via AJAX requests, you should still implement validations on the frontend as first point of deterrance. With respect to File Upload Attacks, you should make sure that your frontend checks for the size of the file and the mime type of the file before you upload the file to the server. This will block most File Upload Attacks (especially if you have implemented CORS so that your API server can only respond to requests from configured origins. At the server side, you should also check for the file size and mime type of the uploaded file. This is important for both server-side rendered applications and single-page applications. Remember that the JavaScript on the frontend can be easily disabled, so the backend validation provides the last and unavoidable checkpoint. Server-side validation is even more important if you run an API service where the only means of interacting with the API server is via AJAX requests.

Validating the mime type of a file starts by checking the extension of the file. You can have an array or list of extensions you are expecting for the particular file. If you are expecting a picture for a profile picture, you might be expecting a PNG or JPG or WEBP file. Your validation library should check for the extension of the incoming file against the list of allowed extensions. But this is not enough. Attackers can rename a malicious file from, say, potential_dangerous_file.sh to potential_dangerous_file.png. In this case, how to do validate the file? This is where mime types come in. Every file created has a magic number which determines the original format of the file. The magic number is constant irrespective of the file name and extension. For JavaScript/ NodeJS, you can use the file-type package to accurately detect the mime type of a file based on its magic number. Read more about mime types:
1. Introduction to File MIME Types
2. Common MIME types
Only and only if the mime type has been thoroughly checked should you allow the file upload to continue. Again, do not only check for the file extensions. Check for the mime type as well using the magic number of the file. After checking for the mime type, do not go back to derive the extension of the file from the extension of uploaded file. Rather derive the extension of the file programmatically from the mime type after checking for the magic number of the file.
Change the File Name of the Uploaded File. This solution might not sound obvious. As a rule, always rename the files uploaded by users to your server unless there is a very pressing reason not to (like if you provide file storage services). Renaming files ensure that actor cannot remotely execute the file as the filename has changed. You should try not to include user identifiers such as user IDs in the filenames. Renaming files ensures that bad actors do not include bash variables in filenames which can translate to resource names. As describe in the previous point, ensure that you validate the file on the server-side and derive the file extension from the magic number of the file.
Harden your Server's Filesystem Permissions. You should ensure that your server's file permissions are hardened to avoid malicious actors uploading files directly into your file system without passing through your API server. Ensure that your web server runs as a low-privilege user. That the directory permissions within the web directory do not permit file uploads by others outside the web server user such as www-data. That the web server is not configured to show directory listing unless you are running a file directory service. That all files saved to the web directory have a non-executable permission by default. For Linux, study how to use umask for setting default permissions of new files. Start from here: What is Umask and How to Use It.
Install a Web Application Firewall. There are lots of vulnerabilities around file upload. One of the best ways to mitigate all of them is to configure your web server with a web application firewall as discussed earlier.
Check this File Upload Vulnerability Checklist: File Upload Vulnerability Tricks And Checklist

I hope this article explained in details how to prevent the SQL injection, Cross-Site Scripting, and File Upload attack patterns in your websites and web applications.

If you have questions and need assistance with how to harden your web application and Linux server, please contact me on Twitter. I have at least 10 years experience with Linux server administration.

Cover image credit: Freepik.