RESTful API Design - The basics

Recently I had the opportunity to deliver a training course on creating RESTful APIs using Node.js, and during that class, a lot of questions were asked around API design. In this article, we'll take a look at some of the basics of API design, in the context of Node.js and Express.

Note that most of the concepts outlined in this article can be applied to any other framework, inside or outside the Node.js ecosystem.

This article only covers the basics of RESTful API design. There are entire books on this topic, and a simple web search should reveal them if you are interested in learning more.

Introduction to REST

REST stands for REpresentational State Transfer, and it is an architectural approach to designing web services. At its most basic level it uses hypermedia. Not many people are aware that REST is not necessarily tied to HTTP, but most REST implementations do use HTTP because it defines a communication protocol that is well suited to REST.

For something to be considered a RESTful API it doesn't necessarily need to run over HTTP; in principle it can use other protocols such as SNMP, SMTP and so on.

Remember, HTTP is a communication protocol, while REST is an architectural design.

Basic design principles

Let's go through the six basic design principles for REST APIs - these are the most basic things that we need to think about when coming up with the API itself.

Resources

Normally an API is created to solve a business problem of some sort, and the key here is the term business. Before we do anything else, the very first thing that we should decide upon is what entities/resources we want to use. This should be well thought out in the beginning, as it is something that will not (and should not) change throughout the API design, which is why it is crucially important.

So what are these business entities? Orders, products, customers, shipments, departments, employees and so on, you name it. Essentially we are talking about nouns, in a plural form.

These entities don't necessarily need to be equal to a single data entity - meaning that a customer entity could be built up coming from a variety of data sources. What's important is that when presented to the consumer of the API, the customer returns all the data in a single form. So if customer information is stored in two different databases and spread across three tables we can still build that data into a customer entity.
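As a minimal sketch of this idea, assume two hypothetical data sources: a profile record and a list of contact rows. Neither the field names nor the buildCustomer function come from a real system; they only illustrate how several sources could be merged into one customer entity.

```javascript
// Hypothetical raw data, standing in for rows from two different databases.
const profileRow = { customer_id: 42, full_name: 'Ada Lovelace' };
const contactRows = [
  { customer_id: 42, type: 'email', value: 'ada@example.com' },
  { customer_id: 42, type: 'phone', value: '+44 20 7946 0000' }
];

// Merge both sources into the single customer entity exposed by the API.
function buildCustomer(profile, contacts) {
  const customer = { id: profile.customer_id, name: profile.full_name };
  for (const contact of contacts) {
    if (contact.customer_id === profile.customer_id) {
      customer[contact.type] = contact.value;
    }
  }
  return customer;
}

const customer = buildCustomer(profileRow, contactRows);
```

The consumer of the API only ever sees the merged customer object, not the tables it was built from.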

Resource Identifiers

Every resource created should have an identifier. This is what the industry calls a URI - a Uniform Resource Identifier. The most important thing to remember regarding these resource identifiers is that they should be unique: they should uniquely identify a given entity.

Remember that the resource identifiers for individual entities should be truly unique. Either use UUIDs or, in the case of MongoDB, the ObjectId itself. Please also remember that sequential UUIDs can be a performance killer in distributed systems, so it is best to avoid them.
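For illustration, here is one way to generate such identifiers in Node.js using the built-in crypto module, which produces random (version 4) UUIDs. The newDepartmentRecord helper is a hypothetical name used only for this sketch.

```javascript
const { randomUUID } = require('crypto');

// Create a new department record with a randomly generated (version 4) UUID.
function newDepartmentRecord(name) {
  return { id: randomUUID(), name };
}

const sales = newDepartmentRecord('Sales');
const marketing = newDepartmentRecord('Marketing');
```

Each call yields a different identifier, so the two records can be addressed independently.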

Accessing the entities (resources) should then be possible via these URIs - they usually have the following form:

http://domain.com/departments

Executing an HTTP GET request against this URI should retrieve all the departments. We may also think about this as collections - the departments collection contains all the single department entities.

Resource hierarchy

Once the entities and their corresponding URIs are well thought out, we can start to think about resource hierarchy - how we want to organise and represent relationships between various entities.

Let's say that we are developing an API for an HR system. What would our entities be? This may not be a complete list but should give us a good start: Departments, Employees and Salaries.

Now let's think about the hierarchy. Employees belong to Departments and Employees also have Salaries. Therefore we'd have three collections - three URIs in other words - which would allow us to retrieve a list of all Departments, Salaries and Employees.

If we want to retrieve information about an individual department we should be able to pass in a department ID and therefore make the following HTTP GET request:

http://domain.com/departments/12

Taking this to the next level is to represent the relationships mentioned earlier. We should be able to easily list all the employees belonging to a given department. The HTTP GET call should look like this:

http://domain.com/departments/12/employees

We could also get some information (like name, phone number) of a given employee working at a specific department by making the following GET call:

http://domain.com/departments/12/employees/1568

Another example would be to find the salary of a given employee working at a given department:

http://domain.com/departments/12/employees/1568/salaries

This may be stretching things a bit too far. Implementing such complex routing to represent relationships is not necessarily the easiest thing to do and let's face it, it's quite cumbersome as well.
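To make the hierarchy concrete, here is a rough sketch that resolves the department and employee URIs shown above against hypothetical in-memory data. The resolveResource function and the sample records are illustrative assumptions; in an Express application the framework's router would do the path matching.

```javascript
// Hypothetical sample data for the HR example.
const hrData = {
  departments: [{ id: 12, name: 'Sales' }],
  employees: [
    { id: 1568, departmentId: 12, name: 'Jane Doe' },
    { id: 1891, departmentId: 12, name: 'John Smith' }
  ]
};

// Resolve a path such as /departments/12/employees/1568 to the matching data.
// Returns undefined when nothing matches.
function resolveResource(path) {
  const [collection, deptId, sub, empId] = path.split('/').filter(Boolean);
  if (collection !== 'departments') return undefined;
  if (deptId === undefined) return hrData.departments;

  const department = hrData.departments.find((d) => d.id === Number(deptId));
  if (!department || sub === undefined) return department;
  if (sub !== 'employees') return undefined;

  const staff = hrData.employees.filter((e) => e.departmentId === department.id);
  if (empId === undefined) return staff;
  return staff.find((e) => e.id === Number(empId));
}
```

Even in this small sketch, every extra level of nesting adds another branch, which hints at why deep hierarchies become cumbersome.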

A good approach is to use HATEOAS to allow easy navigation between related resources. We will take a look at this later on in this article.

Resource representation

As a next step, we need to define the data structure that we'll return for requests - what data, in what format to return when someone requests information about a given employee working at a particular department.

These days, it's safe to say that the data format used in RESTful APIs is JSON. (Although it could be XML as well).

It's important to remember that the JSON structure returned by the API should be decoupled from the underlying database - meaning that if we use a relational database, which makes heavy use of schemas, a change in the schema should not affect the representation of the entity in the API.

Now, how this could be achieved is an entirely separate topic but suffice to say that it shouldn't matter if a relational or a NoSQL database is used, changes at the database level should not impact the API.
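One common way to get this decoupling, sketched here under assumptions, is a small mapping function between the stored record and the API representation. The column names (emp_id, first_nm and so on) and the toEmployeeResource function are made up for this example.

```javascript
// Hypothetical database row; the column names are an assumption.
const employeeRow = { emp_id: 1568, first_nm: 'Jane', last_nm: 'Doe', dept_fk: 12 };

// Translate a database row into the representation exposed by the API.
// If a column is renamed, only this function needs to change.
function toEmployeeResource(row) {
  return {
    id: row.emp_id,
    name: `${row.first_nm} ${row.last_nm}`,
    departmentId: row.dept_fk
  };
}

const employeeResource = toEmployeeResource(employeeRow);
```

Clients depend on id, name and departmentId only, so the storage layer is free to evolve behind the mapper.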

There are a lot of approaches to data modelling for database systems; please explore these if this is of interest. In the realm of Node.js, most developers would be using NoSQL databases like MongoDB, and most of the time the data returned by the API comes directly from the database. This can be a reasonable approach, since MongoDB stores a JSON-like data structure in its database (it's called BSON).

NoSQL and relational databases are not necessarily competing with each other. Each has different, valid use-cases and each has strengths and weaknesses for some applied cases. A good system can use both databases to achieve the desired functionality for the business. The API can also gather data from both databases and present a final entity representation to a user.

HTTP methods for requests

The next step would be to think about the common HTTP methods. The most commonly used HTTP methods are the following:

  • GET - returns a resource. Either an entire collection or an individual resource. Both are based on a URI (http://domain.com/departments or http://domain.com/departments/12).
  • POST - creates a new resource at the URI specified as part of the request. This request requires a payload - which contains the details of the resource to be added.
  • PUT - creates or replaces a resource at a URI specified. Again, the payload contains the details of the resource.
  • PATCH - applies a partial update to a resource specified by a URI. The payload contains the details of the changes to apply.
  • DELETE - removes a resource at a specified URI.

There are some subtle, yet important differences between PUT and PATCH - please read this article to learn more about them.
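The core of that difference is that PUT replaces the whole resource, while PATCH changes only the parts that were sent. The following is a simplified illustration with hypothetical helper functions (replaceResource and patchResource); the PATCH variant uses a plain merge, which is only one of several possible patch formats.

```javascript
// PUT semantics: the payload replaces the stored resource entirely.
function replaceResource(existing, payload) {
  return { id: existing.id, ...payload };
}

// PATCH semantics (simple merge style): only the supplied fields change.
function patchResource(existing, payload) {
  return { ...existing, ...payload, id: existing.id };
}

const stored = { id: 12, name: 'Sales', location: 'London, UK' };
const afterPut = replaceResource(stored, { name: 'Global Sales' });
const afterPatch = patchResource(stored, { name: 'Global Sales' });
```

After the PUT the location field is gone because it was not part of the payload, whereas the PATCH keeps it and only changes the name.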

HTTP methods and Express

All of these HTTP methods can be accessed via Express in Node.js. Here's a conceptual implementation of this:

const express = require('express');
const app = express();
const port = 3000;
const router = express.Router();
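// handlerFn1 to handlerFn13 are placeholders for (req, res) handler functions defined elsewhere

// parse JSON request bodies (express.json() is built into Express 4.16 and later)
app.use(express.json());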

// return all departments
router.get('/departments', handlerFn1);
// return a specific department
router.get('/departments/:id', handlerFn2);
// return all employees from a specific department
router.get('/departments/:id/employees', handlerFn3);
// create a new department
router.post('/departments', handlerFn4);
// create a new employee at a specific department
router.post('/departments/:id/employees', handlerFn5);
// bulk update departments
router.put('/departments', handlerFn6);
// update specific department if exists
router.put('/departments/:id', handlerFn7);
// bulk update employees at a department
router.put('/departments/:id/employees', handlerFn8);
// partially update a specific department
router.patch('/departments/:id', handlerFn9);
// partially update a specific employee at a specific department
// (route parameter names must be unique within a path)
router.patch('/departments/:id/employees/:employeeId', handlerFn10);
// remove all departments
router.delete('/departments', handlerFn11);
// remove a specific department
router.delete('/departments/:id', handlerFn12);
// remove all employees at a specific department
router.delete('/departments/:id/employees', handlerFn13);

app.use('/api', router);

app.listen(port, () => console.info(`Server is listening on port ${port}`));

The above code sample is missing a few cases - like the ability to list all employees and so on - but it should give you a good idea of how to implement routes for a RESTful API in Express.

Express exposes a Router object which allows us to specify detailed routing, where the methods exposed on the Router object correspond to the HTTP method names. Since a router is a self-contained object, we could create another instance of it and mount it at another endpoint. Notice the app.use('/api', router) line, which mounts our router at the /api endpoint; to access these routes we therefore need to visit http://domain.com/api/departments, for example.

HTTP Semantics

When it comes to the API, we have already established the fact that we are using HTTP. This means that we should conform to the HTTP specification itself, with all the status codes involved.

Generally speaking HTTP GET methods should use the 200 (OK) and 404 (Not Found) status codes. Simply put, if a resource is found, return it with HTTP 200, if it's not found return HTTP 404.

For HTTP POST methods we should be using either 201 (Created) or 400 (Bad Request). Normally, with a POST request, we need to send a payload, that will eventually create a resource. If that resource got successfully created, we could safely return a 201. If the client sends some invalid payload, we can reject it and return a status of 400.
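Here is a small sketch of that decision, kept independent of Express so that it can run on its own. The handleCreateDepartment function, the in-memory store and the validation rule (a non-empty name) are assumptions made for this example.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store, for illustration only.
const departmentStore = [];

// Returns the status code and body that a POST /departments handler could send.
function handleCreateDepartment(payload) {
  const isValid = payload !== null && typeof payload === 'object'
    && typeof payload.name === 'string' && payload.name.trim() !== '';
  if (!isValid) {
    return { status: 400, body: { message: 'A non-empty "name" is required.' } };
  }
  const department = { id: crypto.randomUUID(), name: payload.name.trim() };
  departmentStore.push(department);
  return { status: 201, body: department };
}
```

Inside an Express handler the result could be sent with res.status(result.status).json(result.body).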

For HTTP PUT we have two options for successful scenarios. If the request creates a new resource we need to return a 201 (Created) - just as we saw with HTTP POST. If there is an existing resource and we applied an update, we should return a 204 (No Content) (or a 200). If for whatever reason the update is not possible, a 409 (Conflict) should be returned.

In the case of HTTP PATCH we send a payload that contains the instructions of a partial update. If the client sends a malformed patch document, we should return a 400. If the client sends valid patch data, but we can't do the update, we should send a 409 just as we did with PUT. A successful PATCH should be indicated by a 204.

HTTP DELETE has two cases - either the resource is found (and therefore can be deleted) - in which case we can return a 204 and if the resource is not found we should return a 404.
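The recommendations above can be summarised as a lookup table. This is only a sketch: the outcome labels (found, created and so on), the statusFor function and the fallback to 500 for unknown combinations are choices made for this example.

```javascript
// Status codes suggested in the text, keyed by HTTP method and outcome.
// The outcome names are labels invented for this sketch.
const statusByOutcome = {
  GET: { found: 200, notFound: 404 },
  POST: { created: 201, invalid: 400 },
  PUT: { created: 201, updated: 204, conflict: 409 },
  PATCH: { updated: 204, invalid: 400, conflict: 409 },
  DELETE: { deleted: 204, notFound: 404 }
};

// Look up the status code for a method and outcome;
// fall back to 500 for combinations the table does not know.
function statusFor(method, outcome) {
  const outcomes = statusByOutcome[method.toUpperCase()] || {};
  return outcomes[outcome] ?? 500;
}
```

For example, statusFor('put', 'conflict') gives 409, matching the PUT description above.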

Additional HTTP status codes that we should be aware of for REST APIs are the following:

  • 401 (Unauthorized) - for cases when a client tries to access a resource that requires authentication.
  • 403 (Forbidden) - for cases when the client makes a valid request, but the server refuses the request which is normally due to lack of permission to access the resource in question. (An example of this would be to access a file that the client doesn't have permission to view)
  • 500 (Internal Server Error) - generally indicates an error at the server-side and the details of the error are unknown.
  • 503 (Service Unavailable) - the server is not available due to an error (or because it's under heavy load)

HTTP status codes are categorised into these five categories:

  • 1xx - Informational messages
  • 2xx - Success messages
  • 3xx - Redirection messages
  • 4xx - Client error messages
  • 5xx - Server error messages
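Since the category is determined by the first digit of the code, it is easy to derive in code. The statusCategory helper below is a hypothetical name used only for this sketch.

```javascript
// Map a status code to its category using the first digit.
function statusCategory(code) {
  const categories = {
    1: 'Informational',
    2: 'Success',
    3: 'Redirection',
    4: 'Client error',
    5: 'Server error'
  };
  return categories[Math.floor(code / 100)] || 'Unknown';
}
```

This can be handy for logging, for instance to count client errors separately from server errors.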

HTTP Status codes and Express

In Express we have the option to manipulate the HTTP status codes in the handler function for a route. Take a look at this excerpt:

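// illustrative in-memory data; a real handler would query a database
const departments = [{ id: 12, name: 'Sales' }];

const handlerFn = function(req, res) {
  const id = Number(req.params.id);
  // look up the department information
  const deptInfo = departments.find((dept) => dept.id === id);
  if (deptInfo) {
    return res.status(200).json(deptInfo);
  }
  return res.status(404).json({ message: `Department with ID ${req.params.id} not found.` });
};
router.get('/departments/:id', handlerFn);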

Parameters in REST

Often there are situations when only parts of a given resource or resources should be accessed. In our HR system example, we could end up in a scenario when we don't need all the available information about departments but only their name and physical location. And even if we wanted to retrieve all the information we may end up with a large number of records - think about a GET call to retrieve all the employees.

In REST APIs we should be able to support filtering and pagination.

With filtering, we should be able to specify which parts of a given entity to return to the requesting client and, based on a condition, precisely which entities to return. Furthermore, in the case of a large dataset, we should be able to control how many documents we want to send.

Often there's a debate whether the processing of data should be done at the server or the client side. Let's go back to the employee list. Retrieving all the employees and sending them to the client for further processing (that is - find all employees who earn more than a certain amount per year) would waste bandwidth and processing power in the browser. It is much better to send the exact data requested by the client - that is the filtered list which should be much smaller in size.
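As a sketch of what doing this work on the server could look like, the function below applies a condition, keeps only the requested fields and caps the number of results. The sample records, the queryEmployees name and its options (minSalary, fields, limit) are assumptions for illustration.

```javascript
// Hypothetical employee records.
const staffRecords = [
  { name: 'Jane Doe', salary: 72000, phone: '555-0101' },
  { name: 'John Smith', salary: 48000, phone: '555-0102' },
  { name: 'Maria Garcia', salary: 91000, phone: '555-0103' }
];

// Apply a condition, cap the number of results, and keep only the requested fields.
function queryEmployees(records, { minSalary = 0, fields, limit = records.length } = {}) {
  return records
    .filter((record) => record.salary >= minSalary)
    .slice(0, limit)
    .map((record) => {
      if (!fields) return record;
      return Object.fromEntries(fields.map((field) => [field, record[field]]));
    });
}

const highEarners = queryEmployees(staffRecords, { minSalary: 70000, fields: ['name'] });
```

Only the two matching names leave the server, rather than every record with every field.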

Parameters in Express

Query parameters are supported in Express, which allows us to have endpoints such as http://domain.com/employees?limit=10&fields=name,salary&sort=salary&sortType=desc

This would return the name and salary of the highest earning ten employees.

Express exposes req.query which we can capture and use to modify our logic:

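// columns that clients are allowed to select or sort by (an assumption for this example)
const allowedFields = ['name', 'salary'];

const handlerFn = function(req, res) {
  // req.query is always an object, so check whether any parameters were sent
  if (Object.keys(req.query).length > 0) {
    const requestedLimit = Number.parseInt(req.query.limit, 10);
    const limit = requestedLimit > 0 ? requestedLimit : 10; // default of 10 is arbitrary
    const fields = String(req.query.fields || '').split(',').filter((field) => allowedFields.includes(field));
    const sort = allowedFields.includes(req.query.sort) ? req.query.sort : 'name';
    const sortType = req.query.sortType === 'desc' ? 'DESC' : 'ASC';
    // only a conceptual implementation: db.query() is a placeholder for a real database client,
    // and the values are validated above because raw query parameters must never be
    // interpolated into SQL
    const sql = `SELECT ${fields.join(', ') || '*'} FROM EMPLOYEES ORDER BY ${sort} ${sortType} LIMIT ${limit}`;
    const filteredEmployees = db.query(sql);
    return res.status(200).json(filteredEmployees);
  } else {
    // employees is a placeholder for the full, unfiltered list
    return res.status(200).json(employees);
  }
}
router.get('/employees', handlerFn);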

Versioning

For larger APIs, we need to think about versioning ahead of time. What if we introduce breaking changes? How do we handle those changes? Usually, there are three types of versioning methods that REST APIs utilise: URI versioning, query string versioning and header versioning.

URI versioning

URI versioning is very straightforward to understand. In this case, the resource paths themselves stay the same, but a version number is added to the URI.

http://domain.com/departments

becomes

http://domain.com/v2/departments

This is a rather simple versioning method, and it depends on the server itself to allow this routing mechanism.

Earlier we discussed the Router object in Express - this helps us to achieve URI versioning. We can mount new routes to new endpoints:

app.use('/api', router);
app.use('/v2/api', newRouter);

Query string versioning

This versioning method means that we append a query string to the URI of the request, indicating the version that we wish to use.

http://domain.com/departments

becomes

http://domain.com/departments?v=2

In Express we can use req.query to capture this query string.
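A minimal sketch of this could look as follows. The two handlers, the selectDepartmentHandler function and the choice to default to version 1 when no v parameter is sent are assumptions for this example; query stands for the object Express exposes as req.query.

```javascript
// Hypothetical handlers for two versions of the departments endpoint.
const departmentHandlers = {
  1: () => ({ departments: ['Sales'] }),
  2: () => ({ data: [{ name: 'Sales' }], version: 2 })
};

// Pick a handler based on the "v" query parameter, defaulting to version 1.
// Returns undefined for versions that do not exist.
function selectDepartmentHandler(query) {
  const version = query.v === undefined ? 1 : Number(query.v);
  return departmentHandlers[version];
}
```

When no handler is found, a route could respond with a 400 to signal that the requested version is not supported.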

Header versioning

When sending HTTP requests, nothing is stopping us from adding custom headers with custom values to the request or response. We can, of course, add version information to such custom headers. In an HTTP client such as Insomnia, this means adding a header, for example custom-version-header with the value version-1, to the request.

We can capture such custom headers in Express and act on them:

const handlerFn = function(req, res) {
  const versionHeader = req.headers['custom-version-header'];
  if (versionHeader === 'version-1') {
    // do version-1 action
  } else {
    // do version-2 action
  }
}
router.get('/departments', handlerFn);

HATEOAS

HATEOAS - Hypermedia As The Engine Of Application State - is a mechanism that allows easy navigation between resources/entities. Remember that earlier we said that the relationship and hierarchy between resources could be represented by the URI structure, such as /departments/12/employees, but that this shouldn't be nested too deeply because such hierarchies are not easy to implement.

A much better approach is to utilise HATEOAS. The relationships can be represented as links inside the resources - essentially this means that full URI resource representations are placed in entities retrieved by the client that allows further navigation and exploration of related entities.

At the time of writing this article there is no single, universally adopted standard for how HATEOAS should work.

Let's take a look at an example implementation:

{
  "departmentID": 12,
  "departmentName": "Sales",
  "departmentLocation": "London, UK",
  "links": [
    {
      "rel": "employee",
      "href": "http://domain.com/employees/1268",
      "action": "GET"
    },
    {
      "rel": "employee",
      "href": "http://domain.com/employees/1891",
      "action": "GET"
    },
    {
      "rel": "self",
      "href": "http://domain.com/departments/12",
      "action": "POST"
    },
    {
      "rel": "self",
      "href": "http://domain.com/departments/12",
      "action": "DELETE"
    }
  ]
}

The above indicates that we have retrieved information about a department, and that we get an array of links along with it. These links either point to related resources - such as the employees working at this department (think along the lines of /departments/12/employees) - or they relate to self, meaning that we can invoke further actions on the same resource using different HTTP methods.
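Generating such a links array on the server could be sketched like this. The withLinks function, its parameters and the base URL handling are assumptions for illustration, and the hrefs use the plural collection names from the earlier sections.

```javascript
// Build a department representation with links to itself and its employees.
// baseUrl and the input shapes are assumptions made for this sketch.
function withLinks(department, employeeIds, baseUrl) {
  const self = `${baseUrl}/departments/${department.id}`;
  return {
    departmentID: department.id,
    departmentName: department.name,
    links: [
      ...employeeIds.map((id) => ({ rel: 'employee', href: `${baseUrl}/employees/${id}`, action: 'GET' })),
      { rel: 'self', href: self, action: 'GET' },
      { rel: 'self', href: self, action: 'DELETE' }
    ]
  };
}

const salesWithLinks = withLinks({ id: 12, name: 'Sales' }, [1268, 1891], 'http://domain.com');
```

A client can then follow the href values it receives instead of constructing URIs itself.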

One API that implements a similar mechanism is the Star Wars API that is accessible via https://swapi.co.

GraphQL

This article wouldn't be complete without a mention of GraphQL. Please read the Introduction to GraphQL post to learn more about this technology. To sum things up, REST and GraphQL do not compete with each other, but instead, they complement each other.

Summary

In this article, we had a look at the basics of REST API design. There's still a lot more to discuss - we never talked about caching and other essential factors - but I hope that there's sufficient information here to give you a basic understanding of RESTful APIs.