Skip to main content
Exact Realty Blog

Opaque IDs: the ultimate protection against enumeration attacks

·11 mins

In this post, we’ll discuss two types of attacks, timing attacks and enumeration attacks, which can result in disclosing confidential information to attackers when accessing resources. We’ll then introduce a method to neutralise these attacks using AEAD encryption.

Why IDs are used #

A great many applications involve some sort of information retrieval for interactivity. The information is stored in some type of database and each item or resource that is typically assigned a unique identifier, or ID. This identifier can then be used to refer to that item later on.

For example, if you have a web application where users can upload photos, you might store each photo’s metadata in a database and assign each row a unique ID. The ID will then be sent in some form to the user, so that when she wants to see the photo the application can retrieve that file by looking up the ID in the database.

This applies not just to web applications, but rather is a feature of APIs in general. For example, a REST API might provide a user a path like http://api.example/foo/123 to identify a resource of type foo named 123.

How IDs might be exploited to gain access #

Disclosing internal information (such as an ID) to users can result in providing them with access to other information that they are not intended to have access to.

While the ID in itself might seem like a harmless piece of information (and it usually is, in isolation), it could allow users to make inferences about the system and potentially gain access to private data.

German tank problem #

The German tank problem gets it name from a statistical problem the Allies faced during World War Ⅱ. The Germans built a lot of tanks, and the Allies wanted to know how many as well as the monthly rate of production. Since the Germans used a sequential numbering system to label the components for each tank they built, when Allies captured one, they could see its serial numbers. By looking at the highest serial number they saw, and applying some statistical analysis, they could effectively estimate how many tanks the Germans had built.

This problem can be very relevant to information security. Suppose you are using sequential IDs for the photo upload application mentioned earlier and that each photo is given a sequential ID (e.g., 1, 2, 3, etc.). Then, by just observing the IDs being used, an attacker could gain information about the number of photos stored in the system as well as the rate at which photos are uploaded. Whether this is a concern or not depends, of course, on the specifics of the situation. Most likely, the information in this particular example is of little value, but it might be helpful to, say, a potential competitor considering building a similar application.

Enumeration attacks #

While the German tank problem shows how our choice of identifiers might non-public reveal information to an adversary, many or even most applications are interactively accessible to adversaries. This interactivity allows attackers to carry out another attack that can give them the same information directly, without the need to guess: an enumeration attack.

An enumeration attack is a type of attack where an attacker attempts to gather information about a system by systematically trying different values for a particular parameter. In the context of IDs, an enumeration attack might involve an attacker trying a large number of possible IDs to determine which IDs exist and which do not. Once the attacker has determined which IDs exist, they can potentially use that information to carry out further attacks.

Enumeration attacks can be particularly effective when the IDs being used follow some kind of sequence or are represented in a relatively small set and are part of a publicly available URL, as it is trivial to automate the process of trying a large number of possible IDs.

For example, consider the previous API with an endpoint like http://api.example/foo/:id, where :id is a numeric and sequential ID. It is trivial for an actor to try many requests, such as http://api.example/foo/1, http://api.example/foo/2, http://api.example/foo/99, etc., and evaluate the response.

The impact of enumeration attacks depends on the application in question. In many situations, successfully exploiting an enumeration vulnerability will only provide information about which resources are valid, making it similar to the German tank problem. However, in combination with additional vulnerabilities in the application, especially broken access control, it could have farther reaching implications, such as leaking confidential information, including customer data.

Prevention #

Enumeration attacks can be avoided or mitigated by addressing the factors that make them possible: small sets and predictable values. A common solution is to use UUIDs version 4, which consist of 122 random bits and are therefore impossibly impractical to guess or enumerate.

Timing attacks #

A timing attack is a type of side-channel attack where an attacker attempts to gain information about a system by measuring the amount of time it takes to perform certain operations. In the context of IDs, a timing attack might involve an attacker trying to guess a resource’s ID by trying a large number of possible IDs and measuring the time it takes for the system to respond. By measuring the time it takes to receive a response, the attacker can potentially learn information about the system, such as which IDs exist and which do not.

Timing attacks can be a way to extract information even from systems that implement proper access control. Consider the endpoint from earlier: http://api.example/foo/:id. Let’s say that IDs follow a simple numeric sequence (i.e., 1, 2, 3, etc.) and that the no path IDs is publicly accessible. Our adversary is a user to the system with legitimate access to the resources with an ID of 11 and 17, and no access to any other resources. How could this adversary carry out an enumeration attack?

Because of difficult or impossible to avoid circumstances when developing such an application, the endpoint will likely take different time to respond depending on the result. This information, if consistent, can be revealing of the internal state. Our attacker could proceed to make queries against the endpoint to all resources, for example, from 1 to 100 000. The measured response times could look like something like what is shown in the table below.

ID rangeMean response time (ms)
18–7 36349
7 364–100 00045
Table with mean measured response times for different IDs. Note the difference between various IDs. The IDs the attacker has legitimate access to have a mean response time of about 56ms, whereas the remaining resources have response times of about 50ms for IDs under 7 364 and of about 50ms for higher values.

Note that from ID 7364 onwards the response time is lower. This information can reveal to the attacker that the system likely contains 7 363 entries.

Depending on how lookups are carried out, timing information can also be used to identify valid IDs even when they don’t follow a sequence and there is a much larger set of valid IDs (such as UUIDs). This means that the specifics of how data are looked up are important, not just the format of IDs.

Mitigation #

There are several ways to mitigate timing and enumeration attacks. As mentioned earlier, one common solution against enumeration attacks is using random UUIDs instead of sequential IDs. These have a very low probability of being guessed by an attacker. However, UUIDs can be relatively long, which can be a disadvantage in certain contexts, such as when they need to be used in URLs.

Timing attacks, being a side-channel attack, are impossible to eliminate entirely because so doing would require eliminating all possible side-channels. However, there are ways to practically mitigate them. Ideally, we would write our application so that response times are independent from user-provided data. Since this is impractical, we can practice defence-in-depth and make timing information less useful and more difficult to obtain.

For example, if we are concerned about enumeration attacks, we might add a MAC or digital signature to IDs, which (1) users cannot readily forge and (2) we can verify before proceeding further with the request.

The new IDs could look like the original IDs, but with the MAC or signature prepended to it, like this <MAC>.<ID>. So, the internal resource 1 could result in a user-visible ID like 53CUR3C0D3.1, with 53CUR3C0D3 being a value that we can verify as corresponding to resource 1, but which is difficult to forge or guess. If we verify this value before handling the request further, an attempt at the timing attack described earlier could instead have resulted in measurements like the following.

ID rangeMean response time (ms)
18–7 36331
7 364–100 00032
Table with mean measured response times for different IDs after only processing requests with a valid MAC. Note that mean response times are divided into two categories this time. For IDs the attacker has legitimate access to, we see similar, slightly higher, response times of around 60ms. For all other IDs (where an invalid MAC was provided), the response time is of around 32ms.

Note that in this case, an attacker does not gain much information besides what they already know (i.e., the representation of resources 11 and 17).

Encrypted IDs #

While a MAC or signature can go to great lengths towards preventing enumeration and timing attacks to learn about valid IDs, the issue arising from the German tank problem, namely, making inferences from observable IDs, remains. In order to mitigate this, we can use encryption to make internal representation of the ID itself opaque to external parties.

Indeed, we can use various authenticated encryption schemes (e.g., AES-GCM) to hand out users IDs which are both opaque (they do not allow to infer information about their internal value, structure or representation) and unforgeable (IDs not generated by us can be detected and rejected).

However, there are some potential downsides to using this method. One issue is that the encrypted IDs are not human-readable, so it may be difficult for developers to work with these IDs directly. This can make debugging and troubleshooting more challenging.

Another potential issue is that the overhead involved in encrypting and decrypting IDs can be significant both in terms of computing time and space. This can be a concern particularly in applications where a large number of IDs need to be generated and transmitted or in certain resource-constrained applications.

Example solution in TypeScript #

We have developed a small library, @exact-realty/safeid, that implements the techiques discussed here applied to UUIDs, with the possibility of extending it to other arbitrary ID formats.

Approach taken #

@exact-realty/safeid uses AES-GCM to produce IDs that are opaque, unforgeable and stable (i.e., the same internal ID will always result in the same encrypted ID).

To do this, first we provide the library with a secret key that will be used to derive other cryptographic IDs.

Then, before providing an ID to a user, we encrypt the internal representation. This is a multiple-step process:

  1. We derive an HMAC key from the supplied secret key (this step is carried out before any encryption or decryption operation).
  2. We use the HMAC key to produce the an IV to be used for the following encryption step, taking the plaintext internal ID as input to the HMAC function. This is the step that ensures that the resulting output is stable.
  3. We derive an AES256-GCM encryption key from the supplied secret key. In this implementation, we additionally use the IV derived in the previous step an an input to this process, which results in a different encryption key for each ID.
  4. We proceed to encrypt the supplied internal ID, using the derived IVs and encryption keys.
  5. We prepend the IV to the resulting encrypted data and encode it using the base64url encoding and return this as a result (which for UUIDs results in a 48-byte string).

For decryption, the steps are as follows:

  1. We derive an HMAC key from the supplied secret key (this step is carried out before any encryption or decryption operation).
  2. We decode the encrypted ID and split it into two parts: the IV and the encrypted data.
  3. We derive an AES256-GCM decryption key from the supplied secret key. In this implementation, we additionally use the IV, which results in a different decryption key for each ID.
  4. We decrypt the encrypted ID to obtain the plaintext internal ID.
  5. With this information, we derive the IV from the plaintext and verify that the IV in the input that we obtained earlier matches what we expect it to be (i.e., the IV that the encryption function would have derived). This step is a sanity check to ensure that the ID has not been tampered with.
  6. We return the plaintext internal ID, which our application can now use for accessing resources.