Format Preserving Encryption, named FPE from here, is a particular form of encryption with a constraint of preserving the initial format. In other words, the output should keep the same format as the input. The format of data can be defined by a charset (named the domain in the article below) and a length. Here are some examples:
- A 16-digit card number in a 16-digit number.
- A 12 Hexadecimal digit mac address in a 12 Hex digit number.
- A mail address to another mail address.
FPE versus common cipher
In a common block cipher, the ciphered data result will mainly imply having a different output format. For example, ciphering with AES a credit card number will result in one 128-bit block of data (encoded in Hexadecimal or Base64 string). In comparison with FPE, you will have the same charset and length.
Encrypted credit card number, with AES-128:
Encrypted credit card number, with the FF1 algorithm:
Result: 8954 3622 7045 9364
This quick analogy highlights the difference of these types of encryption. In addition, with AES if you set a random Initialization Vector (IV) then you will get a different result with the same input, which can be annoying, in case you want to keep the logic in the key and id.
How can FPE be useful?
FPE encryption is mainly used when you have an existing application with specific data format required, and you want to add encryption on this data. As seen above, when you use a classic encryption method, you will have a completely different output format. It could be annoying in context in which you need to display the value or make value comparisons.
An example of context: “You can’t make a mock database; however, you want to share a database with a provider in order to allow him to perform functional tests.”
With a classic encryption like AES, it will be a collection of hexadecimal values everywhere that will be difficult to leverage for functional testing purposes. Therefore, in such case, FPE can be useful because it will keep the format and the domain.
How does FPE work?
FPE is based on a random permutation, so, a truly random permutation is theoretically the ideal FPE cipher. However, it is not suitable for large applications to generate and store a truly random permutation. Therefore, FPE algorithms aim to make a pseudo-random permutation from a Key. In addition, a good FPE algorithm should not be distinguishable from a truly random permutation.
From now, I will focus on the FPE algorithms named FF1 and FF3-1 because it was approved by the NIST and they are among the most used and documented algorithms. FF1 and FF3-1 use a Feistel network. This function has some advantages like having the same output size as the input data, and it guarantees to be invertible even if the round function (F, see schema below) is not itself invertible.
In FF1 and FF3 algorithms, the Feistel cipher operations are made in the same radix (base) as the domain. For instance, if the domain is 26 all the operations are made on the radix 26. This is how the FF1 and FF3 manage to keep the same output domain as the input one.
A Feistel network is a function who take in input a data block and a key:
- The data part (in green) is split into two parts (L0 and R0).
- The key (K below) is used as input in F function, which is a block cipher, more often AES 128.
In our case the F function is an AES 128 algorithm used to cipher the second part of the data with the key, in fact, it is used as a pseudo-random function. The result is used to modify the other part of the string with a modular addition. Finally, it swaps the two parts for the next round. Then do ten rounds for FF1 and eight for FF3-1. That’s it you get cyphered data!
Why should you be careful with FPE?
Format Preserving Encryption is not a common encryption method, so, we do not have so much feedback like for other classic encryption algorithms. The direct consequence is that it does not benefit from the same enthusiasm from researchers. Remember that FPE is based on a random permutation, thus, in case of small domain it will be more vulnerable to guessing attacks and analytic attacks.
It is the reason why FF1 and FF3-1 are approved by this NIST only in a range of use cases. According to the NIST the domain required for a secure utilization of FF1 and FF3-1 is at least 1 million words. It is also recommended to use a tweak when it’s feasible, it’s like an initialization vector or salt.