After examining the data stored in one of our high-throughput systems, we realized it might include sensitive user data. To reduce the number of people who are technically able to access the data, and to reduce the risks associated with a potential data theft, we decided to encrypt certain database fields.
Encrypt sensitive fields without any downtime.
While we wanted to encrypt existing rows, new rows were constantly being added. To achieve zero downtime, we devised the following path:
- Add new encrypted fields to the database.
- Start populating new encrypted fields with encrypted values while still populating un-encrypted fields.
- In the background, migrate existing records by populating their encrypted fields.
- Refactor to use encrypted fields.
- Drop un-encrypted fields.
Each step above was relatively simple and was tested properly before moving to the next step.
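Step 3 is the part that has to coexist with live traffic. Here is a minimal plain-Ruby sketch of it. It assumes an in-memory `Row` struct standing in for the table and a hypothetical `encrypt` helper built on OpenSSL; neither comes from the post or from symmetric-encryption. Rows are processed in small batches. Rows that already have an encrypted value, such as those written by the step 2 code, are skipped, so the job can be re-run safely.

```ruby
require "openssl"

# Hypothetical stand-in for a database row: the plaintext column plus the
# new, initially empty encrypted column added in step 1.
Row = Struct.new(:id, :note, :encrypted_note)

# Hypothetical key; in the real setup it comes from the gem's configuration.
KEY = OpenSSL::Random.random_bytes(32)

# Encrypt with aes-256-cbc and a random IV; returns strict Base64 of IV + ciphertext.
def encrypt(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  iv = cipher.random_iv
  [iv + cipher.update(plaintext) + cipher.final].pack("m0")
end

# Step 3: populate encrypted_note for existing rows in small batches.
# Rows already populated (e.g. by the step 2 dual-write code) are skipped,
# which makes re-running the job after a failure safe.
def backfill(rows, batch_size: 1000)
  updated = 0
  rows.each_slice(batch_size) do |batch|
    batch.each do |row|
      next if row.note.nil? || row.encrypted_note
      row.encrypted_note = encrypt(row.note)
      updated += 1
    end
    # A real job would save the batch here and might pause to limit database load.
  end
  updated
end

rows = [
  Row.new(1, "first note", nil),
  Row.new(2, "second note", encrypt("second note")), # already dual-written
  Row.new(3, nil, nil)                                # nothing to encrypt
]
first_run = backfill(rows, batch_size: 2) # => 1
```

Because the job is idempotent, running it again is a no-op. It can be stopped and restarted at any point while new rows keep arriving through the dual-write path.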
Choosing an Encryption Library
While all the libraries we evaluated were reliable, we chose symmetric-encryption for its robust documentation, ease of use and easy integration with other libraries (in our case ActiveRecord). It provides some useful Rake tasks for configuration both inside and outside of Heroku. Overall, symmetric-encryption seemed really 🔒.
The config/symmetric-encryption.yml file defines which algorithm to use and where to find the related keys for each environment.
Symmetric Encryption uses OpenSSL to encrypt and decrypt the data, which means we can use any of the algorithms supported by OpenSSL. We used aes-256-cbc, which is also the recommended default algorithm.
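For reference, this is what aes-256-cbc looks like through Ruby's own OpenSSL bindings, the layer the gem builds on. This sketch uses only the standard library. The key and IV are generated on the spot for illustration, whereas symmetric-encryption reads them from its configuration.

```ruby
require "openssl"

# Encrypt or decrypt with aes-256-cbc using Ruby's OpenSSL bindings.
def aes_256_cbc(mode, data, key, iv)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  mode == :encrypt ? cipher.encrypt : cipher.decrypt
  cipher.key = key # 32 bytes for AES-256
  cipher.iv = iv   # 16 bytes, one AES block
  cipher.update(data) + cipher.final # final adds/strips PKCS#7 padding
end

# Generated here for illustration only.
key = OpenSSL::Random.random_bytes(32)
iv = OpenSSL::Random.random_bytes(16)

ciphertext = aes_256_cbc(:encrypt, "a sensitive note", key, iv)
plaintext = aes_256_cbc(:decrypt, ciphertext, key, iv)

# A 16-byte input gains a full block of padding: 32 bytes of ciphertext.
puts ciphertext.bytesize # => 32
puts plaintext           # => a sensitive note
```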
In order to create a new set of keys:
The above command creates an encryption key and an Initialization Vector (IV). The generated key must not be committed to source control. Depending on how your application is deployed, there are two approaches to storing this key. In both, the encryption key is itself encrypted before it is stored in a file or an environment variable. The secret used to encrypt the encryption key can be committed to source code.
To access sensitive data, a malicious party would need access to all three of:
- The database
- Our source code
- The encryption keys stored in files or environment configuration
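The layering above can be sketched with the standard library. This is a simplified illustration, not symmetric-encryption's actual key-file format. The hypothetical `MASTER_KEY` plays the role of the secret committed with the source code. `wrapped_key` is what would go in the key file or environment variable, and `stored` is what lands in the database. Recovering the plaintext needs all three.

```ruby
require "openssl"

def crypt(mode, data, key, iv)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  mode == :encrypt ? cipher.encrypt : cipher.decrypt
  cipher.key = key
  cipher.iv = iv
  cipher.update(data) + cipher.final
end

# Hypothetical secret that may be committed with the source code.
MASTER_KEY = OpenSSL::Random.random_bytes(32)
MASTER_IV = OpenSSL::Random.random_bytes(16)

# The data encryption key is never stored in plaintext. The IV is not secret,
# but it must be available at decryption time.
data_key = OpenSSL::Random.random_bytes(32)
data_iv = OpenSSL::Random.random_bytes(16)

# What would be written to the key file or environment variable.
wrapped_key = crypt(:encrypt, data_key, MASTER_KEY, MASTER_IV)

# What would be written to the database.
stored = crypt(:encrypt, "user secret", data_key, data_iv)

# Decrypting needs the database value, the committed secret and the wrapped key.
unwrapped_key = crypt(:decrypt, wrapped_key, MASTER_KEY, MASTER_IV)
recovered = crypt(:decrypt, stored, unwrapped_key, data_iv)
puts recovered # => user secret
```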
Outside of Heroku
The key can be stored in a file on disk, outside of the source code, and the key_filename setting in the configuration YAML points to this file. In this scenario we rely on the operating system's file permissions to limit access to the key file.
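Here is a minimal sketch of restricting the key file with file permissions, assuming a Unix-like system. The `write_key_file` helper and the file name are hypothetical. Mode 0600 lets only the owning user read or write the file, so the application would run as that user.

```ruby
require "tmpdir"

# Write a key file that only its owner can read or write (mode 0600).
def write_key_file(path, contents)
  # The permission argument applies only when the file is created and is
  # masked by the umask, so chmod afterwards to be sure.
  File.open(path, File::WRONLY | File::CREAT | File::TRUNC, 0o600) do |file|
    file.write(contents)
  end
  File.chmod(0o600, path)
  path
end

Dir.mktmpdir do |dir|
  path = write_key_file(File.join(dir, "production.key"), "encrypted key material")
  puts format("%o", File.stat(path).mode & 0o777) # => 600
end
```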
Here is a sample configuration file:
On Heroku
Since the filesystem on Heroku is ephemeral, symmetric-encryption suggests setting the encryption key as an environment variable. The configuration is the same as above, except that we replace key_filename with encrypted_key: "<%= ENV['PRODUCTION_ENCRYPTION_KEY1'] %>".
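The `<%= ... %>` tag in that line is ERB syntax, which implies the configuration file is rendered as an ERB template before it is parsed as YAML. Here is a standard-library sketch of that two-step load. It assumes a hypothetical config fragment in which only the `encrypted_key` line comes from the post; the surrounding keys are illustrative.

```ruby
require "erb"
require "yaml"

# Hypothetical fragment: only the encrypted_key line is from the post.
TEMPLATE = <<~YML
  production:
    ciphers:
      - encrypted_key: "<%= ENV['PRODUCTION_ENCRYPTION_KEY1'] %>"
        cipher_name: aes-256-cbc
YML

# Render the ERB tags first, then parse the resulting YAML.
def load_config(template)
  YAML.safe_load(ERB.new(template).result)
end

ENV["PRODUCTION_ENCRYPTION_KEY1"] ||= "example-value" # set via Heroku config in production
config = load_config(TEMPLATE)
puts config["production"]["ciphers"].first["encrypted_key"]
```

The value stays out of the repository; only the reference to the environment variable is committed.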
You can use the following rake task to generate a Heroku-specific configuration file:
This creates a config/symmetric-encryption.yml file and also outputs the commands you can run to set the encrypted encryption key as an environment variable on Heroku.
Symmetric Encryption provides seamless integration with ActiveRecord: the attr_encrypted helper method defines encrypted fields. Let's say we wanted to encrypt the note field of a Note model that also has a subject field. You can add the following to your model:
This means that whenever you set note on this model, symmetric-encryption sets the encrypted_note field in the database. Whenever you retrieve an instance of this model, symmetric-encryption decrypts the encrypted_note field, and you can access the decrypted value simply by reading note.
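To make this behaviour concrete, here is a plain-Ruby class that mimics it: assigning `note` stores only `encrypted_note`, and reading `note` decrypts it. This is an emulation under stated assumptions (a hypothetical in-class `KEY`, a random IV prepended to the ciphertext, Base64 output). It is not how the gem implements `attr_encrypted`.

```ruby
require "openssl"

# Plain-Ruby emulation of the accessor behaviour; not the gem's implementation.
class Note
  KEY = OpenSSL::Random.random_bytes(32) # hypothetical; the gem reads keys from its config
  IV_SIZE = 16

  attr_accessor :subject, :encrypted_note

  # Writing note stores only the encrypted form: Base64(IV + ciphertext).
  def note=(value)
    cipher = OpenSSL::Cipher.new("aes-256-cbc")
    cipher.encrypt
    cipher.key = KEY
    iv = cipher.random_iv
    @encrypted_note = [iv + cipher.update(value) + cipher.final].pack("m0")
  end

  # Reading note decrypts encrypted_note on the fly.
  def note
    return nil if @encrypted_note.nil?

    raw = @encrypted_note.unpack1("m0")
    cipher = OpenSSL::Cipher.new("aes-256-cbc")
    cipher.decrypt
    cipher.key = KEY
    cipher.iv = raw.byteslice(0, IV_SIZE)
    plaintext = cipher.update(raw.byteslice(IV_SIZE..)) + cipher.final
    plaintext.force_encoding(Encoding::UTF_8)
  end
end

note = Note.new
note.subject = "Groceries"
note.note = "buy milk"
puts note.note # => buy milk
```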
In our case we couldn't use this helper immediately. Using attr_encrypted would prevent us from directly accessing the existing, un-encrypted fields in our database (which we needed through step 3 of our approach). To work around this, we started by adding a before_validation callback to our model that sets the encrypted fields based on the un-encrypted ones.
In the above code, SymmetricEncryption.encrypt(note, true, true, :string) means: encrypt the note field, use a random IV (Initialization Vector), compress the string, and treat the value as a string when decrypting.
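The options in that call can be illustrated with a standard-library sketch. `encrypt_value` and `decrypt_value` are hypothetical helpers. Their byte layout (a one-byte compression flag, then the 16-byte IV, then the ciphertext) is an assumption for this example, not the gem's header format. Compression runs before encryption because ciphertext looks random and would not compress.

```ruby
require "openssl"
require "zlib"

KEY = OpenSSL::Random.random_bytes(32)      # hypothetical key
FIXED_IV = OpenSSL::Random.random_bytes(16) # used only when random_iv: false

# Layout assumed for this sketch: 1 flag byte, 16-byte IV, ciphertext.
def encrypt_value(plaintext, random_iv: true, compress: true)
  # Compress first: encrypted bytes would not compress.
  data = compress ? Zlib::Deflate.deflate(plaintext) : plaintext
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  iv = random_iv ? cipher.random_iv : FIXED_IV
  cipher.iv = iv
  flag = (compress ? 1 : 0).chr
  flag + iv + cipher.update(data) + cipher.final
end

def decrypt_value(blob)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.decrypt
  cipher.key = KEY
  cipher.iv = blob.byteslice(1, 16)
  data = cipher.update(blob.byteslice(17..)) + cipher.final
  data = Zlib::Inflate.inflate(data) if blob.getbyte(0) == 1
  # Mirrors the :string type: hand back text rather than raw bytes.
  data.force_encoding(Encoding::UTF_8)
end

text = "note " * 50
first = encrypt_value(text)
second = encrypt_value(text)
puts first == second               # => false (a new random IV each time)
puts decrypt_value(first) == text  # => true
puts first.bytesize < encrypt_value(text, compress: false).bytesize # => true
```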
Once we reached the 4th step and stopped populating and reading the un-encrypted fields, we could easily switch from the callback to the attr_encrypted helper:
Query encrypted fields
Generally, once a field is encrypted we can't run partial (for example LIKE) queries on its content. However, if we use the same IV every time we encrypt a value, we can still run exact-match queries: with the same key and IV, encrypting the same value always produces the same encrypted string. The trade-off is that identical values also show up as identical ciphertexts. If you don't need exact-match queries, the recommended approach is to use a random IV for each encryption.
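Here is a small standard-library sketch of both modes, with an array standing in for an encrypted column and hypothetical helper names. With a fixed IV, an exact-match lookup works by encrypting the search term the same way and comparing ciphertexts, as a `WHERE encrypted_email = ?` query would. With a random IV, the same input encrypts differently each time, so that lookup no longer works.

```ruby
require "openssl"

KEY = OpenSSL::Random.random_bytes(32)      # hypothetical key
FIXED_IV = OpenSSL::Random.random_bytes(16) # hypothetical IV reused for every value

def encrypt_with_iv(value, iv)
  cipher = OpenSSL::Cipher.new("aes-256-cbc")
  cipher.encrypt
  cipher.key = KEY
  cipher.iv = iv
  cipher.update(value) + cipher.final
end

# Same key + same IV: equal inputs give equal ciphertexts.
def encrypt_fixed(value)
  encrypt_with_iv(value, FIXED_IV)
end

# Fresh IV per call, stored in front of the ciphertext.
def encrypt_random(value)
  iv = OpenSSL::Random.random_bytes(16)
  iv + encrypt_with_iv(value, iv)
end

# Stand-in for an encrypted column written with a fixed IV.
column = ["alice@example.com", "bob@example.com", "alice@example.com"].map { |v| encrypt_fixed(v) }

# Exact match: encrypt the search term the same way and compare ciphertexts.
needle = encrypt_fixed("alice@example.com")
matches = column.each_index.select { |i| column[i] == needle }
puts matches.inspect # => [0, 2]

# With a random IV, equal inputs no longer produce equal ciphertexts.
puts encrypt_random("alice@example.com") == encrypt_random("alice@example.com") # => false
```

The same property that makes the lookup work also reveals that rows 0 and 2 hold the same value, which is why a random IV is preferred when exact-match queries are not needed.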
With this approach we were able to encrypt a database with ~1.5 million rows without any downtime in about a week.