Andrew Buss, originally published on CIO
We know from our many studies that most IT organisations are stretched operationally while having to balance a whole host of competing priorities. One of the major challenges that consistently comes top of the list is coping with the explosion of information, data and content over the past few years.
Buying more and more storage is not the best long-term solution, although many organisations take this approach simply to keep up with business demand. Archiving can help by moving data that must be kept for the long term but is rarely used out of the active storage pool. However, archiving is often misunderstood, frequently mistaken for part of a backup plan, and all too rarely used. It also works best when there is a good understanding of the data being stored, which few companies have got to grips with.
Another approach, which has come to prominence over the past couple of years and is now being heavily promoted by storage hardware and software vendors, is data deduplication, or dedupe for short.
The aim of dedupe technology, in very simplified terms, is to look for common, redundant patterns of bits and ‘collapse’ them so that each pattern is stored fewer times, while still being able to recreate the complete original files accurately. This is called ‘dehydrating’ the data. Before the data can be accessed and used, it must first be ‘rehydrated’ back into its original form.
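As a rough illustration of dehydrating and rehydrating, here is a minimal Python sketch. It assumes fixed 4 KiB chunks identified by their SHA-256 hashes; the names `dehydrate`, `rehydrate` and `CHUNK_SIZE` are hypothetical, and commercial dedupe engines often use variable-size chunking and their own indexing, so this shows the idea rather than how any particular product works.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; many products use variable-size chunks


def dehydrate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into chunks, keep one physical copy of each distinct chunk,
    and return the 'recipe' of chunk hashes needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is actually stored
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original bytes by looking up each chunk in the recipe."""
    return b"".join(store[digest] for digest in recipe)


store: dict[str, bytes] = {}
file_a = b"A" * 8192 + b"B" * 4096  # three chunks, two of them identical
file_b = b"A" * 4096 + b"C" * 4096  # shares its first chunk with file_a
recipe_a = dehydrate(file_a, store)
recipe_b = dehydrate(file_b, store)
assert rehydrate(recipe_a, store) == file_a

logical = len(file_a) + len(file_b)
physical = sum(len(chunk) for chunk in store.values())
print(f"logical {logical} bytes, physical {physical} bytes, ratio {logical / physical:.1f}:1")
# prints: logical 20480 bytes, physical 12288 bytes, ratio 1.7:1
```

The block of `A` bytes appears three times across the two files but is stored only once. The sketch treats matching SHA-256 hashes as identical chunks; some systems also compare the bytes themselves to guard against hash collisions.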
The end result is that dedupe can significantly reduce physical storage requirements, and therefore the short-term need to invest in more disks or arrays. The gains in storage efficiency can be large, and the potential savings too big to ignore.
On the other hand, the security environment has been changing too. We can no longer trust that our internal systems are protected by perimeter security. Attacks now frequently penetrate into the datacenter, and information is being accessed and siphoned off even in the most security-conscious companies.
We’ve long been vocal advocates of the need to keep data secure by moving beyond physical security and basic Access Control Lists (ACLs) to technologies such as Data Leakage Prevention (DLP) and encryption.
But a fundamental conflict arises when dedupe and encryption are pushed together, because they are not the best of bedfellows: dedupe relies on being able to readily identify common patterns in data in order to reduce it, while encryption depends on removing predictable and repetitive patterns from data. For many companies, the cost savings of dedupe win out over the extra protection afforded by encryption, and so the possibilities of encryption may be overlooked.
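To make the conflict concrete, the sketch below encrypts 100 identical records and counts how many distinct blobs a dedupe engine would see. To stay dependency-free it uses a TOY stream cipher built from SHA-256 in counter mode with a random nonce per call; this is an assumption for illustration only and must not be used to protect real data (a vetted cipher such as AES-GCM from a proper library would be used in practice). The function names are hypothetical.

```python
import hashlib
import os

NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from SHA-256(key || nonce || counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """TOY cipher for illustration only. A fresh random nonce per call means the
    same plaintext never produces the same ciphertext twice."""
    nonce = os.urandom(NONCE_SIZE)
    keystream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, keystream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse toy_encrypt using the nonce stored at the front of the blob."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    keystream = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, keystream))


key = os.urandom(32)
record = b"customer record template ".ljust(4096, b".")  # one repetitive 4 KiB record
records = [record] * 100  # 100 identical records: ideal for dedupe

encrypted = [toy_encrypt(key, r) for r in records]
print("distinct plaintext records:", len(set(records)))   # 1
print("distinct encrypted records:", len(set(encrypted)))  # 100
```

The plaintext collapses to a single stored record, but after encryption every record looks different, so there is nothing left for dedupe to collapse.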
That is not to say encryption cannot work with dedupe, but it does take more thought and care to work out how and where to encrypt data. If the data is encrypted low down in the stack, the encryption can sit underneath the dedupe engine, which then sees the data in unencrypted form for processing.
The data is then secure in storage, or ‘at rest’ as it is commonly known.
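The layering described above can be sketched as a hypothetical storage class that dedupes on plaintext first and encrypts each unique chunk only as it is written to disk. The class name, the 4 KiB chunk size and the dictionary standing in for the disk are all assumptions, and the cipher is the same TOY SHA-256 counter-mode construction, included only so the sketch runs without third-party libraries.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed fixed chunk size
NONCE_SIZE = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive pseudo-random bytes from SHA-256(key || nonce || counter)."""
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(stream[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """TOY cipher for illustration only; not for real data."""
    nonce = os.urandom(NONCE_SIZE)
    return nonce + bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    return bytes(c ^ k for c, k in zip(body, _keystream(key, nonce, len(body))))


class DedupeThenEncryptStore:
    """Hypothetical storage layer: dedupe sees plaintext, encryption sits underneath it."""

    def __init__(self, key: bytes):
        self.key = key
        self.disk: dict[str, bytes] = {}  # chunk hash -> encrypted chunk ("at rest")

    def write(self, data: bytes) -> list[str]:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()  # dedupe works on plaintext
            if digest not in self.disk:
                self.disk[digest] = toy_encrypt(self.key, chunk)  # encrypt on the way to disk
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        # Rehydrate: decrypt each stored chunk and reassemble in order.
        return b"".join(toy_decrypt(self.key, self.disk[d]) for d in recipe)


store = DedupeThenEncryptStore(os.urandom(32))
data = b"X" * (CHUNK_SIZE * 50)  # 50 identical chunks
recipe = store.write(data)
assert store.read(recipe) == data
print(len(recipe), "chunks written,", len(store.disk), "stored encrypted")  # 50 chunks written, 1 stored encrypted
```

Dedupe keeps its full benefit here because it runs before encryption. One trade-off of this simple design is that the hash index is derived from plaintext, so it reveals which chunks are identical even though their contents are encrypted.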
This also means that the data is available in unencrypted form to other applications above that layer, so transmission of sensitive data needs to be protected with encryption such as TLS or similar protocols. Additional controls are also necessary to ensure that only authorized applications or users are able to access the data.
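For the transport side, a minimal client-side TLS setup using Python's standard `ssl` module might look like the following. The function names are hypothetical, and the TLS 1.2 minimum is an assumed policy rather than a recommendation from the original text; access controls for who may read the data would still need to be handled separately.

```python
import socket
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Client-side TLS settings for talking to a storage or application service."""
    # create_default_context() enables certificate verification and hostname checking.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    # Assumed policy: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def send_over_tls(host: str, port: int, payload: bytes) -> None:
    """Send payload to host:port over a verified TLS connection."""
    ctx = make_client_tls_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(payload)
```

Using `create_default_context()` rather than building an `SSLContext` by hand keeps certificate and hostname verification switched on, which is easy to lose with manual configuration.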
If encryption is done above the dedupe engine, at a higher layer such as middleware or the application, then the data is highly secure in use, in transit and at rest. However, the dedupe engine will not be able to work its magic, and any data encrypted at this layer cannot be dehydrated in storage. For this reason, encryption in the application needs to be kept to a reasonable minimum, or the expected return on the investment in dedupe will not be realized.
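A rough way to see why application-layer encryption should be kept to a minimum is to simulate how the dedupe ratio falls as more records are encrypted before they reach storage. The workload below is deliberately artificial (every record is an identical template), and records encrypted in the application are modelled as random bytes, which is an assumption based on how well-encrypted data looks to a dedupe engine. The function names are hypothetical.

```python
import hashlib
import os

RECORD_SIZE = 4096


def dedupe_ratio(records: list[bytes]) -> float:
    """Logical bytes divided by the bytes left after keeping one copy of each distinct record."""
    unique = {hashlib.sha256(r).digest(): len(r) for r in records}
    return sum(len(r) for r in records) / sum(unique.values())


def simulate(encrypted_fraction: float, total: int = 1000) -> float:
    """Artificial workload: identical template records, except that a fraction
    are encrypted in the application and therefore look like random bytes."""
    n_encrypted = int(total * encrypted_fraction)
    template = b"T" * RECORD_SIZE
    records = [os.urandom(RECORD_SIZE) for _ in range(n_encrypted)]
    records += [template] * (total - n_encrypted)
    return dedupe_ratio(records)


for fraction in (0.0, 0.01, 0.1, 0.5):
    print(f"{fraction:>4.0%} encrypted in the application -> dedupe ratio {simulate(fraction):.1f}:1")
```

In this extreme case, encrypting just 1% of records drops the ratio from 1000:1 to about 91:1, because every encrypted record must be stored in full. Real data will dedupe far less than this template workload, but the direction of the effect is the same.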
The upshot is that getting dedupe and encryption to work together requires understanding the range of different applications and data, and separating out the sensitive information that needs to be protected in the application. This can be done manually through an audit or a compliance exercise, such as for PCI DSS, or by using one of the range of automated tools for data discovery and classification.
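As a very small taste of what automated classification does, the sketch below flags records containing something that looks like a payment card number (13 to 19 digits that pass the Luhn checksum) as sensitive, so they could be routed to application-layer encryption while everything else stays dedupe-friendly. The policy, the regular expression and the function names are hypothetical; real discovery and classification tools cover far more data types and formats.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, as used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(record: str) -> str:
    """Hypothetical policy: 'sensitive' records get encrypted in the application;
    'general' records rely on storage-layer encryption underneath dedupe."""
    for match in CARD_PATTERN.finditer(record):
        if luhn_valid(match.group()):
            return "sensitive"
    return "general"


print(classify("Order 1234, card 4111 1111 1111 1111"))  # sensitive
print(classify("Order 1234, ref 4111 1111 1111 1112"))   # general (fails the Luhn check)
```

The Luhn check cuts down false positives from arbitrary long numbers such as order references, which matters because every record wrongly marked sensitive is one that dedupe can no longer shrink.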