Jon Collins, originally published on The Register
In the spirit of calling a spade a spade, it is fair to say that computer storage is generally perceived to be quite dull – in Douglas Adams terms it would qualify as ‘mostly harmless’.
While this is a bit of a shame for people who look after storage, backups and so on (let’s face it, the job description is never going to break the ice at parties), it also sets expectations: storage should just work without needing too much intervention.
This is more of a challenge than some might think, not least because disk technologies are still largely mechanical. The rest of IT may have long since succumbed to the age of silicon, but storage remains the last bastion of the Victorian age. It actually, really could still be steam-powered. From an engineering standpoint this leads to some quite fascinating discussions – for example that the main limiting factor in physical disk size is the motor.
The downside is reliability. Disk failure is a common theme in most IT environments, and indeed some common storage technologies (RAID for example) exist largely to counter the fact that disks can, and will, crash without warning.
Even when storage ‘just works’, it has a number of hurdles to overcome. First and foremost comes data growth. When we conducted a server infrastructure survey last year, data growth came up as the number one driver for updating the server estate, never mind the storage! Data growth is relentless, and dealing with it isn’t made easier by the fact that few organisations are blessed with an up-to-date, well-managed storage environment.
We can all blame the technology of course, but data duplication and fragmentation remain common themes, sustained by many organisations having a ‘keep everything’ policy when it comes to electronic information. As well as being most likely illegal, this puts an additional burden on the storage infrastructure, not to mention the people and processes which need to work with it.
Perhaps the outside-in view isn’t all that wrong when we think about the service storage needs to provide. First, its role is to deliver data to applications and users consistently and efficiently: that is, as and when needed, at the required levels of performance (measured in IOPS, or input/output operations per second), at an appropriate cost.
Second, storage should also be able to recover from failure situations. It is one thing when things are going right; quite another if things go wrong. Here we can think about backup and recovery as well as the ability to replicate between storage arrays, and indeed across sites.
Finally, storage needs to be manageable in a way that suits the people trying to manage it. This is not just about having visibility of what storage exists, but also about being able to respond to changing conditions and changing requirements, preferably as automatically as possible.
Some ‘information assets’ are more equal than others
Responding to these needs requires more than buying in a bunch of disks. For a start, storage arrays tend to fall into one of two categories – “high-end” for very expensive, high-performance disks, and “mid-range” for the rest (you don’t hear of “low-end” storage arrays). More recently, a third category has emerged, namely the solid state disk (SSD). While SSDs might suggest the beginning of the end for spinning disks, they are currently still more expensive than their mechanical counterparts, although their performance exceeds that of even the fastest high-end disks.
Given these options, deciding what data should go where is quite a skill, particularly as data characteristics change over time. A well-managed tiered storage set-up would match the ratio of high- and lower-cost storage in use to the requirements of the data at any given time.
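To make the idea of tiering a little more concrete, here is a minimal sketch in Python. It is illustrative only: the function name, the tier names, the capacities and the 'reads per day' measure are all assumptions made for the example, and a real array would weigh up far more than access frequency. The sketch simply places the busiest data on the fastest tier that still has room.

```python
def assign_tiers(datasets, tiers):
    """Greedy placement: busiest data first, fastest tier with room first.

    datasets: list of (name, size_gb, reads_per_day) tuples (hypothetical figures)
    tiers:    list of (tier_name, capacity_gb) tuples, ordered fastest first
    Returns a dict mapping each dataset name to a tier name.
    """
    remaining = {tier_name: capacity for tier_name, capacity in tiers}
    placement = {}
    # Sort so that the most frequently read data is placed first.
    for name, size_gb, _reads in sorted(datasets, key=lambda d: d[2], reverse=True):
        for tier_name, _capacity in tiers:
            if remaining[tier_name] >= size_gb:
                remaining[tier_name] -= size_gb
                placement[name] = tier_name
                break
        else:
            raise ValueError(f"no tier has room for {name}")
    return placement


# Hypothetical estate: three tiers and three sets of data.
tiers = [("ssd", 100), ("high-end", 300), ("mid-range", 1000)]
datasets = [
    ("old_projects", 500, 2),
    ("orders_db", 50, 9000),
    ("email", 200, 400),
]
placement = assign_tiers(datasets, tiers)
print(placement)
```

Because data characteristics change over time, something like this would need to be re-run periodically with fresh access figures, with data moved between tiers as a result.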
An additional dimension is to ensure appropriate storage and data availability. This is generally achieved by placing a copy of the information in some other, preferably safe, place, either online (for example, via a second array to which all data is replicated) or offline (for example on tape, in a fire-proof safe or at an off-site storage facility).
Deciding between all the options and coming up with an appropriately architected storage environment is not trivial. Neither are things standing still – not only are storage technologies evolving, but so too are other areas of IT, upon which storage depends. It’s worth homing in on a few developments, to illustrate the point.
Not least of course, we have virtualisation. Server virtualisation may be the buzz-phrase right now, but its growth places new demands on how storage is built and delivered; the ease with which new servers and whole systems can be provisioned increases the risk of storage bottlenecks, as does the accompanying fluctuation in demand.
To counter this, we have of course storage virtualisation – which enables storage resources to be treated as a single pool, and then provisioned as appropriate. The phrase in vogue at the moment is ‘thin provisioning’, in which a server or application may think it has been allocated a certain disk volume, but in fact the storage array only allocates the physical storage required up to the specified maximum (which may never be reached). This makes for a far more efficient use of storage.
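For those who prefer to see the principle in code, thin provisioning can be sketched as a toy model in Python. The class and volume names are hypothetical, sizes are in plain gigabytes, and no real array works quite this simply. The point is that promising capacity costs nothing, while physical capacity is only consumed as data is actually written.

```python
class ThinPool:
    """Toy model of a thin-provisioned storage pool (all sizes in GB)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb  # real capacity installed in the array
        self.volumes = {}               # volume name -> [promised_gb, used_gb]

    def create_volume(self, name, promised_gb):
        # Promising capacity allocates nothing physical up front.
        self.volumes[name] = [promised_gb, 0]

    def write(self, name, gb):
        promised, used = self.volumes[name]
        if used + gb > promised:
            raise ValueError("volume would exceed its specified maximum")
        if self.physical_used() + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: more physical disk is needed")
        self.volumes[name][1] = used + gb

    def physical_used(self):
        return sum(used for _promised, used in self.volumes.values())

    def promised_total(self):
        return sum(promised for promised, _used in self.volumes.values())


pool = ThinPool(physical_gb=100)
pool.create_volume("mail", promised_gb=80)
pool.create_volume("files", promised_gb=80)
pool.write("mail", 10)
pool.write("files", 25)
print(pool.promised_total(), pool.physical_used())  # prints: 160 35
```

Here 160GB has been promised against 100GB of real disk, of which only 35GB is in use. The flip side is the second error case: if every volume fills up at once, the pool runs out before the promises do, which is why over-committed pools need watching.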
Speaking of efficiency, another trendy term is de-duplication, in which only variations in files or disk blocks are retained, transferred, backed up or whatever, rather than – ahem – duplicating everything. For the non-initiated, this one does sound a bit of a no-brainer – but the fact is that de-duplication can get quite complicated. An index needs to be maintained of everything that is being stored or backed up so that, for example, files can be ‘reconstructed’ as necessary.
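The role of the index is easier to see in a small example. The following Python sketch shows one common approach, fixed-size block de-duplication using content hashes; it is an assumption made for illustration rather than a description of any particular product, and the class name, file names and the tiny 4-byte block size are all hypothetical. Each unique block is stored once, and the index records which blocks make up each file so that it can be rebuilt on demand.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; unrealistically small, chosen to keep the demo readable


class DedupStore:
    """Toy block-level de-duplicating store."""

    def __init__(self):
        self.blocks = {}  # digest -> block contents, each unique block kept once
        self.index = {}   # file name -> ordered list of block digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            digests.append(digest)
        self.index[name] = digests

    def get(self, name):
        # 'Reconstruct' the file by following the index.
        return b"".join(self.blocks[d] for d in self.index[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
store.put("report_v1", b"AAAABBBBCCCC")
store.put("report_v2", b"AAAABBBBDDDD")  # differs only in its final block
print(store.stored_bytes())              # prints: 16 (rather than 24)
print(store.get("report_v2"))            # prints: b'AAAABBBBDDDD'
```

Even in this toy version the complication is visible: lose or corrupt the index and the stored blocks are just a heap of fragments, so the index itself becomes something that has to be protected.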
Other storage-related developments worthy of note are on the networking side, with iSCSI (bringing block storage onto the same IP networks as file storage, aka storage for databases and unstructured content respectively), and at the higher end, the merging of data and storage networking using 10 Gigabit Ethernet. Meanwhile, in storage software, ‘merger talks’ continue between backup and archiving (they’re both about data movement after all).
Taking an architectural view of storage
All such developments share a common theme, that of convergence: with a following wind and from an interoperability perspective at least, things should get a bit simpler. But making the most of them still requires an architectural overview of the storage environment as a whole.
If you haven’t got one, where should you start? As we have already acknowledged, few organisations have the luxury of starting from scratch. But if you can build a clear picture of your storage requirements (remembering not all information is created equal – the 80/20 rule can be used here), together with a map of what capabilities you already have in place, you’re already halfway there.
From here, the next step is to produce the map of how you’d like your storage capabilities to look, based on your information needs and your broader infrastructure plans, for example vis-à-vis any intentions you might have around virtualisation. Given the plethora of options, each of which comes at a cost, storage planning will always be a compromise between functionality and affordability.
As a result it is worth thinking about how storage technologies might be used in tandem. For example, while it still isn’t cheap (though it’s getting cheaper), implementing de-duplication might provide immediate savings in terms of bandwidth and latency. Considered a little more broadly, however, those bandwidth savings may also make it feasible to replicate data to another site, making disaster recovery possible where in the past it was not.
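A back-of-the-envelope calculation shows how this can play out. The figures in the Python sketch below are entirely hypothetical (500GB of nightly changes, a 100Mbit/s link between sites, an eight-hour overnight window and a 5:1 de-duplication ratio), and the sums ignore protocol overhead and use decimal units, so treat it as an illustration of the reasoning rather than a sizing tool.

```python
def transfer_hours(data_gb, link_mbit_per_s):
    """Hours needed to send data_gb over the link (decimal units, no overhead)."""
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    return megabits / link_mbit_per_s / 3600


# All of the following figures are assumptions made for the example.
NIGHTLY_CHANGE_GB = 500
LINK_MBIT_PER_S = 100
WINDOW_HOURS = 8
DEDUP_RATIO = 5  # i.e. 5:1, so only a fifth of the data crosses the link

without_dedup = transfer_hours(NIGHTLY_CHANGE_GB, LINK_MBIT_PER_S)
with_dedup = transfer_hours(NIGHTLY_CHANGE_GB / DEDUP_RATIO, LINK_MBIT_PER_S)

print(f"without de-duplication: {without_dedup:.1f}h, fits window: {without_dedup <= WINDOW_HOURS}")
print(f"with de-duplication:    {with_dedup:.1f}h, fits window: {with_dedup <= WINDOW_HOURS}")
```

On these assumed numbers, replication takes around 11 hours without de-duplication and a little over two hours with it, which is the difference between missing the overnight window and fitting comfortably inside it.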
Storage, then, is like a team of Sherpas; it does the heavy lifting so the rest of IT can make its way up the mountain without having to worry about the provisions. But it needs to be designed for the long haul if it is to deliver on the value it promises. This may be difficult to do. But it is anything but boring.