
by Yasuhiro Tai
Online
applications such as Facebook and Twitter, paired with smartphones, tablets
and other mobile devices, have given us access to our content and
content-sharing capabilities everywhere. They have also enabled wonderfully convenient
solutions to help us communicate and stay connected whenever and wherever we
decide. But they have simultaneously created a multitude of nightmarish issues
for the IT managers who are responsible for the care and feeding of these data
monsters.
Because of
appealing characteristics such as low cost/GB and anytime/anywhere access, the
internet is being used more and more to store data and run applications,
replacing traditional tangible servers with private, hybrid and public cloud
computing. As a result of this trend, Gartner has predicted that more than
one-third of global digital content will ultimately be stored in the cloud by
2016.
Key Benefits and Problems with the Cloud
There are
many solid reasons to consider implementing a cloud-based data archive,
including the ability to pay only for what you need to use, rather than being
forced to upgrade in large capacity jumps as you add hardware to the overall
archival system. In addition, because the archive is stored and maintained
outside of the organization, fewer in-house IT staff are required and other
associated data warehouse costs (including real estate, electricity, A/C) are
reduced. Finally, implementing cloud-based data archive services provides
authorized personnel with remote access to stored data and applications,
regardless of where they are working. However, there is a downside. Some key
problem areas to consider before porting over all of your organization’s
valuable data resources to the cloud include availability, security,
performance and compliance.
Outages
happen, everywhere. However, the effect of a power outage on a large cloud
service company can impact hundreds of businesses and thousands of people.
Although Amazon Web Services has solid contingency plans and a superior staff,
even their services were interrupted four times in the 14 months spanning from
April 2011 to June 2012, raising the question: If 100 percent cloud access isn’t
possible, then what percentage of downtime can you afford? The June 15th
outage was traced to the failure of a generator cooling fan while the facility
was on emergency power following a series of failures. It’s no secret that the
power requirements of these behemoth facilities are already over-taxing
available power supplies. And, as our data requirements grow, power
sustainability (and therefore data availability) becomes a key issue that must
be addressed.
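To make the downtime question above concrete, an availability percentage can be converted into hours of outage per year. The sketch below is only an illustration: the function name and the sample targets are assumptions chosen for the example, not figures from any provider's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical availability targets, used only to show the scale involved.
    for target in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(target)
        print(f"{target}% available -> {hours:.2f} hours down per year")
```

Running the numbers this way turns an abstract percentage into a figure that can be weighed against what an hour without data access actually costs the business.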
Security is another ongoing concern. When using a public cloud service, companies must
balance the competing factors of control, visibility and cost. There are services
that offer private resources for added data security (ensuring only your data
is stored to a section of hardware rather than having hardware space shared
with another client); but even these services are not completely secure. In
many cases, the hardware designated to the client is not new and,
therefore, not completely scrubbed clean. Often, the hardware has previously
been used as physical servers, or as hosts for as many as 16 virtual servers at a time,
meaning each of those servers, and each client who passed through the system,
could leave sensitive data behind and available for accidental (or even
malicious) recovery.
Before you think, “not my problem,” what happens to
“your” hardware when your data gets ported over to new hardware or you change
cloud services? If Company XYZ’s accounting information was left behind when
you took over that private cloud server, what sensitive data are you leaving? And
before you decide to just encrypt
the data you store with a cloud provider, make sure your provider can handle it. In
many cases, encrypting data can cause even more problems: the encrypted data shows up
as garbage rather than good data, and is therefore not properly managed by the provider.
It’s
also critical to realistically consider your organization’s performance
requirements. Utilizing cloud services ultimately means handing over a lot of
control, and when a performance issue arises, you won’t be able to troubleshoot
and find server-related root causes; you will have to wait. As is true with any network, the lack
of proper resources inevitably leads to poor application performance. Meager
performance is generally the result of an application architecture that does
not properly distribute its processes across available cloud resources. Performance is also impacted by limited bandwidth, disk
space, memory and CPU cycles as well as latency caused by poor network
connections.
Cloud
compliance has its own set of concerns. Beyond privacy and industry compliance
(HIPAA, FERPA, PCI-DSS, FIPS and the like), there are geographic compliance
issues as well. To keep costs down, many Cloud Service Providers (CSPs)
maintain their public cloud systems on international soil, where the rent and
other associated costs are cheaper. However, regulations
that ensure privacy in one country often conflict with regulations requiring
disclosure in another. Before
contracts and SLAs are signed, know where your data will be stored and where
processing will occur, and ensure that you are doing everything you can to meet
all of the regulations in those locations to reduce future setbacks.
Exploding Data, Imploding Budgets
The volume
of data being created is doubling every two years. That’s not news. However,
the fact that the digital universe is expected to reach 7.9 ZB in 2015 (from
1.8 ZB in 2011) is worth consideration. And, while the explosive growth in digital
information continues to demand more efficient archive strategies, reduced IT
budgets are another solid reason organizations are considering cloud services.
Nevertheless, while it may be possible to provide the data migration,
air-conditioning and power necessary to support data volumes today, it will not
be possible in 2015 unless there are some dramatic changes made.
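The growth figures quoted above can be checked against the "doubling every two years" claim. The sketch below assumes steady exponential growth between the two data points, which is a simplification; the function name is hypothetical.

```python
import math


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Doubling time implied by growth from start to end over `years`,
    assuming steady exponential growth (a simplifying assumption)."""
    if start <= 0 or end <= start or years <= 0:
        raise ValueError("need 0 < start < end and years > 0")
    return years * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    # Figures quoted in the article: 1.8 ZB in 2011, 7.9 ZB expected in 2015.
    implied = doubling_time_years(1.8, 7.9, 2015 - 2011)
    print(f"Implied doubling time: {implied:.2f} years")
```

Under that assumption, the two figures imply a doubling time of a little under two years, which is consistent with the claim.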
For a
little over a decade, the cost of storing data has been reviewed,
compared and considered in terms of hardware and cost per gigabyte. However, as
data volumes have grown, it has become increasingly obvious that the real
cost isn’t capacity; it’s operations. Total cost of ownership (TCO) estimates
must include expenses such as the cost of electricity to run the hardware, the
cost (and environmental emissions) associated with cooling the warehouse-sized
rooms full of servers necessary to store today’s data volumes, and the
personnel costs associated with long-term storage (maintenance, management,
replication, backup, hardware replacement and data migration).
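One way to keep these line items visible is to tally them explicitly rather than quoting a single cost per gigabyte. The following is a minimal sketch; the class, the field names and any values passed in are hypothetical, and a real estimate would carry many more line items than the three shown.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Hypothetical yearly operating costs for an archive, in dollars."""
    electricity: float
    cooling: float
    personnel: float


def total_cost_of_ownership(
    hardware: float,
    annual: AnnualCosts,
    years: int,
    replacement_cost: float = 0.0,
    replacements: int = 0,
) -> float:
    """Initial hardware plus operating costs over the plan, plus any
    hardware replacements made along the way."""
    operating = (annual.electricity + annual.cooling + annual.personnel) * years
    return hardware + operating + replacement_cost * replacements
```

Even with placeholder values, laying the estimate out this way shows how quickly recurring costs can overtake the initial hardware purchase.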
While it
is true that migrating to a CSP can take many of these burdens off your
corporate shoulders, it doesn’t solve the problem. As mentioned earlier,
overheated servers (whether yours or in the cloud) will go down, taking data
access with them. Current power and cooling availability are at capacity
limits, but our data continues to increase. Consequently, it’s time to take a
long, hard look at how we use our data, and how much of that data we really need
to access instantly.
When researchers discovered that YouTube viewers decide
within the first 10 seconds whether or not they will watch a video in its
entirety, it not only changed how marketers organized their content, it changed
mobile cache settings from 30 seconds to 15, saving time, capacity and
ultimately, money. The same is true for our stored data. We think we need to
have instant access to every byte; but the truth is, at least 80 percent of an
organization’s stored data will seldom (if ever) be accessed. Of course, for
compliance, all data must be stored and protected, while still maintaining
reasonably fast recovery; but in most cases, an organization has hours and even
days to produce files from long-term data stores. This 80 percent is ideally
suited for data archive to a secure, scalable, low-cost (hardware and
operational) solution.
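Finding that seldom-accessed share usually starts with something as simple as looking at when each item was last read. The sketch below assumes a last-access timestamp is available for each item and uses a one-year threshold; both the threshold and the names are assumptions for illustration, not a recommendation.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_last_access(
    last_access: Dict[str, datetime],
    now: datetime,
    cold_after_days: int = 365,  # hypothetical threshold
) -> Tuple[List[str], List[str]]:
    """Partition item names into (active, archive) by age of last access."""
    cutoff = now - timedelta(days=cold_after_days)
    active: List[str] = []
    archive: List[str] = []
    for name, accessed in last_access.items():
        if accessed >= cutoff:
            active.append(name)
        else:
            archive.append(name)
    return active, archive


if __name__ == "__main__":
    # Made-up records, used only to exercise the function.
    records = {
        "q4-report.pdf": datetime(2012, 12, 1),
        "2009-invoices.zip": datetime(2010, 1, 15),
    }
    active, archive = split_by_last_access(records, now=datetime(2013, 1, 1))
    print("active:", active)
    print("archive candidates:", archive)
```

Items in the second list are candidates for the slower, lower-cost archive tier, since hours or days of retrieval time are acceptable for them.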
Blu-ray – A Solution that Fits the Complete Archiving Picture
Today’s
requirements for long-term data archives are about more than simply where to put
the data. Considerations include assessing accessibility requirements: defining
when data needs to be online and searchable, and whether or not standard access
protocols are adequate given the data being archived. Cost is another strong
determinant; however, depending
on the storage solution, total cost of ownership (TCO) can have a long list of
line items to tally. Ideally, the solution you choose will eliminate
backups/migrations and their associated costs, be easy to manage, energy
efficient, and of course, have a low TCO.
In
comparing 100TB archives built on the top three data archive technologies
(traditional hard disk drives (HDD), tape and optical (BD)) over a projected
20-year archival plan, BD emerges as the strongest contender, with the fewest
compromises.
Power
consumption requirements for HDD hover between 1,000 and 2,000W plus A/C, making it easy
to see why CSPs and large data centers are struggling to keep systems up and
running. The power requirements of HDD are as much as 25 times those of BD, in large
part because BD can be powered down when data is not in use. In
addition, BD drives do not require 24x7 electricity or A/C to keep the
hardware cool, as is necessary with both HDD and tape technologies (which must
be constantly on and running whether data is being accessed or not). As a
result, the savings in electricity over 20 years can be in the range of
millions of dollars.
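The wattage range and the 25-to-1 ratio quoted above can be turned into annual energy use. This sketch treats the "as much as 25" figure as a best case, ignores A/C entirely, and assumes the HDD load runs around the clock; the function name and duty-cycle parameter are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(watts: float, duty_cycle: float = 1.0) -> float:
    """Energy used in one year by a load of `watts` that is powered
    for `duty_cycle` (0.0 to 1.0) of the time."""
    if watts < 0 or not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("watts must be >= 0 and duty_cycle within 0..1")
    return watts / 1000.0 * HOURS_PER_YEAR * duty_cycle


if __name__ == "__main__":
    BEST_CASE_RATIO = 25.0  # the article's upper-bound HDD-to-BD power ratio
    for hdd_watts in (1000.0, 2000.0):
        hdd_kwh = annual_energy_kwh(hdd_watts)
        bd_kwh = annual_energy_kwh(hdd_watts / BEST_CASE_RATIO)
        print(f"HDD at {hdd_watts:.0f}W: {hdd_kwh:.0f} kWh/yr; BD best case: {bd_kwh:.1f} kWh/yr")
```

Multiplying the difference by a local electricity rate and by the length of the archival plan gives a first-order view of the energy side of the comparison.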
Because long-term data
archiving is hardly a “set it and forget it” strategy, hardware longevity is another
significant consideration. Over time, media and drives fail; consequently,
measures to prevent data
loss include preemptive replacement of both. The more frequently media
and drives have to be replaced, the higher the cost, not only in hardware, but
in human resources for integration, data management and migration, in addition
to the higher risk of data loss during each migration. Although
specifications indicate HDD and tape should be replaced every 3 and 7 years
respectively, most IT specialists replace their drives every 2 and 5 years. By
comparison, BD drives can safely archive data for 30 years or more without
needing replacement and without generating mountains of technology trash. That
longevity also frees the IT staff for bigger corporate challenges than merely
swapping drives.
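The replacement intervals above translate directly into the number of hardware refreshes, and therefore data migrations, over a 20-year plan. The sketch below counts refreshes that fall strictly inside the plan; the function name is hypothetical, and the intervals are the ones quoted in the text.

```python
def replacements_needed(plan_years: int, replace_every_years: int) -> int:
    """Number of hardware refreshes during the plan, not counting the
    initial purchase or a refresh that would fall on the final day."""
    if plan_years <= 0 or replace_every_years <= 0:
        raise ValueError("both arguments must be positive")
    return (plan_years - 1) // replace_every_years


if __name__ == "__main__":
    PLAN_YEARS = 20
    # Intervals quoted in the article: specification vs. common practice.
    intervals = {
        "HDD (spec, 3 yr)": 3,
        "HDD (practice, 2 yr)": 2,
        "Tape (spec, 7 yr)": 7,
        "Tape (practice, 5 yr)": 5,
        "BD (30 yr)": 30,
    }
    for label, every in intervals.items():
        print(f"{label}: {replacements_needed(PLAN_YEARS, every)} replacements")
```

Each refresh counted here is also a migration, with the staffing cost and data-loss risk that come with it.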
For many
organizations and CSPs alike, moving to a RAID/Blu-ray hybrid archive is the
ideal answer to harmoniously balancing data access and storage requirements
with conservative budgets and environmental conscience. Not only can such a
hybrid ensure fast access to younger (expected to be accessed more frequently)
archived data, it also enables organizations and CSPs to take advantage of as
much as a 40 percent reduction in both power consumption and CO2
emissions over a standard RAID solution – important considerations for those
struggling to meet capacity, access and EPA requirements.
As data
mining continues to mature, offering potentially
major benefits in virtually every industry, it too yields vast amounts of data. In some cases, there may not be
time or opportunity to fully glean every bit of data gold. In other cases, a
quantity of like data may be necessary to adequately establish a conclusion or
trend. However, the economy of a RAID/Blu-ray hybrid archive means the
information can be efficiently and cost-effectively retained, mined, reviewed
and retained again, indefinitely, either privately or in a cloud somewhere.
Online
applications, the integration of personal mobile devices into the workplace,
and compliance regulations will continue to ensure healthy growth of the data
monster. However, data archive solutions that include Blu-ray can help to wean
the monster off its insatiable demands on the local power grid and reverse
the overall impact on the environment for a more sustainable ecological and
economic future, and a greener cloud.
Yasuhiro Tai is General Manager, Special Projects at AVC
Networks Company, Panasonic Corporation.