
Nov 21, 2014

New Techniques for Protecting Against Cloud Data Loss

Categories IT Best Practices

Posted by Gen

TL;DR: Today's distributed data networks mean data spends more time on the move at the application and server layers than it does in static storage repositories. This trend requires that security likewise move to the periphery of the network, which increasingly sits in the cloud. New approaches to data security protect information in transit and at the edge of your widely dispersed networks.

Your organization's data is dynamic: it's constantly on the move, and constantly changing. The task of "storing" it securely is equally dynamic. What does it mean to "lose" data in an age when information resides increasingly in applications and on servers at the edge of the network?

In a November 3, 2014, article, the Register's Adrian Bridgwater distinguishes legacy data that needs to be stored for compliance or regulatory purposes from the "live frontline data stream" representing the transactional, real-time operations that drive the organization.

Bridgwater points out that cloud storage services generally apply more layers of security than in-house IT departments do, even though internal data stores face just as much risk as data kept outside the company's facilities. These cloud-security layers include intrusion detection systems, vulnerability detection, web application firewalls, and log management.

Encryption is a key component of any modern data-security plan, yet encrypting and decrypting data consumes extra processing power. As companies focus on making their data networks as efficient as possible, they need a way to implement encryption without slowing things down.

Today's software and hardware are designed to offset the performance hit on the processors that encrypt and decrypt the data. For example, Intel's Advanced Encryption Standard New Instructions (AES-NI) in its Xeon and Core processors are said to deliver a ten-fold increase in performance over software-only encryption. ZDNet's Ram Lakshminarayanan describes the seven new instructions in a November 12, 2014, article.
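
The speedup requires no changes to application code. As a rough sketch (assuming Python's cryptography package, whose OpenSSL backend typically picks up AES-NI automatically when the processor supports it), authenticated AES-256 encryption of a sensitive record looks like this:

```python
# Sketch: authenticated AES-256-GCM encryption with the "cryptography" package.
# When the CPU exposes AES-NI, the underlying OpenSSL routines use it
# automatically; the Python code is identical either way.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # 256-bit key; keep it in a key vault in practice
aesgcm = AESGCM(key)

nonce = os.urandom(12)                      # 96-bit nonce, unique for every message
plaintext = b"password=hunter2;ssn=000-00-0000"
ciphertext = aesgcm.encrypt(nonce, plaintext, None)

# Decryption verifies the authentication tag and raises InvalidTag on tampering.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```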

When sensitive data such as passwords, credit card numbers, and Social Security numbers is decrypted by an application on a server, the decrypted data sits in server memory. In virtual, multi-tenant environments, virtual machine isolation and the Trusted Execution Technology in Intel's Xeon processors help prevent third parties from accessing this information.

New replication technique protects data without bogging down performance

A popular method for preventing data loss in data-storage systems is random replication, which partitions data into chunks and places each chunk's replicas on randomly selected servers to provide fault tolerance. A shortcoming inherent to random replication in large clusters is that the simultaneous failure of even a small number of nodes is almost certain to cause permanent data loss.
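
To make that concrete, here is a minimal sketch of random replica placement (illustrative Python with hypothetical helper names, not any particular system's API). With enough chunks in the cluster, almost every combination of R nodes ends up holding all replicas of some chunk, which is why a few simultaneous failures are so dangerous.

```python
# Sketch of random replication: each chunk's R replicas land on a randomly
# chosen set of nodes. Helper names are hypothetical, for illustration only.
import random

def random_placement(chunk_id: str, nodes: list, r: int = 3) -> dict:
    """Place a chunk's R replicas on R distinct, randomly selected nodes."""
    return {chunk_id: random.sample(nodes, r)}

nodes = list(range(1, 301))                  # a 300-node cluster
print(random_placement("chunk-1", nodes))    # e.g. {'chunk-1': [212, 7, 148]}
```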

Stanford University researchers led by Asaf Cidon have developed a technique called Copyset Replication, which they claim reduces the frequency of data-loss events when multiple nodes fail at once.

The researchers found that once a cluster exceeds 300 nodes, a power outage is almost guaranteed to cause permanent loss of some data. Yahoo, LinkedIn, and Facebook are among the companies that have suffered this type of data loss, which is expensive to remedy: you either have to find the lost data in a backup or recompute the lost data sets.

Figure: Once a cluster grows beyond 300 nodes, a power outage affecting even a small fraction of its nodes is almost certain to cause permanent data loss. Source: Asaf Cidon et al., Stanford University

Copyset Replication offers the same performance benefits as random replication but protects against data loss by splitting the cluster's nodes into copysets: sets of R nodes, where R is the replication factor. All replicas of a given "chunk" of data are stored within a single copyset, so data is lost only when an outage takes down every node in one copyset.
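
A simplified sketch of that placement rule follows (again hypothetical Python; the published algorithm also takes a scatter-width parameter, which this illustration omits for clarity):

```python
# Sketch of copyset-style placement: nodes are partitioned into disjoint
# copysets of R nodes, and all replicas of a chunk stay inside one copyset.
import random

def build_copysets(nodes: list, r: int) -> list:
    """Split the node list into disjoint copysets of R nodes each."""
    return [nodes[i:i + r] for i in range(0, len(nodes), r)]

def copyset_placement(chunk_id: str, copysets: list) -> dict:
    """Store every replica of the chunk on the nodes of a single copyset."""
    return {chunk_id: random.choice(copysets)}

nodes = list(range(1, 10))                   # nine nodes, as in the example below
copysets = build_copysets(nodes, r=3)
print(copysets)                              # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(copyset_placement("chunk-1", copysets))
```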

The researchers give the example of a system with nine nodes split into three copysets: {1, 2, 3}, {4, 5, 6}, {7, 8, 9}. With Copyset Replication, data would be lost only if all three nodes in a particular copyset failed at the same time. Under random replication, by contrast, a simultaneous failure of almost any three nodes would cause data loss, because with enough chunks in the system nearly every combination of three nodes ends up holding all replicas of some chunk.
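
A quick back-of-the-envelope check shows how lopsided the odds are (a sketch; the random-replication figure assumes the cluster stores enough chunks that virtually every three-node combination holds all replicas of some chunk):

```python
# How many of the possible simultaneous three-node failures lose data?
from math import comb

nodes, r = 9, 3
total_failures = comb(nodes, r)             # 84 ways for three of nine nodes to fail

copyset_count = 3                           # {1,2,3}, {4,5,6}, {7,8,9}
copyset_loss = copyset_count / total_failures
print(f"Copyset replication: {copyset_loss:.1%} of three-node failures lose data")

# With random replication and many chunks, nearly all 84 combinations hold
# all replicas of some chunk, so the loss probability approaches 100%.
print("Random replication: approaches 100% of three-node failures")
```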

Data-loss safeguards are at the forefront of the BitCan cloud storage service. All connections to databases and servers are encrypted via SSH, and data in the service's storage layer is protected by 256-bit encryption. Backups of your MySQL, MongoDB, Unix, and Linux systems and files are created automatically on the schedule you set through BitCan's user-friendly interface. Visit the BitCan site to sign up for a free 30-day trial account.
