Fast & Reliable Cloud Backups

For MySQL, MongoDB, Linux, Unix

Get Started

Dec 19, 2014

Tame the Big-Data Deluge by Devising a Metadata Strategy

Posted by Gen

Ride the crest of the big-data wave by using metadata management as your surfboard. As more of your organization's information assets become untethered from relational databases, you'll rely increasingly on metadata to classify, qualify, and otherwise manage today's diverse data resources.

"Metadata" is one of those terms that appears to have as many meanings as there are people using the word. The standard definition of metadata is "data about data." That's like defining a tree as "wood with leaves."

A slightly better definition of metadata comes from Mika Javanainen's November 5, 2014, article on TechRadar: "The attributes, properties and tags that describe and classify information." This may include the data type (text document, image, JavaScript, etc.), creation date, author, or workflow state.

Like many definitions, this one fails to communicate the importance of metadata to the task of organizing and managing massive data stores composed of diverse elements that relate and interact in often unpredictable ways. As Javanainen points out, metadata's most important role may be as a bridge between the diverse information scattered across an organization's CRM, ERP, and other siloed databases, which house both structured and unstructured data.

Javanainen recommends creating standard metadata templates for employees to use, such as ones for proposals, contracts, invoices, and product information. Templates allow metadata attributes to be applied automatically and consistently to data at the point of ingestion.
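To make the template idea concrete, here is a minimal Python sketch of how a template might stamp consistent metadata on a document as it is ingested. It is an illustration only: the MetadataTemplate class, the ingest function, and every attribute name (customer_id, counterparty, workflow_state, and so on) are assumptions made for this example, not something taken from Javanainen's article or from any particular product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetadataTemplate:
    """Hypothetical template: a document class plus the attributes it requires."""
    doc_class: str
    required: tuple
    defaults: dict = field(default_factory=dict)

    def apply(self, supplied: dict) -> dict:
        """Merge template defaults with supplied values; reject incomplete records."""
        record = {"doc_class": self.doc_class, **self.defaults, **supplied}
        missing = [name for name in self.required if name not in record]
        if missing:
            raise ValueError(f"missing metadata for {self.doc_class}: {missing}")
        return record


# Illustrative templates only; the attribute names are assumptions.
TEMPLATES = {
    "invoice": MetadataTemplate(
        doc_class="invoice",
        required=("author", "created", "customer_id", "workflow_state"),
        defaults={"workflow_state": "received"},
    ),
    "contract": MetadataTemplate(
        doc_class="contract",
        required=("author", "created", "counterparty", "workflow_state"),
        defaults={"workflow_state": "draft"},
    ),
}


def ingest(doc_class: str, supplied: dict) -> dict:
    """Tag a document with its template's metadata at the point of ingestion."""
    return TEMPLATES[doc_class].apply(supplied)


# The invoice template fills in workflow_state automatically.
record = ingest(
    "invoice",
    {"author": "jdoe", "created": date(2014, 12, 19), "customer_id": "C-1001"},
)
```

Because the template supplies defaults and checks required attributes in one place, every invoice enters the system with the same set of fields, and an incomplete record is rejected before it is stored rather than discovered later.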

Managing the transition from structured RDBMSs to unstructured big data

As Ventana Research's Mark Smith points out in a November 12, 2014, article on the Smart Data Collective site, most big data in organizations resides in conventional relational databases (76 percent, according to the company's research), followed by flat files (61 percent) and data-warehouse appliances (46 percent).

However, when enterprise data managers were asked which tools they plan to use for their future big-data tasks, 46 percent named in-memory databases, 44 percent cited Hadoop, 43 percent named specialized databases, and 42 percent plan to adopt NoSQL.


Companies intend to use a mixed bag of technologies as they begin to implement their big-data strategies. Source: Ventana Research

The companies surveyed by Ventana Research identified metadata management as the single most important aspect of their big-data integration plans (58 percent), followed by joining disparate data sources (56 percent) and establishing rules for processing and routing data (56 percent).

A new company named Primary Data intends to help organizations realize the full value of their metadata resources. Forbes' Tom Coughlin describes the company's unique approach in a November 26, 2014, article.

The Primary Data platform uses data virtualization to create a single global namespace that can be used to manage direct-attached storage, network-attached storage, and both public and private cloud storage. To improve performance and efficiency, content metadata is stored on fast flash-based storage servers, while the data the metadata refers to is housed on lower-cost (and slower) hard disk drives.


Primary Data's metadata server creates a logical abstraction of physical storage that automates data movement and placement via an intelligent policy engine. Source: Storage Newsletter
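The general pattern described above, a small metadata catalog that presents one namespace while a policy decides where the underlying data lives, can be sketched in a few lines of Python. This is a toy model and not Primary Data's software or API: the FileEntry and Namespace classes, the choose_tier function, the tier names, and the 7-day and 90-day thresholds are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Metadata record kept in the catalog; the payload itself lives elsewhere."""
    logical_path: str             # name in the single global namespace
    size_bytes: int
    days_since_access: int
    location: str = "unplaced"    # physical tier currently holding the data


def choose_tier(entry: FileEntry) -> str:
    """Toy placement policy; the thresholds are assumptions, not vendor values."""
    if entry.days_since_access <= 7:
        return "direct-attached"
    if entry.days_since_access <= 90:
        return "network-attached"
    return "cloud"


class Namespace:
    """Maps logical paths to physical tiers, hiding placement from callers."""

    def __init__(self):
        self._catalog = {}

    def add(self, entry: FileEntry) -> None:
        # The policy picks the tier; the catalog remembers the decision.
        entry.location = choose_tier(entry)
        self._catalog[entry.logical_path] = entry

    def locate(self, logical_path: str) -> str:
        # Callers use only the logical path, never the physical tier.
        return self._catalog[logical_path].location


ns = Namespace()
ns.add(FileEntry("/projects/q4/report.docx", 120_000, 3))
ns.add(FileEntry("/archive/2011/logs.tar", 9_000_000, 400))
```

Lookups touch only the small catalog, which is why keeping that catalog on fast storage can pay off even when the bulk data sits on slower, cheaper media.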

The BitCan cloud storage service gives your big-data plans a jumpstart by providing a simple point-and-click interface for managing heterogeneous MySQL and MongoDB databases, as well as Unix/Linux/Windows system and file backups. You set up and schedule your backups in just seconds, receive alerts about the status of your backups, and use the same console to recover and restore databases and files.

BitCan encrypts your data at the communication and storage layers, and all your backups are stored indefinitely on Amazon S3 servers. Visit the BitCan site to create a free 30-day trial account.
