Hadoop: The Biggest Catalyst of Change!

Boni Bruno, Chief Solutions Architect, Dell EMC

Data and analytics are taking center stage as the single most powerful catalyst for change in the enterprise. To maximize the impact of business intelligence, CIOs have to build and execute an effective, holistic data and analytics strategy. As strategies develop, CIOs will inevitably have to take a closer look at their information management programs and platforms to ensure support for larger data sets, real-time data analytics, and an increasing use of machine-learning algorithms to maintain a competitive edge in the market. In many cases, Hadoop provides the perfect framework to meet the growing demands CIOs face in their respective companies.

At Dell EMC (one business unit of Dell Technologies), we have helped over 2,000 large enterprises implement big data and analytics solutions using Hadoop in production environments. Taking Hortonworks' revenue report for 2016 as one indicator of Hadoop growth and adoption, the company reported its best year yet at $184.5 million, with $52 million in fourth-quarter revenue. Cloudera does not release details of its financials, but between Hortonworks and Cloudera alone, Dell EMC sold more than $200 million in infrastructure equipment in 2016 specifically for Hadoop deployments. This excludes revenue from analytics deals obtained with Tick Data Analytics, SAS, Splunk, and others. This effort has led to Dell EMC winning Hortonworks' recently announced Global Strategic Alliance of the Year Award. Bottom line: Hadoop growth and adoption is significant and a key focus area for Dell EMC and its many customers.

So what’s driving Hadoop growth and adoption? From my perspective, there are three key drivers: first, Enterprise Data Warehouse optimization; second, streaming analytics and IoT; and third, attack and threat detection.

Growth and Adoption of Hadoop

Enterprise Data Warehouse (EDW) systems exist in many enterprises. As datasets grow, many CIOs are finding the capacity of their EDW systems exhausted: load processing times are too long, SLAs are not being met, and ultimately the delivery of critical business intelligence is impacted. By moving resource-intensive ETL processes to Hadoop, CIOs can free up valuable CPU cycles on their EDW system and improve performance. Moving cold data off to Hadoop also saves money by freeing up capacity and lowering licensing costs. In fact, cold data in Hadoop can be mined for additional business insights when combined with other data not available in the EDW system, e.g. external system logs, social media, and security data. Optimizing an EDW with Hadoop offers CIOs cost reductions, improved reporting, and support for more types of unstructured data.
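As a rough illustration of the cold-data idea above, the following Python sketch splits warehouse partitions into those that stay in the EDW and those that are candidates for offloading to Hadoop. The partition names, the last-access dates, and the 180-day threshold are all hypothetical. A real deployment would take these from the warehouse's own usage statistics and from the business's retention rules.

```python
from datetime import date, timedelta

# Hypothetical threshold: partitions untouched for longer than this count as "cold".
COLD_AFTER = timedelta(days=180)

def split_hot_cold(partitions, today):
    """Return (hot, cold) lists of partition names based on last access date."""
    hot, cold = [], []
    for name, last_accessed in partitions:
        if today - last_accessed > COLD_AFTER:
            cold.append(name)   # candidate for offload to Hadoop
        else:
            hot.append(name)    # still queried recently, keep in the EDW
    return hot, cold

# Made-up example data: (partition name, date of the last query against it).
partitions = [
    ("sales_2014", date(2015, 3, 1)),
    ("sales_2015", date(2016, 2, 10)),
    ("sales_2016", date(2017, 1, 20)),
]

hot, cold = split_hot_cold(partitions, today=date(2017, 2, 1))
print("keep in EDW:", hot)
print("offload to Hadoop:", cold)
```

Last access date is only one possible signal. Query frequency or the cost of storing a partition could be used in the same way.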

The Internet of Things has opened doors for CIOs to take real-time actions using streaming analytics. With Hortonworks DataFlow or IBM Streams, to take two examples, you can add and adjust data sources for your Hadoop cluster as needed, trace and audit a data path, and dynamically adjust data pipelines to your available bandwidth. The key is to Explore, Optimize, and Transform. Explore your data: payment tracking, pricing, consumer feedback, shrinkage analysis, customer behavior, etc. Optimize your supply chain, customer support, inventory control, vendor score cards, and more. Transform your business: automate inventory predictions, staff proactively, improve targeted offers, and enhance various other business processes using predictive analytics.

Security analytics and threat detection are a growing use case for Hadoop. Using machine learning algorithms and data analytics on NetFlow streams, log streams, packet streams, and stored data, companies can identify complex threat vectors and proactively remediate attacks. From fraud detection to data theft, Hadoop offers the perfect platform to process a full stack of telemetry data, enable advanced correlation, and provide a single view into advanced threats. Check out the Apache Metron project for further insights. Some interesting commercial security solutions to look at are Niara (recently acquired by HPE) and Securonix.
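To make the idea of spotting threats in flow data more concrete, here is a deliberately simple Python sketch that flags hosts whose outbound byte count sits far above the rest. It uses a plain z-score as a stand-in for the machine learning algorithms mentioned above. It is not how Apache Metron or any commercial product works, and the host addresses, byte counts, and threshold of 2.0 are made up for illustration.

```python
from statistics import mean, pstdev

def flag_outliers(bytes_by_host, threshold=2.0):
    """Return hosts whose byte count is more than `threshold`
    standard deviations above the mean of all hosts."""
    values = list(bytes_by_host.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host sent the same volume, nothing stands out
    return sorted(
        host for host, sent in bytes_by_host.items()
        if (sent - mu) / sigma > threshold
    )

# Made-up per-host totals, as might be aggregated from flow records.
flows = {
    "10.0.0.1": 1200, "10.0.0.2": 1100, "10.0.0.3": 1300, "10.0.0.4": 1250,
    "10.0.0.5": 1150, "10.0.0.6": 1225, "10.0.0.7": 1175, "10.0.0.8": 95000,
}

suspects = flag_outliers(flows)
print("hosts to investigate:", suspects)
```

In practice the same kind of scoring would run over far larger volumes and many more features, which is where a Hadoop cluster earns its place.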

Interesting Applications on the Hadoop Framework

I’m glad to report that many CIOs are now seeing the fruits of their labor with Hadoop. As pilot projects move into production and new business insights emerge, it has become much easier for CIOs to land and expand Hadoop into other parts of the organization. It is no longer a science project but a real-world business solution.

I would like to close by mentioning some cool Hadoop products I’ve recently worked with that may be of interest to the readers:

1. Pivotal HDB: Fast Native Hadoop SQL Database with integrated machine learning.
2. Syncsort: Great data integration tool for Hadoop. Supports Mainframe Integration.
3. Galera Cluster: HA solution for MySQL. MySQL is commonly used to store Hadoop metadata for Hive, Oozie, Ranger, and Ambari.
