DataTorrent: Data Ingestion in Real Time

Phu Hoang, Co-founder & CEO
Today, most of the data available in organizations is “unstructured” and must be organized before it can be mined and analyzed. That need has grown as predictions drawn from sorting and analyzing big data have become more important. Hadoop is the core platform for structuring big data. It enables distributed, parallel processing of large volumes of data across servers, so that potential value can be extracted from that data. However, organizations struggle to get data into and out of Hadoop, and no existing tool handles all the requirements of Hadoop ingestion.

Santa Clara, CA-based DataTorrent provides a real-time big data analytics solution, DataTorrent Real-time Streaming (RTS). RTS is built on a high-performance, fault-tolerant, unified architecture for both data in motion and data at rest. The company has also introduced DataTorrent dtIngest, an enterprise-grade unified platform for both stream and batch processing on Hadoop. “dtIngest simplifies the collection, aggregation and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline,” says Phu Hoang, Co-founder and CEO, DataTorrent.

DataTorrent’s dtIngest is built on Apache Apex, an open-source, enterprise-grade unified stream and batch processing platform. dtIngest turns the job of configuring and running Hadoop data ingestion and distribution pipelines into a point-and-click process. The Apache 2.0-licensed platform is fully fault tolerant and can resume a file ingest after a failure. The solution is simple to use and manage, which makes it easy to configure, save, and launch multiple data ingestion and distribution pipelines. Its centralized management provides visibility, monitoring, and summary logs. For secure and efficient data movement, dtIngest supports compression and encryption during ingestion, is certified for Kerberos-enabled secure Hadoop clusters, and runs on any Hadoop 2.0 cluster.

The company was founded by Yahoo! veterans with deep expertise in managing big data for cutting-edge applications and frameworks at massive scale. Their experience in building large-scale platforms has helped them create an infinitely scalable, fully fault-tolerant technology that builds on the latest developments in the Big Data ecosystem.

“Recognizing that Big Data holds the most value for operational intelligence when it can be acted upon instantaneously, our goal was to solve the challenges of processing massive amounts of data in real time,” adds Phu Hoang. The DataTorrent platform is a robust solution for real-time stream analytics and action on Big Data, simplifying the development and production deployment of real-time applications.

As a platform that offers critical enterprise features such as linear scalability and high availability, DataTorrent enables enterprises to take advantage of the real-time impact of big data without risk, constraints, management overhead, or performance degradation. One of the company’s long-standing clients, Silver Spring Networks, has been using DataTorrent’s real-time analytics in its innovative SilverLink Sensor Network. The SilverLink Sensor Network is the first network-based service that uses real-time analysis of smart grid big data to improve operations and increase customer engagement.

Looking ahead, DataTorrent plans to evolve dtIngest with more data sources, destinations, and computational modules as it builds more applications. “Imagine what you could achieve when you’re able to automatically analyze and act on big data in real time, instead of having to wait for it to be processed, stored, and then wait for your next scheduled report or batch job to run,” concludes Phu Hoang.


Santa Clara, CA

Providing a unified stream and batch data ingestion application for Hadoop through DataTorrent dtIngest