big data application

Distributed log analytics using Apache Kafka, Kafka Connect and Fluentd

At Cloudbox Labs, we think logs are an incredibly interesting dataset. They are the heartbeat of our tech stack. In this post we build a robust data infrastructure that can handle a large volume of logs from all our applications and supports both real-time analytics and batch processing.

Building a real time quant trading engine on Google Cloud Dataflow and Apache Beam

Google Cloud offers fully managed services that let end users build big data pipelines for their analytical needs. In this post, we build a data pipeline that analyzes real-time stock tick data streamed from Google Cloud Pub/Sub, runs it through a pair correlation trading algorithm, and publishes trading signals back to Pub/Sub for execution.

Building a real time NYC subway tracker with Apache Kafka

In recent years, Apache Kafka has become the technology of choice for working with streaming data. In this post we use Apache Kafka to build a real-time NYC subway tracker that shows when the next train will arrive at a station.