Distributed log analytics using Apache Kafka, Kafka Connect and Fluentd

At Cloudbox Labs, we think logs are an incredibly interesting dataset: they are the heartbeats of our tech stack. In this post we build a robust set of data infrastructure that can handle a large volume of logs from all our applications and allows for real-time analytics as well as batch processing.
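To give a flavor of the real-time side, here is a minimal consumer sketch using kafka-python, assuming Fluentd forwards application logs into Kafka as JSON events; the topic name "app-logs" and the "level"/"service" fields are hypothetical, not taken from the post.

```python
import json
from collections import Counter

from kafka import KafkaConsumer

# Consume JSON log events that Fluentd has forwarded into Kafka.
consumer = KafkaConsumer(
    "app-logs",  # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

error_counts = Counter()
for message in consumer:
    event = message.value
    if event.get("level") == "ERROR":
        # Tally errors per service for a simple real-time view.
        error_counts[event.get("service", "unknown")] += 1
        print(dict(error_counts))
```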

Building a real time quant trading engine on Google Cloud Dataflow and Apache Beam

Google Cloud offers fully managed services that let end users build big data pipelines for their analytical needs. In this post, we are going to build a data pipeline that analyzes real-time stock tick data streamed from Google Cloud Pub/Sub, runs it through a pair correlation trading algorithm, and outputs trading signals onto Pub/Sub for execution.
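The shape of such a pipeline in the Beam Python SDK might look like the sketch below. It reads ticks from Pub/Sub, windows them, and emits a toy spread-based signal as a stand-in for the post's actual pair correlation algorithm; the topic paths, message schema, and symbol names are all assumptions.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def spread_signal(ticks):
    """Toy signal from the mean price spread of a hypothetical pair
    (STOCK_A, STOCK_B) within one window; a stand-in for real pair
    correlation logic."""
    prices = {}
    for t in ticks:
        prices.setdefault(t["symbol"], []).append(t["price"])
    a, b = prices.get("STOCK_A", []), prices.get("STOCK_B", [])
    n = min(len(a), len(b))
    if n == 0:
        return "HOLD"
    spread = sum(x - y for x, y in zip(a[:n], b[:n])) / n
    return "SELL_A_BUY_B" if spread > 0 else "BUY_A_SELL_B"


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     # Ticks arrive as JSON bytes, e.g. {"symbol": "STOCK_A", "price": 1.23}.
     | beam.io.ReadFromPubSub(topic="projects/my-project/topics/ticks")
     | beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     # Gather each 60-second window of ticks into a list, then signal.
     | beam.WindowInto(beam.window.FixedWindows(60))
     | beam.CombineGlobally(beam.combiners.ToListCombineFn()).without_defaults()
     | beam.Map(spread_signal)
     | beam.Map(lambda s: s.encode("utf-8"))
     | beam.io.WriteToPubSub(topic="projects/my-project/topics/signals"))
```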

Building a real time NYC subway tracker with Apache Kafka

In recent years, Apache Kafka has become the technology of choice for working with streaming data. In this post we will use Apache Kafka to build a real-time NYC subway tracker that shows when the next train will arrive at a station.
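A minimal producer for such a tracker could poll the MTA's GTFS-realtime feed and publish predicted arrivals into Kafka, as in this sketch using kafka-python and the official GTFS-realtime bindings; the feed URL and topic name are placeholders, and the real MTA feed requires an API key.

```python
import time

import requests
from google.transit import gtfs_realtime_pb2
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
FEED_URL = "https://example.com/mta/gtfs-realtime/feed"  # placeholder URL

while True:
    # Parse the protobuf feed of trip updates.
    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(requests.get(FEED_URL).content)
    for entity in feed.entity:
        if entity.HasField("trip_update"):
            for stop in entity.trip_update.stop_time_update:
                # Publish one record per predicted stop arrival.
                record = f"{stop.stop_id},{stop.arrival.time}".encode("utf-8")
                producer.send("subway-arrivals", record)
    time.sleep(30)  # poll the feed every 30 seconds
```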

Building gRPC services on AWS

RPC (remote procedure call) is the mechanism whereby an application client can invoke a function on a server running on remote hardware as if it were calling a local function. In this blog post we will build a distributed service on Amazon Web Services (AWS) using one of the most popular modern RPC frameworks, Google's gRPC.
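The skeleton of a gRPC server in Python looks like the sketch below. It assumes stubs were generated by protoc from a hypothetical echo.proto defining `service Echo { rpc Ping (PingRequest) returns (PingReply); }`; the `echo_pb2`/`echo_pb2_grpc` modules are those generated files, not a real library.

```python
from concurrent import futures

import grpc

import echo_pb2       # generated by protoc from the hypothetical echo.proto
import echo_pb2_grpc  # generated gRPC service classes


class EchoServicer(echo_pb2_grpc.EchoServicer):
    def Ping(self, request, context):
        # Echo the client's message back, as if it were a local call.
        return echo_pb2.PingReply(message=f"pong: {request.message}")


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    echo_pb2_grpc.add_EchoServicer_to_server(EchoServicer(), server)
    # On AWS this port would typically sit behind a load balancer.
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```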

Continuous Integration with Docker and Jenkins

In today’s distributed computing environment, continuous integration and delivery (CI/CD) can be challenging given the multitude of dependencies that have to be managed and replicated. In this post, we are going to develop a continuous integration workflow using Docker and Jenkins.
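The core build-and-test step a Jenkins job might execute can be as simple as the sketch below: build the application image, then run the test suite inside a throwaway container. The image tag and test command are hypothetical.

```python
import subprocess


def build_and_test(tag: str = "myapp:ci") -> None:
    # Build the image from the Dockerfile in the current directory.
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    # Run the tests inside a container; --rm removes it afterwards.
    subprocess.run(["docker", "run", "--rm", tag, "pytest"], check=True)


if __name__ == "__main__":
    build_and_test()
```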

Building distributed data pipeline on AWS

Data pipelines are common in any business that works with large amounts of data. A pipeline is a crucial piece of infrastructure that fetches data from its source, transforms it, and stores it for internal use. In this post we will build a distributed data pipeline using core services in Amazon Web Services (AWS).
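The fetch-transform-store loop at the heart of such a pipeline might look like this boto3 sketch; bucket names, keys, and the CSV schema are placeholders, and credentials come from the usual AWS configuration chain.

```python
import csv
import io

import boto3

s3 = boto3.client("s3")


def run_pipeline():
    # Fetch: pull a raw CSV drop from the source bucket.
    raw = s3.get_object(Bucket="source-bucket", Key="raw/events.csv")
    rows = csv.DictReader(io.StringIO(raw["Body"].read().decode("utf-8")))
    # Transform: keep only completed events and the fields we need.
    cleaned = [
        f'{r["id"]},{r["amount"]}' for r in rows if r.get("status") == "complete"
    ]
    # Store: write the transformed records back for internal consumers.
    s3.put_object(
        Bucket="warehouse-bucket",
        Key="clean/events.csv",
        Body="\n".join(cleaned).encode("utf-8"),
    )


if __name__ == "__main__":
    run_pipeline()
```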