At Cloudbox Labs, we think logs are an incredibly interesting dataset. They are the heartbeat of our tech stack. In this post, we build a robust data infrastructure that can handle large volumes of logs from all our applications and allows for real-time analytics as well as batch processing.
We are big fans of Apache Kafka when it comes to building distributed, real-time stream processing systems. In this post, we are going to use Kafka Streams to track real-time statistics of Citi Bike utilization in New York City.
Google Cloud has fully managed services that allow end users to build big data pipelines for their analytical needs. In this post, we are going to build a data pipeline that analyzes real-time stock tick data streamed from Google Cloud Pub/Sub, runs it through a pair correlation trading algorithm, and outputs trading signals onto Pub/Sub for execution.