At Cloudbox Labs, we think logs are an incredibly interesting dataset. They are the heartbeat of our tech stack. In this post we build a robust set of data infrastructure that can handle a large volume of logs from all our applications and allows for real-time analytics as well as batch processing.
Google Cloud offers fully managed services that let end users build big data pipelines for their analytical needs. In this post, we are going to build a data pipeline that analyzes real-time stock tick data streamed from Google Cloud Pub/Sub, runs it through a pair correlation trading algorithm, and publishes trading signals back onto Pub/Sub for execution.
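The core of the pair correlation strategy can be sketched without any cloud dependencies. The sketch below is a minimal, self-contained illustration: the thresholds, signal names, and price series are illustrative assumptions, not the post's actual parameters, and a real pipeline would feed it windows of ticks from Pub/Sub.

```python
def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length price series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def signal(prices_a, prices_b, corr_threshold=0.9, spread_threshold=2.0):
    """Emit a trading signal when two historically correlated series diverge.

    Hypothetical signal names; a real system would map these to orders.
    """
    if pearson(prices_a, prices_b) < corr_threshold:
        return "NO_TRADE"  # the pair no longer moves together
    spread = prices_a[-1] - prices_b[-1]
    mean_spread = sum(a - b for a, b in zip(prices_a, prices_b)) / len(prices_a)
    if spread - mean_spread > spread_threshold:
        return "SHORT_A_LONG_B"  # A looks rich relative to B
    if mean_spread - spread > spread_threshold:
        return "LONG_A_SHORT_B"  # A looks cheap relative to B
    return "HOLD"
```

In a streaming setting, each new tick would update the rolling windows and the resulting signal would be published to an output Pub/Sub topic.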
In recent years, Apache Kafka has become the technology of choice for working with streaming data. In this post we will use Apache Kafka to build a real-time NYC subway tracker that shows you when the next train will arrive at the station.
RPC (remote procedure call) is the mechanism whereby an application client can invoke a function on a server running on distributed hardware as if it were calling a local function. In this blog post we will build a distributed service on Amazon Web Services (AWS) using one of the most popular modern RPC frameworks, gRPC from Google.
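gRPC itself requires a .proto service definition and generated stubs, so as a self-contained illustration of the RPC idea, here is a sketch using Python's standard-library xmlrpc instead; the `add` method is a hypothetical example, and in the gRPC version the proxy object would be a generated client stub.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Server side: register an ordinary function so remote clients can call it.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]  # ask the OS which port we were given
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the proxy makes the remote call look like a local one.
client = ServerProxy(f"http://127.0.0.1:{port}")
result = client.add(2, 3)  # travels over HTTP, but reads like a local call
server.shutdown()
```

The point is the calling convention: `client.add(2, 3)` hides the serialization and network hop, which is exactly the property gRPC provides (with a binary protocol and typed stubs instead of XML over HTTP).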
Data pipelines are common in any business that works with large amounts of data. A data pipeline is a crucial piece of infrastructure that fetches data from its source, transforms it, and stores it for internal use. In this post we will build a distributed data pipeline using core services in Amazon Web Services (AWS).
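The fetch → transform → store pattern described above can be sketched in a few lines. Everything here is a stand-in assumption: the raw records are faked in place of a real source such as S3 or an API, and SQLite stands in for the destination store.

```python
import json
import sqlite3


def fetch():
    """Fetch: a real pipeline would read from S3, an API, or a queue;
    here we fake a small batch of raw JSON records."""
    return ['{"username": "alice", "ms": 1200}', '{"username": "bob", "ms": 800}']


def transform(raw_records):
    """Transform: parse and normalize (convert milliseconds to seconds)."""
    rows = []
    for rec in raw_records:
        d = json.loads(rec)
        rows.append((d["username"], d["ms"] / 1000.0))
    return rows


def store(rows, conn):
    """Store: load the cleaned rows into a table for internal use."""
    conn.execute("CREATE TABLE IF NOT EXISTS latency (username TEXT, seconds REAL)")
    conn.executemany("INSERT INTO latency VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
store(transform(fetch()), conn)
```

On AWS each stage would map to a managed service rather than a local function call, but the shape of the pipeline stays the same.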