ARTS2022
SWE5003 - Architecting Real Time Systems for Data Processing (ISS NUS Offering) - Code Base
This module is part of the ISS MTech Graduate Certificate series and falls under the Graduate Certificate in Engineering Big Data offered by NUS-ISS.
The course equips participants with the essential knowledge and skills to architect systems that process real-time stream data. It discusses the reference architecture (Kappa) of data-intensive systems, covering processing pipelines, ingestion patterns specific to stream data, asynchronous message design for moving data, the evaluation, processing, analysis and cataloguing of various data streams, persistence strategies, security, privacy, in-memory strategies, and client-side delivery mechanisms.
Upon completion of the course, the participants will be able to:
- Understand the various facets of a real-time data and stream processing pipeline.
- Design a reference architecture for a real-time data processing system by determining the required layers, such as ingestion, collection, wrangling, message queues, analysis, in-memory processing, storage, and access to new insights.
- Collect data originating from small devices such as IoT, sensors and IoE, and design an appropriate storage strategy for it.
- Design the data ingestion layer, data wrangling layer and processing layer based on the networking protocols and storage requirements.
- Design robust message producers and consumers for writing and reading messages.
- Evaluate and determine the stream processing framework best suited to given business needs.
- Assemble a messaging architecture (for example, Kafka) based on communication patterns and use-case requirements to ensure reliable stream processing via common configurations.
- Integrate disparate data sources using a unified ingestion layer that manages channels via MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver.
- Integrate and process pipelines that work with structured and discretized streams.
- Build and optimize production-grade deployments of streaming solutions using common algorithms, configuration recipes, and tuning of the instrumentation API.
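
To make the Kappa idea and the producer/consumer outcomes above more concrete, here is a minimal sketch in plain Python. It is not taken from the course material: `EventLog` and `AverageBySensor` are hypothetical names, and the in-memory list only stands in for a durable log such as a Kafka topic. The sketch assumes that every record is appended to a single log and that any derived view can be rebuilt by replaying that log from offset 0.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical in-memory stand-in for a durable, append-only log."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Consumer side: yield (offset, record) pairs starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


class AverageBySensor:
    """Hypothetical consumer that keeps a running average per sensor."""

    def __init__(self):
        self.next_offset = 0  # first offset this consumer has not yet processed
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def poll(self, log):
        """Process every record appended since the last poll."""
        for offset, record in log.read_from(self.next_offset):
            self._sum[record["sensor"]] += record["value"]
            self._count[record["sensor"]] += 1
            self.next_offset = offset + 1

    def view(self):
        """Return the derived view: average value per sensor."""
        return {s: self._sum[s] / self._count[s] for s in self._sum}


if __name__ == "__main__":
    log = EventLog()
    live = AverageBySensor()

    log.append({"sensor": "a", "value": 20.0})
    live.poll(log)  # incremental processing as data arrives
    log.append({"sensor": "a", "value": 22.0})
    log.append({"sensor": "b", "value": 5.0})
    live.poll(log)

    # Rebuilding the view: a fresh consumer replays the log from offset 0.
    rebuilt = AverageBySensor()
    rebuilt.poll(log)
    print(live.view() == rebuilt.view())
```

Because the consumer tracks its own offset, the same code path serves both incremental processing and full reprocessing. A real deployment would replace the list with a replicated log and persist consumer offsets.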