SWE5003 - Architecting Real Time Systems for Data Processing - Code Base

Overview

ARTS2022

SWE5003 - Architecting Real Time Systems for Data Processing (ISS NUS Offering) - Code Base

This module belongs to the Graduate Certificate in Engineering Big Data, part of the ISS MTech Graduate Certificate series offered by NUS-ISS.

The course equips participants with the essential knowledge and skills to architect systems that process real-time stream data. It discusses the Kappa reference architecture for data-intensive systems, covering processing pipelines, ingestion patterns specific to stream data, asynchronous message design for moving data, evaluation, processing, analysis and cataloguing of various data streams, persistence strategies, security, privacy, in-memory strategies, and client-side delivery mechanisms.
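As a rough illustration of the Kappa idea mentioned above, the sketch below treats a single append-only log as the source of truth and derives every view by replaying that log through a stream function. It is a minimal in-memory sketch, not course material: `EventLog`, `build_view`, `sum_by_sensor` and `max_by_sensor` are hypothetical names, and the list-backed log only stands in for a durable log such as a Kafka topic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class EventLog:
    """Append-only, in-memory stand-in for a durable log (assumption)."""
    _events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        """Store an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, dict]]:
        """Yield (offset, event) pairs starting at the given offset."""
        for i in range(offset, len(self._events)):
            yield i, self._events[i]


def build_view(log: EventLog,
               reducer: Callable[[Dict[str, float], dict], None],
               offset: int = 0) -> Dict[str, float]:
    """Derive a view by replaying the log through a reducer function."""
    view: Dict[str, float] = {}
    for _, event in log.read_from(offset):
        reducer(view, event)
    return view


def sum_by_sensor(view: Dict[str, float], event: dict) -> None:
    """Keep a running total per sensor."""
    view[event["sensor"]] = view.get(event["sensor"], 0.0) + event["value"]


def max_by_sensor(view: Dict[str, float], event: dict) -> None:
    """Keep the largest value seen per sensor."""
    key = event["sensor"]
    view[key] = max(view.get(key, float("-inf")), event["value"])
```

Changing the processing logic then means replaying the same log with a new reducer, rather than maintaining a separate batch path: `build_view(log, sum_by_sensor)` and `build_view(log, max_by_sensor)` produce two different views from identical input.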

Upon completion of the course, the participants will be able to:

  • Understand the various facets of a real-time data and stream processing pipeline.
  • Design a reference architecture for a real-time data processing system by determining the necessary layers, such as ingestion, collection, wrangling, message queues, analysis, in-memory processing, storage, and access to new insights.
  • Collect data originating from smaller devices such as IoT, sensors and IoE, and design an appropriate storage strategy for it.
  • Design the data ingestion, data wrangling and processing layers based on the networking protocols and storage requirements.
  • Design robust message producers and consumers for writing and reading messages.
  • Evaluate and determine the stream processing framework best suited to the given business needs.
  • Assemble a messaging architecture (for example, Kafka) based on communication patterns and use-case requirements to ensure reliable stream processing via common configurations.
  • Integrate disparate data sources using a unified ingestion layer that manages channels via MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver.
  • Integrate and process pipelines that work with structured and discretized streams.
  • Build and optimize production-grade deployments of streaming solutions via common algorithms, configuration recipes, and tuning of the instrumentation API.
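
One of the outcomes above concerns robust message producers and consumers. The sketch below shows one common way to approach that, under stated assumptions: a bounded queue applies backpressure to the producer, and the consumer deduplicates by message id so that redelivered messages are processed only once. The names `produce`, `consume` and `run_pipeline` are hypothetical, the standard-library `queue.Queue` only stands in for a real broker, and the duplicate sends are simulated.

```python
import queue
import threading
from typing import List

SENTINEL = None  # marks the end of the stream (illustrative convention)


def produce(channel: queue.Queue, readings: List[int]) -> None:
    """Publish each reading; resend even ids to simulate redelivery."""
    for message_id, value in enumerate(readings):
        message = {"id": message_id, "value": value}
        channel.put(message)          # blocks while the queue is full
        if message_id % 2 == 0:
            channel.put(message)      # simulated duplicate delivery
    channel.put(SENTINEL)


def consume(channel: queue.Queue, results: List[int]) -> None:
    """Process each message once, skipping ids that were already seen."""
    seen = set()
    while True:
        message = channel.get()
        if message is SENTINEL:
            break
        if message["id"] in seen:
            continue                  # idempotent handling of duplicates
        seen.add(message["id"])
        results.append(message["value"] * 2)


def run_pipeline(readings: List[int]) -> List[int]:
    """Wire one producer to one consumer through a bounded queue."""
    channel: queue.Queue = queue.Queue(maxsize=4)
    results: List[int] = []
    consumer = threading.Thread(target=consume, args=(channel, results))
    consumer.start()
    produce(channel, readings)
    consumer.join()
    return results
```

With a single producer and a single consumer the queue preserves order, so `run_pipeline([1, 2, 3])` returns the doubled readings in their original order despite the duplicated messages.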
Owner
Suria R Asai