Access paged data as a "stream" with async loading while maintaining order

Related tags

Big data datastream
Overview

DataStream

What?

DataStream is a simple piece of code to access paged data and interface it as if it's a single "list". It only keeps track of queued entries which is at most page size + minimum entries.

How does it work?

The class uses an ArrayDeque to queue entries and a SynchronisedQueue to hand of entries when reading thread is waiting for data. It makes use of the IExecutor interface to generalise asynchronous execution of the load function. I could have gone with an ExecutorService for this, but that wouldn't be able to use any custom scheduler when desired.

Usage

First we have to implement IExecutor which is going to handle our async task scheduling.

import nl.iobyte.datastream.interfaces.IExecutor;

public class MyExecutor implements IExecutor {
    
    public void async(Runnable r) {
        //TODO schedule task
    }
    
}

We also need a DataProvider which is going to get the data from the desired source for a given page with size and turn it into an ordered list of objects.

import nl.iobyte.datastream.interfaces.DataProvider;

import java.util.List;

public class MyDataProvider implements DataProvider<MyClass> {
    
    public List<MyClass> page(int i, int size) {
        //page indexes start with 0 to max page - 1
        //size is how many items per page, if we don't want to use it just ignore it
        
        //TODO Retrieve and parse page
        
        return list; //Return list of parsed data here
    }
    
}

Next up we can finally initialize our DataStream for type MyClass.

MyExecutor executor = ....;
MyDataProvider provider = ...;

DataStream<MyClass> stream = new DataStream(
        provider,
        executor,
        10, //Page size, must be bigger than minimum entries
        3 //Entries in queue at which to start loading the next page
);

//Take a single vallue
MyClass entry = stream.next(5, TimeUnit.SECONDS); //Specify after how long to timeout if no values loaded

Iterator<MyClass> iterator = stream.iterator(5, TimeUnit.SECONDS); //Specify after how long to timeout if no values loaded

for(MyClass entry : stream.iterate(5, TimeUnit.SECONDS))//Specify after how long to timeout if no values loaded
    //TODO do something with my entry
You might also like...

Program finds average number of words in each comment given a large data set by use of hadoop's map reduce to work in parallel efficiently.

Program finds average number of words in each comment given a large data set by use of hadoop's map reduce to work in parallel efficiently.

Finding average number of words in all the comments in a data set 📝 Mapper Function In the mapper function we first tokenize entire data and then fin

Aug 23, 2021

It is a simple java terminal game. I built it in order to practice my code skills that I obtained while I was learning Java.

Java-terminal-game It is a simple java terminal game. I built it in order to practice my code skills that I obtained while I was learning Java. The ga

Jan 20, 2022

cglib - Byte Code Generation Library is high level API to generate and transform Java byte code. It is used by AOP, testing, data access frameworks to generate dynamic proxy objects and intercept field access.

cglib Byte Code Generation Library is high level API to generate and transform JAVA byte code. It is used by AOP, testing, data access frameworks to g

Jan 8, 2023

BAIN Social is a Fully Decentralized Server/client system that utilizes Concepts pioneered by I2P, ToR, and PGP to create a system which bypasses singular hosts for data while keeping that data secure.

SYNOPSIS ---------------------------------------------------------------------------------------------------- Welcome to B.A.I.N - Barren's A.I. Natio

Jan 11, 2022

Just-In-Time Access is an AppEngine application that lets you manage just-in-time privileged access to Google Cloud projects.

Just-In-Time Access is an AppEngine application that lets you manage just-in-time privileged access to Google Cloud projects.

Just-In-Time Access Just-In-Time Access is an AppEngine application that lets you manage just-in-time privileged access to Google Cloud projects. Syno

Jan 3, 2023

Student Result Management System - This is a CLI based software where the Software is capable of maintaining and generating Student's Result at the end of a semester after the teacher's have provided the respective marks.

Student Result Management System This is a CLI based software where the Software is capable of maintaining and generating Student's Result at the end

Aug 27, 2022

A simple desktop application with minimalistic UI capable of maintaining a file based database of movies presenting the opportunity of adding and transferring movies for production companies using a TCP client-server architecture.

A simple desktop application with minimalistic UI capable of maintaining a file based database of movies presenting the opportunity of adding and transferring movies for production companies using a TCP client-server architecture.

MovieMania-2022-JavaFX-Term-Project-L1T2 A simple desktop application with minimalistic UI capable of maintaining a file based database of movies pres

Oct 21, 2022

A JSON Schema validation implementation in pure Java, which aims for correctness and performance, in that order

Read me first The current version of this project is licensed under both LGPLv3 (or later) and ASL 2.0. The old version (2.0.x) was licensed under LGP

Jan 4, 2023

Sample application demonstrating an order fulfillment system decomposed into multiple independant components (e.g. microservices). Showing concrete implementation alternatives using e.g. Java, Spring Boot, Apache Kafka, Camunda, Zeebe, ...

Sample application demonstrating an order fulfillment system decomposed into multiple independant components (e.g. microservices). Showing concrete implementation alternatives using e.g. Java, Spring Boot, Apache Kafka, Camunda, Zeebe, ...

Sample application demonstrating an order fulfillment system decomposed into multiple independant components (e.g. microservices). Showing concrete implementation alternatives using e.g. Java, Spring Boot, Apache Kafka, Camunda, Zeebe, ...

Dec 14, 2022

This repository contains a functional example of an order delivery service similar to UberEats, DoorDash, and Instacart.

Order Delivery Microservice Example In an event-driven microservices architecture, the concept of a domain event is central to the behavior of each se

Dec 7, 2022

JVM runtime class loading protection agent.(JVM类加载保护agent)

JVM类加载监控agent,可配置黑名单,禁止恶意类加载(包括jsp webshell)

Sep 28, 2022

Saga pattern with Java = order - payment - stock microservices are ready to use

Saga pattern with Java => order -> payment -> stock microservices are ready to use

Order_Payment_Stock_Saga_Pattern Saga pattern with Java = order - payment - stock microservices are ready to use Docker-compose.yaml You can see th

Dec 27, 2022

Powerful and flexible library for loading, caching and displaying images on Android.

Powerful and flexible library for loading, caching and displaying images on Android.

Universal Image Loader The great ancestor of modern image-loading libraries :) UIL aims to provide a powerful, flexible and highly customizable instru

Jan 2, 2023

An image loading and caching library for Android focused on smooth scrolling

An image loading and caching library for Android focused on smooth scrolling

Glide | View Glide's documentation | 简体中文文档 | Report an issue with Glide Glide is a fast and efficient open source media management and image loading

Dec 31, 2022
Owner
Thomas
Hi, I’m Thomas, a developer with passion for their work who's always interested in improving.
Thomas
Distributed and fault-tolerant realtime computation: stream processing, continuous computation, distributed RPC, and more

IMPORTANT NOTE!!! Storm has Moved to Apache. The official Storm git repository is now hosted by Apache, and is mirrored on github here: https://github

Nathan Marz 8.9k Dec 26, 2022
Stream summarizer and cardinality estimator.

Description A Java library for summarizing data in streams for which it is infeasible to store all events. More specifically, there are classes for es

AddThis 2.2k Dec 30, 2022
Apache Heron (Incubating) is a realtime, distributed, fault-tolerant stream processing engine from Twitter

Heron is a realtime analytics platform developed by Twitter. It has a wide array of architectural improvements over it's predecessor. Heron in Apache

The Apache Software Foundation 3.6k Dec 28, 2022
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Apache Zeppelin Documentation: User Guide Mailing Lists: User and Dev mailing list Continuous Integration: Contributing: Contribution Guide Issue Trac

The Apache Software Foundation 5.9k Jan 8, 2023
OpenRefine is a free, open source power tool for working with messy data and improving it

OpenRefine OpenRefine is a Java-based power tool that allows you to load data, understand it, clean it up, reconcile it, and augment it with data comi

OpenRefine 9.2k Jan 1, 2023
Netflix's distributed Data Pipeline

Suro: Netflix's Data Pipeline Suro is a data pipeline service for collecting, aggregating, and dispatching large volume of application events includin

Netflix, Inc. 772 Dec 9, 2022
Hadoop library for large-scale data processing, now an Apache Incubator project

Apache DataFu Follow @apachedatafu Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by

LinkedIn's Attic 589 Apr 1, 2022
SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams.

SAMOA: Scalable Advanced Massive Online Analysis. This repository is discontinued. The development of SAMOA has moved over to the Apache Software Foun

Yahoo Archive 424 Dec 28, 2022
The official home of the Presto distributed SQL query engine for big data

Presto Presto is a distributed SQL query engine for big data. See the User Manual for deployment instructions and end user documentation. Requirements

Presto 14.3k Jan 5, 2023
A platform for visualization and real-time monitoring of data workflows

Status This project is no longer maintained. Ambrose Twitter Ambrose is a platform for visualization and real-time monitoring of MapReduce data workfl

Twitter 1.2k Dec 31, 2022