A Persistent Java Collections Library

Overview

PCollections

A Persistent Java Collections Library

Maven Central Javadoc

Overview

PCollections serves as a persistent and immutable analogue of the Java Collections Framework. This includes efficient, thread-safe, generic, immutable, and persistent stacks, maps, vectors, sets, and bags, compatible with their Java Collections counterparts.

Persistent and immutable datatypes are increasingly appreciated as a simple, design-friendly, concurrency-friendly, and sometimes more time- and space-efficient alternative to mutable datatypes.

Persistent versus Unmodifiable

Note that these immutable collections are very different from the immutable collections returned by Java's Collections.unmodifiableCollection() and similar methods. The difference is that Java's unmodifiable collections have no producers, whereas PCollections have very efficient producers.

Thus if you have an unmodifiable Collection x and you want a new Collection x2 consisting of the elements of x in addition to some element e, you would have to do something like:

Collection x2 = new HashSet(x);
x2.add(e);

which involves copying all of x, using linear time and space.

If, on the other hand, you have a PCollection y you can simply say:

PCollection y2 = y.plus(e);

which still leaves y untouched but generally requires little or no copying, using time and space much more efficiently.

Usage

PCollections are created using producers and static factory methods. Some example static factory methods are HashTreePSet.empty() which returns an empty PSet, while HashTreePSet.singleton(e) returns a PSet containing just the element e, and HashTreePSet.from(collection) returns a PSet containing the same elements as collection. See Example Code below for an example of using producers.

The same empty(), singleton(), and from() factory methods are found in each of the PCollections implementations, which currently include one concrete implementation for each abstract type:

  • HashTreePMap provides a PMap implementation, analogous to Java's HashMap.
  • ConsPStack provides a PStack implementation, analogous to Java's LinkedList.
  • TreePVector provides a PVector implementation, analogous to Java's ArrayList.
  • HashTreePSet provides a PSet implementation, analogous to Java's HashSet.
  • HashTreePBag provides a PBag implementation, which is unordered like a set but can contain duplicate elements.

PCollections are highly interoperable with Java Collections: every PCollection is a java.util.Collection, every PMap is a java.util.Map, every PSequence — including every PStack and PVector — is a java.util.List, and every PSet is a java.util.Set.

PCollections uses Semantic Versioning, which establishes a strong correspondence between API changes and version numbering.

PCollections is in the Maven Central repository, under org.pcollections. Thus the Maven coordinates for PCollections are:

<dependency>
    <groupId>org.pcollections</groupId>
    <artifactId>pcollections</artifactId>
    <version>3.1.4</version>
</dependency>

or Gradle:

compile 'org.pcollections:pcollections:3.1.4'

Example Code

The following gives a very simple example of using PCollections, including the static factory method HashTreePSet.empty() and the producer plus(e):

import org.pcollections.*;

public class Example {
  public static void main(String... args) {
    PSet<String> set = HashTreePSet.empty();
    set = set.plus("something");

    System.out.println(set);
    System.out.println(set.plus("something else"));
    System.out.println(set);
  }
}

Running this program gives the following output:

[something]
[something else, something]
[something]

Building form source

To build the project from source clone the repository and then run ./gradlew

Related Work

Clojure and Scala also provide persistent collections on the JVM, but they are less interoperable with Java. Both Guava and java.util.Collections provide immutable collections but they are not persistent—that is, they do not provide efficient producers—so they are not nearly as useful. See Persistent versus Unmodifiable above.

Comments
  • Work around a type inference change in javac

    Work around a type inference change in javac

    The javac compiler's behavior when handling wildcards and "capture" type variables has been changed in Java 9 [1]. This change allows the code to be compiled with the latest version of javac.

    [1] https://bugs.openjdk.java.net/browse/JDK-8039214

    opened by cushon 11
  • OrderedPSet.minus is linear time

    OrderedPSet.minus is linear time

    OrderedPSet.minus calls TreePVector.minus which loops through all the entries searching for the right one to remove.

    Possible fix?

    Rather than augmenting the PVector with a PSet, use something like a PMap<E, Integer> that maps each entry to its index in the vector.

    Boxing all these ints would add overhead, but at least it would have the right asymptotic complexity.

    Alternately, it might be a good idea to have an IntTree that maps ints to ints

    enhancement 
    opened by prdoyle 10
  • [Patch] POrderedMap, and array-based implementations of that, POrderedSet, and PVector

    [Patch] POrderedMap, and array-based implementations of that, POrderedSet, and PVector

    From [email protected] on 2011-08-18T10:01:21Z

    Hi Harold,

    this patch adds the following new stuff:

    • POrderedMap: interface for PMaps that preserve the insertion order
    • ArrayPMap: an implementation of POrderedMap based on arrays
    • ArrayPVector: an implementation of PVector based on arrays
    • ArrayPSet: an implementation of POrderedSet based on arrays
    • ArrayPVectorTest: a junit test for ArrayPVector

    All these array-based variants are especially well-suited (efficient, both performance and memory wise) for small collections.

    We use them (since yesterday!) for collection-valued attributes on nodes and edges of graphs in our graph library JGraLab. There you have thousands or even millions of nodes and edges, but their collection-valued attributes are all rather small, so small memory footprints are very important for us.

    I think there are many similar use-cases that can benefit from these variants, too, so it'd be great if you included them in the pcollections proper.

    Original issue: http://code.google.com/p/pcollections/issues/detail?id=19

    opened by blackdrag 10
  • Null keys & values should be supported

    Null keys & values should be supported

    Currently, the contract of PMap is "non-null keys and non-null values"; as it stands HashPMap forbids null keys, but actually allows null values. That isn't the problem though: in fact, there is a very strong case to explicitly support both null keys and null values:

    • The "for all K and V" implied by <K,V> should really mean "for all". Since null is an element of all K and V, it should be allowed so the types can tell the truth.
    • It's much simpler to use a collection that is guaranteed to work for any unknown values than one that imposes special runtime constraints.
    • While using nulls in code is generally a bad idea, this is none of PCollection's business. Collections shouldn't know about or constrain their contents, unless there's a good performance or correctness or ordering reason, which there isn't here.
    • Prohibiting nulls makes PCollections less useful and less safe in exchange for no gain. In particular, PCollections is unusable for my current usecase, and there is no good reason for this to be so. It's a shame, because there are so few good immutable collection libraries for Java.

    This can mostly be fixed by replacing key.hashCode() and key.equals with Objects.hashCode and Objects.equals. I'm happy to knock up a PR if you are convinced.

    opened by kenbot 9
  • Create a maven repository

    Create a maven repository

    From ian.clarke on 2010-05-05T17:09:24Z

    We use Maven for our dependency management, and therefore it is very difficult for us to use Java libraries that are not available via a public Maven repository.

    Would be greatly appreciated if you could create one for pcollections.

    Original issue: http://code.google.com/p/pcollections/issues/detail?id=10

    opened by blackdrag 9
  • Remove personal details from older commits

    Remove personal details from older commits

    I have realised that my personal details have leaked in some of the commits that have been merged in master a few years ago - can those commits be amended to remove them by any chance?

    opened by ghost 7
  • Upload artifacts on Maven Central

    Upload artifacts on Maven Central

    From [email protected] on 2011-10-27T11:13:37Z

    The binary JAR, sources JAR and documentation may be uploaded to Maven Central or a suitable Maven repository. Makes for an easy inclusion in projects.

    Original issue: http://code.google.com/p/pcollections/issues/detail?id=21

    bug duplicate 
    opened by blackdrag 7
  • Sorted maps and sets (fixes issue #20)

    Sorted maps and sets (fixes issue #20)

    This pull request addresses issue #20 (SortedSet/TreeSet alternative?), by adding the following:

    • new interface PSortedMap extending both PMap and java.util.NavigableMap
    • new interface PSortedSet extending both PSet and java.util.NavigableSet
    • new classes TreePMap and TreePSet implementing those interfaces (respectively)
    • a few new classes that support the implementation of TreePMap and TreePSet
    • corresponding unit-tests and documentation
    • new methods Empty.sortedMap and Empty.sortedSet

    I realize this is kind of huge; if you're on board with the general approach, but want to go back-and-forth with separate pull requests for separate pieces, I'm open to that.

    opened by ran-arigur 6
  • HashTreePSet not serializable

    HashTreePSet not serializable

    HashTreePSet is not serializable. Only tested on HashTreePSet.empty(). Tested on pcollections 3.1.0.

    Simple test case:

    import org.pcollections.HashTreePSet;
    import org.pcollections.PSet;
    
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    
    public class SerializePSet {
    
      public static void main(String... args) throws Exception {
        PSet set = HashTreePSet.empty();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(set);
      }
    
    }
    

    Output:

    Exception in thread "main" java.io.NotSerializableException: org.pcollections.IntTreePMap$1
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
    	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
    	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
    	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    	at test.SerializePSet.main(SerializePSet.java:28)
    
    bug 
    opened by JimN-LiveAction 6
  • SortedSet/TreeSet alternative?

    SortedSet/TreeSet alternative?

    From [email protected] on 2011-10-11T15:52:20Z

    I was wondering if any of the collections offered by pcollections offer an alternative to java.util.SortedSet and java.util.TreeSet.

    I would thus like to have a persistent SortedSet implementation with at least the following methods (with semantics as defined by java.util.SortedSet):

    • first
    • last
    • headSet
    • tailSet

    Original issue: http://code.google.com/p/pcollections/issues/detail?id=20

    opened by blackdrag 6
  • PStack is not a stack

    PStack is not a stack

    From [email protected] on 2010-03-29T14:26:41Z

    Stacks should support the operations: push, pop, peek.

    Original issue: http://code.google.com/p/pcollections/issues/detail?id=7

    bug 
    opened by blackdrag 6
  • optimize `PSet.intersect()`

    optimize `PSet.intersect()`

    • [ ] if the passed list is a Set, then iterate over whichever set is smaller, so we get O(mlgn) complexity, with m<n
    • [ ] benchmark that this is actually faster
    enhancement 
    opened by hrldcpr 1
  • add a utility method for deep conversion from java collections

    add a utility method for deep conversion from java collections

    Thank you for this great library that makes good stuff from Clojure available to Java developers.

    In the context of my book Data-Oriented programming, I am considering to use pcollections to illustrate to Java developers the power of data immutability when building a software system.

    A fundamental principle of Data-Oriented programming is to represent data with generic immutable data structures instead of instances of classes.

    As an example, a book in a library management system (that's the example I use in the book) is represented with a nested map combining maps and lists:

    Map watchmen = Map.of(
                          "isbn", "978-1779501127",
                          "title", "Watchmen",
                          "publicationYear", 1987,
                          "authors", List.of("alan-moore", "dave-gibbons"),
                          "bookItems", List.of(
                                               Map.of(
                                                      "id", "book-item-1",
                                                      "rackId", "rack-17",
                                                      "isLent", true
                                                      ),
                                               Map.of (
                                                       "id", "book-item-2",
                                                       "rackId", "rack-17",
                                                       "isLent", false
                                                       )
                                               )
                          );
    

    Now, I'd like to represent the same data as an immutable collection.

    The only way I found to recursively convert my nested map into persistent collection is by explicitly converting every map and list.

    Map watchmen = HashTreePMap.from(Map.of(
                          "isbn", "978-1779501127",
                          "title", "Watchmen",
                          "publicationYear", 1987,
                          "authors", TreePVector.from(List.of("alan-moore", "dave-gibbons")),
                          "bookItems", TreePVector.from(List.of(
                                               HashTreePMap.from(Map.of(
                                                      "id", "book-item-1",
                                                      "rackId", "rack-17",
                                                      "isLent", true
                                                      )),
                                               HashTreePMap.from(Map.of(
                                                       "id", "book-item-2",
                                                       "rackId", "rack-17",
                                                       "isLent", false
                                                       ))
                                               ))
                          ));
    

    Two questions:

    1. Is there a way to make this conversion less verbose?
    2. Is there a way to recursively converts a map into a pmap?
    enhancement good first issue 
    opened by viebel 4
  • Provide Stream collectors for persistent collections.

    Provide Stream collectors for persistent collections.

    One of the major differences between pcollections and other persistent collections libraries like vavr is the desire to remain interoperable with the standard java collections framework. However, because persistent collections do not allow for mutation, we miss out on a good chunk of the standard library's methods for working with collections.

    As an example, PVector implements java.util.List but does not support any of the mutating methods on that interface. For some there are natural replacements - add, addAll, remove and removeAll are supplanted by plus, plusAll, minus, and minusAll. For others though, there is no natural replacement: sort, retainAll, and clear have no direct analogue.

    This contrasts with classes like vavr's Vector which are fairly full featured in terms of the methods they support on account of not needing to maintain this compatibility.

    io.vavr.Vector.of(1, -1, 2, 3).filter(x -> x <= 2).sorted();
    

    The analogue with regular java collections pre-java 8 would be

    List<Integer> xs = new ArrayList<>();
    xs.add(1);
    xs.add(-1);
    xs.add(2);
    xs.add(3);
    xs.retainAll(x -> x <= 2);
    xs.sort();
    

    But with java 8, we get the streams api, which lets us do those larger functional transformations in the same manner as vavr albeit via explicitly targeting a collection type as the end result..

    List.of(1, -1, 2, 3)
      .stream()
      .filter(x -> x <= 2)
      .sorted()
      .collect(Collectors.toList());
    

    Which is all a long winded way of saying, it would make a lot of sense to provide collector implementations for targeting persistent collections.

    It matches up with the standard java way of doing things and would be a decent usability win. The main downside is that to do this would require a major version bump to make Java 8 the minimum and maybe working out some of the other issues with regards to performance. Specifically, if there is ever added a notion of transients or some other kind of mutable intermediate representation it would be transparent to swap the implementation of a collector provided by the library, perhaps without exposing that functionality until it is fully designed or maybe never.

    public interface PVector<E> extends PSequence<E> {
        public static <T> Collector<T, ?, PVector<T>> collector() {
            return Collector.of(Empty::pvector, PVector::plus, PVector::plusAll);
        }
        // ...
    }
    
    TreePVector.from(List.of(1, -1, 2, 3))
      .stream()
      .filter(x -> x <= 2)
      .sorted()
      .collect(PVector.collector());
    

    There would be other benefits to moving to Java 8 as well - chaos and major version bumps are ladders. Perhaps a PVector.empty() could supplant Empty.pvector() or similar api improvements and deprecations.

    enhancement help wanted 
    opened by bowbahdoe 1
  • try to reuse existing collections whenever possible

    try to reuse existing collections whenever possible

    …for example in minus and minusAll methods, if nothing is removed then ideally the original collection should be returned. Same for plusAll(empty())

    • [ ] find any methods that don't achieve this
    • [ ] fix them if it won't hurt performance, otherwise just add a comment

    for example see 4720fa7

    enhancement 
    opened by hrldcpr 0
  • fix deprecation warnings

    fix deprecation warnings

    mvn clean compile -Dmaven.compiler.showDeprecation=true

    …they don't indicate any actual problem, they're just because we have to implement mutators to fulfill Collections interfaces, but we have also deprecated mutators in PCollections so that people get IDE warnings if they try to use them

    possibly can suppress the warning with a suppression comment?

    enhancement good first issue 
    opened by hrldcpr 3
Owner
harold cooper
dilettante @recursecenter @mit
harold cooper
Persistent (immutable) collections for Java and Kotlin

What are Dexx Collections? Dexx Collections are a port of Scala's immutable, persistent collection classes to pure Java. Persistent in the context of

Andrew O'Malley 208 Sep 30, 2022
Eclipse Collections is a collections framework for Java with optimized data structures and a rich, functional and fluent API.

English | 中文 | Deutsch | Español | Ελληνικά | Français | 日本語 | Norsk (bokmål) | Português-Brasil | Русский | हिंदी Eclipse Collections is a comprehens

Eclipse Foundation 2.1k Dec 29, 2022
Java port of a concurrent trie hash map implementation from the Scala collections library

About This is a Java port of a concurrent trie hash map implementation from the Scala collections library. It is almost a line-by-line conversion from

null 147 Oct 31, 2022
High Performance Primitive Collections for Java

HPPC: High Performance Primitive Collections Collections of primitive types (maps, sets, stacks, lists) with open internals and an API twist (no java.

Carrot Search 890 Dec 28, 2022
Java Collections till the last breadcrumb of memory and performance

Koloboke A family of projects around collections in Java (so far). The Koloboke Collections API A carefully designed extension of the Java Collections

Roman Leventov 967 Nov 14, 2022
A beginner's guide to Learn Java Collections Framework

Collections-Framework A beginner's guide to Learn Java Collections Framework Topics included: Collection Interface ArrayList Iterator Stack Queue and

Anuj Kumar Sharma 97 Dec 30, 2022
The Java collections framework provides a set of interfaces and classes to implement various data structures and algorithms.

Homework #14 Table of Contents General Info Technologies Used Project Status Contact General Information Homework contains topics: Sorting an ArrayLis

Mykhailo 1 Feb 12, 2022
fasttuple - Collections that are laid out adjacently in both on- and off-heap memory.

FastTuple Introduction There are lots of good things about working on the JVM, like a world class JIT, operating system threads, and a world class gar

BMC TrueSight Pulse (formerly Boundary) 137 Sep 30, 2022
A high performance caching library for Java

Caffeine is a high performance, near optimal caching library. For more details, see our user's guide and browse the API docs for the latest release. C

Ben Manes 13k Jan 5, 2023
A Java library for quickly and efficiently parsing and writing UUIDs

fast-uuid fast-uuid is a Java library for quickly and efficiently parsing and writing UUIDs. It yields the most dramatic performance gains when compar

Jon Chambers 142 Jan 1, 2023
Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-only access.

Hollow Hollow is a java library and toolset for disseminating in-memory datasets from a single producer to many consumers for high performance read-on

Netflix, Inc. 1.1k Dec 25, 2022
Java library for the HyperLogLog algorithm

java-hll A Java implementation of HyperLogLog whose goal is to be storage-compatible with other similar offerings from Aggregate Knowledge. NOTE: This

Aggregate Knowledge (a Neustar service) 296 Dec 30, 2022
A simple integer compression library in Java

JavaFastPFOR: A simple integer compression library in Java License This code is released under the Apache License Version 2.0 http://www.apache.org/li

Daniel Lemire 487 Dec 30, 2022
LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL, Vulkan), audio (OpenAL), parallel computing (OpenCL, CUDA) and XR (OpenVR, LibOVR) applications.

LWJGL - Lightweight Java Game Library 3 LWJGL (https://www.lwjgl.org) is a Java library that enables cross-platform access to popular native APIs usef

Lightweight Java Game Library 4k Dec 29, 2022
A modern I/O library for Android, Kotlin, and Java.

Okio See the project website for documentation and APIs. Okio is a library that complements java.io and java.nio to make it much easier to access, sto

Square 8.2k Dec 31, 2022
RxJava – Reactive Extensions for the JVM – a library for composing asynchronous and event-based programs using observable sequences for the Java VM.

RxJava: Reactive Extensions for the JVM RxJava is a Java VM implementation of Reactive Extensions: a library for composing asynchronous and event-base

ReactiveX 46.7k Dec 30, 2022
Jalgorithm is an open-source Java library which has implemented various algorithms and data structure

We loved Java and algorithms, so We made Jalgorithm ❤ Jalgorithm is an open-source Java library which has implemented various algorithms and data stru

Muhammad Karbalaee 35 Dec 15, 2022
A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.

High Scale Lib This is Boundary's fork of Cliff Click's high scale lib. We will be maintaining this fork with bug fixes, improvements and versioned bu

BMC TrueSight Pulse (formerly Boundary) 402 Jan 2, 2023
Library for creating In-memory circular buffers that use direct ByteBuffers to minimize GC overhead

Overview This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used

Tatu Saloranta 132 Jul 28, 2022