fastutil extends the Java Collections Framework by providing type-specific maps, sets, lists and queues.

Overview

Welcome to fastutil

fastutil is a collection of type-specific Java classes that extend the Java Collections Framework by providing several containers, such as maps, sets, lists and prority queues, implementing the interfaces of the java.util package; it also provides big (64-bit) arrays, sets, lists, and fast, practical I/O classes for binary and text files.

fastutil provides a huge collection of specialized classes generated starting from a parameterized version; the classes are much more compact and much faster than the general ones. Please read the package documentation for more information.

Since version 8.5.5, fastutil is split into two jars for convenience:

  • fastutil-core.jar contains data structures based on integers, longs, doubles, and objects;

  • fastutil.jar is the classic distribution, containing all classes.

Note that core classes are duplicated in the standard jar, so if you are depending on both (for example, because of transitive dependencies) you should exclude the core jar.

Previous split versions would provide different classes in different jars, but managing sensibly dependencies turned out to be impossible.

You can also create a small, customized fastutil jar (which you can put in your repo, local maven repo, etc.) using the find-deps.sh shell script. It has mild prerequisites, as only the jdeps tool is required (bundled with JDK 8). It can be used to identify all fastutil classes your project uses and build a minimized jar only containing the necessary classes.

Building

First, you have to make sources to get the actual Java sources. After that, ant jar will generate a single jar file; ant javadoc will generate the API documentation; ant junit will run the unit tests.

If you want to obtain the three jars above, you have to run the script split.sh, and then ant osgi-rest.

The Java sources are generated using a C preprocessor. The gencsource.sh script reads in a driver file, that is, a Java source that uses some preprocessor-defined symbols and some conditional compilation, and produces a (fake) C source, which includes the driver code and some definitions that customize the environment.

Comments
  • Spliterators and primitive streams

    Spliterators and primitive streams

    Here it is, spliterators and primitive stream methods for collections. Also fill in some other Java 8 methods.

    Overview of the change: https://docs.google.com/document/d/1yMf5LSajEQ4i4odnwWrXDPPl7OmjX1Fj2MpHzhCAB5c/edit?usp=sharing

    This change addresses https://github.com/vigna/fastutil/issues/116, https://github.com/vigna/fastutil/issues/115, and https://github.com/vigna/fastutil/issues/98

    IntArrayList.intStream().sum() is about 3.5 times faster then java.util.ArrayList.stream().mapToInt(Integer::intValue).sum() (1ns per element for ArrayList, .28ns per element for IntArrayList)

    If you think the changes are substantial enough to warrant a bump to 9.0.0, may I suggest that we require Java 9 for that so that we can take advantage of its high performance methods for arrays and index checks?

    I am not content with how the disambiguation methods (such as forEachRemaining(...fastutil.ints.IntConsumer) delegates to forEachRemaining(java.util.function.IntConsumer)) turned out. But without this, calling forEachRemaining(...fastutil.ints.IntConsumer) would cause a compile-time error as it would be ambiguous between java.util.function.IntConsumer and java.util.function.Consumer. If you don't think this disambiguation is worth it (for example, you wanted to deprecate ...fastutil.ints.IntConsumer), then I can remove it and the cruft associated with it.

    Apologies for doubling the size of TODO. I do plan on getting to the items I listed in it, but I didn't want to block any longer on this.

    opened by techsy730 56
  • Java 8 support

    Java 8 support

    Hi,

    as you said in #43:

    Well, Java 7 hit the end of life. So Java 8 support it's a matter of survival.

    Do you plan to migrate to Java 8? I strongly urge you to - I personally need most of these changes :) I identified several "todo"-items for Java 8, which I would be willing to implement:

    • Support of the new Map methods like putIfAbsent, computIfAbsent etc. (#57)
    • Synchronization support of the newly introduced map methods (#48)
    • Support of new Iterator interfaces (#43)
    • Support of new Function interfaces, e.g. Int2DoubleMap should implement Java 8's IntToDoubleFunction.

    I imagine that one could implement all this in a way which allows you to specify the target Java version. But, while it should be possible, I advice against it; as you pointed out, Java 7 is not the go-to version anymore and future releases should target Java 8.

    opened by incaseoftrouble 55
  • [Feature request] Pairs

    [Feature request] Pairs

    Would you accept Pair<K,V>, similar to https://docs.oracle.com/javase/9/docs/api/javafx/util/Pair.html and https://developer.android.com/reference/android/util/Pair in your codebase ? I use Pairs in my product and would be interested in PairIntInt and PairIntObject<V>.

    If you'd accept such a PR, I would try to create it if I understand the language

    opened by Arthur-Milchior 40
  • Publish an image only using the basic primitives

    Publish an image only using the basic primitives

    If I recall correctly, I already added support for building a reduced jar ("fastutil-min"). It might make sense to actually publish this.

    Reasoning: Asides from memory usage due to array packing there is, to my knowledge, no reason to use any key/value types except int, long and double, as essentially all operations are only defined for these types and operations on the other "primitives" even incur a performance penalty due to conversion. One could discuss whether boolean types could be included (XY2BooleanMap and BooleanList have some merit I guess). Advantage are smaller image (not that important), less clutter (autocompletion isn't spammed with 100 different types) and faster to work with (indexing fastutil does take a while).

    Maybe the "full" fastutil image could even depend on the "min" image, however I guess this is a bit tricky to work with.

    EDIT: floats are also natively supported by the JVM, however they come with many other pitfalls, so I still think that they should not be included in the core image

    opened by incaseoftrouble 35
  • New Utility functions for fastutil.

    New Utility functions for fastutil.

    Here some functions that were really needed. (More the stack then the Priority queue)

    Compiled the code to test if it works and added a test to make sure nothing crashes or gives unexpected results.

    opened by Speiger 35
  • Reorganize assertions, defines, and relationship between JDK's and fastutil's primitive-based functional interfaces

    Reorganize assertions, defines, and relationship between JDK's and fastutil's primitive-based functional interfaces

    This is an umbrella issue for discussing two problems that are becoming very visible in these days in the codebase:

    • The overlap of different people implementing different conditional features has led to a proliferation of complex assertions—the semantics of which, however, is sometime much simpler than one would expect (e.g., it's just a matter of being int/long/double or any other type). All contributors added stuff to gencsources.sh, resulting in a large collection of #define's without any comment. Ten years ago that would have been OK, but with the amount of subtly convoluted conditional compilation we have now this is no longer acceptable. It has become very difficult (for me) to understand which code is being generated. There must be some documentation for the #define's, and clearly named assertions.

    • The strategy for implementing functional interfaces or callers of functional interfaces in the presence of multiple possible implementations (e.g., object-based, JDK-primitive, or fastutil-primitive) is not uniform throughout the codebase. I find myself thinking most of the time "shouldn't this be implemented the other way around?". This is not healthy (oh, of course maybe I'm just old). Once again, ten years ago there were so few cases that looking around in other parts of the code was sufficient to understand what was the best practice, but this is no longer the case.

    For the first part, it is mostly a matter of restructuring the file and commenting.

    For the second part, I must write somewhere what is the preferred design. I'll keep updating this text, and I invite everybody working on fastutil to comment below.

    Now, given the premises, there are things we cannot change:

    • We want to benefit from code improvements upstream. This means that anything primitive that has been implemented in the JDK must not be overridden. For example, IntIerator.forEachRemaining() has primitive implementations in the JDK, so we should use those (possibly all of them—also the ones accepting object-based consumers).

    • Lambdas will be fastutil-primitive. This is unavoidable because fastutil's types join the object and primitive JDK world, and this is the only possibility to avoid ambiguity while continuing fastutil's tradition of adding primitive-based operation without eliminating the object-based ones. As already implemented by @techsy730 in a number of cases, for methods accepting functional interfaces there must the usual deprecated object-based one, a fastutil-primitive one, and a JDK-primitive one if it is possible. We have by now functions (unary operators), predicates, consumers and binary operators, which should cover everything type-specific presently (except of course for two-argument functions in which the three types involved are not all the same, as in Map.computeIf*()).

    The solution chosen is as follows (at present time, for Consumer/Predicate-related methods):

    • in the int/long/double case, we inherit from the JDK or implement the overload accepting a JDK type-specific functor. An @apiNote will explain that this is the method to override.

    • in the remaining case, the overload accepting a fastutil type-specific functor will be implemented, instead. Again, an @apiNote will explain that this is the method to override.

    • in the int/long/double case, the overload accepting a fastutil type-specific functor delegates to the overload accepting a JDK type-specific functor. A WARNING will explain that this method should not be overridden.

    • in the remaining cases, the overload accepting a widened JDK type-specific functor checks for the functor being an instance of fastutil type-specific functor, in which cases passes directly the functor with a cast, or generates a new lambda using the only method of the functor. An @apiNote that will explain that this method incurs in a slight overhead.

    To make the implementation easy, the functor type associated with the actual implementation is stored in a macro, so the implementation of the first two cases is the same. The other implementations are contained within conditional code, which makes it possible to add appropriate comments. See Consumer.drv for an example.

    Another issue is whether to use the JDK-primitive method names, if available, instead of fastutil's names. In principle I'm against this option because fastutil has always avoided adding disambiguating strings to methods if the type of the arguments was sufficient. So a BinaryOperator should have an apply() method. The applyAsInt() coming from the JDK should be delegated with a default method to apply(). This delegation would actually never happen inside fastutil at runtime if we proceed as explained above, because a ready-made JDK-primitive thing would be turned into a new lambda in which the JDK-primitive method is used to implement the fastutil-primitive method.

    opened by vigna 33
  • Harden immutability of ImmutableList.

    Harden immutability of ImmutableList.

    Harden immutability of ImmutableList.

    This closes a few avenues that could have lead to mutating the supposed to be immutable list. In particular, the backing array was non-final and protected (now private and final), and mutations could still happen through the iterator.

    Adds the cult-favorite optimization of sharing a single empty instance if given empty arguments to the static constructor.

    Also add fast toArray for object based ImmutableList and ArrayList.

    opened by techsy730 30
  • Clean up some of the new

    Clean up some of the new "of()" static factory methods.

    Clean up the of methods()

    • Make the of methods in the root List and Set interfaces return immutable collections (like the platform does)
    • Validate that duplicate entries to the "of" method of sets throws (like the platform does)

    Because of this validation, it is now worth it to have overloads of 0 and 1 element, which skip the temporary array and validation (0 and 1 element sets can never have duplication).

    In the spirit of ArraySet's "just trust me" API of working raw arrays unvalidated, an it has an offshoot ofUnchecked that takes the elements as is and does no validation, like the array accepting constructors do.

    opened by techsy730 24
  • Insert-Performance in Object2ObjectOpenHashMap depending on insertion order

    Insert-Performance in Object2ObjectOpenHashMap depending on insertion order

    I'm using an Object2ObjectOpenHashMap as an internal data structure. When my program ends, I write out the content to disk (using custom serialization code, iterating over the map). When the program the next time starts, it reads the data back in from disk, inserting the objects again into the map. I realized that reading the data back in is much slower than inserting the objects in the first round. After some testing, I found out the following:

    • Inserting objects into an Object2ObjectOpenHashMap in "random" order requires some amount of calls to equals() in the inserted Objects.
    • Inserting objects into an Object2ObjectOpenHashMap in the order as they are returned from the iterator from another Object2ObjectOpenHashMap, it requires many more calls to equals() in the inserted Objects (at least one magnitude more calls).

    If the equals() method does a bit of logic, it has a huge impact on performance.

    I'll attach a short piece of code that demonstrates the problem, inserting Strings into the Map (Actually, WrappedStrings so I'm able to count the number of equals() invocations). Inserting the 2 million Strings the first time requires about 5 million equal() calls, while inserting them the second time requires about 180 million equal() calls.

    It reminds me a bit of the problem in Java's HashMap a while ago (Java 7?) where it was said inserting objects in a special order could lead to Denial of Service attacks due to very poor performance in such cases.

    InsertTest.java.txt

    opened by mrieser 20
  • NPE while putting into Object2IntOpenHashMap

    NPE while putting into Object2IntOpenHashMap

    Hi, NPE while putting into Object2IntOpenHashMap, could anyone help?

    java.lang.NullPointerException
     at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.rehash(Object2IntOpenHashMap.java:1313) ~[fastutil-8.1.0.jar:?]
     at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.insert(Object2IntOpenHashMap.java:322) ~[fastutil-8.1.0.jar:?]
     at it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap.put(Object2IntOpenHashMap.java:331) ~[fastutil-8.1.0.jar:?]
    
    opened by tenghuanhe 19
  • Info request

    Info request

    Hello to Everyone who reads this. Maybe even @vigna.

    I want to help with this project. Make improvements etc. I have no issue with working with drv files and do all these edits even if there is no compiler help.

    What I have an issue with is test the code because there is no real instructions/tutorial on how to actually generate the source code. What this issue is about is very simple. Please provide a instruction list/tutorial how to generate these things. If there is a difference between linux and windows and mac then provide instructions how to do these.

    Why am i requesting this help? Very simple. There is some requirements to write tests & test them. Ensure code is actually working. Which has to be done A either in the head or B not at all. This is prone to cause errors or mistakes in the code that may be not found.

    The single requirement: Make them easy and simple to understand so that someone who has never worked with sh scripts & make-files before can understand them. Because right now you need actual experience to know what to do to do simple exporting of the source code.

    Also the Wikisection is great for such things.

    Thanks for reading. Speiger

    opened by Speiger 19
  • Introduced jump inside iterator

    Introduced jump inside iterator

    This is a memory optimization for the case when a user required to modify iterator by moving it to a desire position. This works the same way as iteraotr(fromElement) but doesn't create a new iterator that decreases memory footprint at an algorithms which makes a lot of jumps.

    opened by catap 6
  • putIfAbsent should replace a null value

    putIfAbsent should replace a null value

    This was shown in a recent Java Collections Puzzlers. The JavaDoc for Map.putIfAbsent states the following.

    If the specified key is not already associated with a value (or is mapped to {@code null}) associates it with the given value and returns {@code null}, else returns the current value.

    @Test
    public void putIfAbsent_nullExistingValue() {
      var map = new Object2ObjectOpenHashMap<Object, Object>();
      map.put("a", null);
      map.putIfAbsent("a", "b");
      assertThat(map).containsEntry("a", "b");
    }
    
    FAILED: Test.putIfAbsent_nullExistingValue
    key is present but with a different value
    value of: map.get(a)
    expected: b
    but was : null
    map was : {a=>null}
    

    The computeIfAbsent method has similar wording and fails in this scenario.

    I have requested that this peculiarity of the api be added to Guava's testlib.

    opened by ben-manes 2
  • Primitive type equivalent of Map#forEach is missing

    Primitive type equivalent of Map#forEach is missing

    I would like to iterate a map's entry set without the need for boxing. I found no equivalent for Map#forEach though.

    One could achieve this via iterating over *2*EntrySet() but the code looks less readable then.

    opened by markusheiden 5
  • *MappedBigList should implement a FlyweightPrototype interface

    *MappedBigList should implement a FlyweightPrototype interface

    Currently there is no way to check dynamically if an BigList implements copy() or not. If all the MappedBigList implemented FlyweightPrototype, it would be easier to make a function that makes a threadsafe copy of any type of BigList.

    Unfortunately FlyweightPrototype is in dsiutils, not fastutil, so I don't know what's the best way to proceed here.

    opened by seirl 1
  • Document that only trusted data should be deserialized

    Document that only trusted data should be deserialized

    It appears many of the classes which implement Serializable allocate memory based on a size value specified in the deserialized data, e.g. read an int as size value and create an array of that size. I assume this is done intentionally for good performance; however, it should be documented. Adversaries can abuse this to cause Denial of Service attacks. For example the Guava library received CVE-2018-10237 for the same issue. (I just want to point out here that this can be considered a security issue; I am not asking for fastutil to get a CVE or to change the implementation.)

    It would therefore be good to at least in the README and the Javadoc main page describe that the deserialization implementation is designed for efficiency and should not be used for untrusted data.

    opened by Marcono1234 2
  • Add collectors for Maps

    Add collectors for Maps

    This adds collectors for Stream to fastutil Maps.

    Future improvements would be to investigate supporting primitive streams (e.g. IntStream) and improving the type signature for collectors where char is the key or value. When char is the key, the functions are for int. When char is the value, the functions are for Object.

    ~~Also there is boxing that could be avoided using more specialized function types, like Int2IntFunction instead of ToIntFunction<Integer>.~~

    opened by shawjef3 5
Owner
Sebastiano Vigna
Sebastiano Vigna
This service checks the Co-WIN public API at a specific interval and send update to users specified telegram bot.

COVID VACCINE TELEGRAM BOT USING SPRING BOOT This application is a covid vaccine slot notifier via telegram bot. This application uses public CO-WIN A

Hardeek Sharma 6 Oct 4, 2022
Ultra-fast SQL-like queries on Java collections

CQEngine - Collection Query Engine CQEngine – Collection Query Engine – is a high-performance Java collection which can be searched with SQL-like quer

Niall Gallagher 1.6k Dec 30, 2022
An extensible Java framework for building XML and non-XML streaming applications

Smooks Framework This is the Git source code repository for the Smooks Project. Build Status Building Pre-requisites JDK 8 Apache Maven 3.2.x Maven gi

Smooks Framework 353 Dec 1, 2022
Auto Code Audit Framework for Java

前言 笔者最近在研究java自动化代码审计这方面的内容,也看了一些相关的文章,其中主要是跟着4ra1n师傅的文章进行学习的。目前学到的有两种自动化审计思路,一是AST,二是ASM。前者基于java源代码,后者基于字节码。

r2 61 Nov 30, 2022
Core part of pipes framework plus some commonly used extensions

Pipes Pipes is a simple, lightweight data processing framework for Java. This repo comes with the core part plus three extensions (For Google Big Quer

null 7 Oct 4, 2022
Sauron, the all seeing eye! It is a service to generate automated reports and track migrations, changes and dependency versions for backend services also report on known CVE and security issues.

SAURON - VERSION AND DEPLOYMENT TRACKER DESCRIPTION Sauron, the all seeing eye! It is a service to generate automated reports and track migrations, ch

FREENOWTech 20 Oct 31, 2022
A Local implementation of a java library functions to create a serverside and clientside application which will communicate over TCP using given port and ip address.

A Local implementation of java library functions to create a serverside and clientside application which will communicate over TCP using given port and ip address.

Isaac Barry 1 Feb 12, 2022
Java Notes & Codes for better understanding and it contains all the funtions with examples and also added Cheat Sheet for Revision

Java Notes & Codes for better understanding and it contains all the funtions with examples and also added Cheat Sheet for Revision...

Ujjawal Singh 1 Nov 30, 2022
A calculator that performs various operations such as addition, subtraction, multiplication and division of positive and negative values

A calculator that performs various operations such as addition, subtraction, multiplication and division of positive and negative values The calculator also does percentages, square roots and squares

Andrey Fabricio 1 Jan 31, 2022
Tripoli imports raw mass spectrometer data files and supports interactive review and archiving of isotopic data.

Tripoli imports raw mass spectrometer data files and supports interactive review and archiving of isotopic data. Tripoli facilitates visualization of temporal trends and scatter during measurement, statistically rigorous filtering of data, and calculation of statistical parameters.

CIRDLES 7 Dec 15, 2022
Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level.

Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level.

Google Cloud Platform 18 Dec 29, 2022
A Java to iOS Objective-C translation tool and runtime.

J2ObjC: Java to Objective-C Translator and Runtime Project site: https://j2objc.org J2ObjC blog: https://j2objc.blogspot.com Questions and discussion:

Google 5.9k Dec 29, 2022
Make Slack and Facebook Bots in Java.

JBot Make bots in Java. JBot is a java framework (inspired by Howdyai's Botkit) to make Slack and Facebook bots in minutes. It provides all the boiler

Ram 1.2k Dec 18, 2022
API gateway for REST and SOAP written in Java.

Membrane Service Proxy Reverse HTTP proxy (framework) written in Java, that can be used as an API gateway as a security proxy for HTTP based integrati

predic8 GmbH 389 Dec 31, 2022
The open-source Java obfuscation tool working with Ant and Gradle by yWorks - the diagramming experts

yGuard yGuard is an open-source Java obfuscation tool. With yGuard it is easy as pie ( ?? ) to configure obfuscation through an extensive ant task. yG

yWorks GmbH 265 Jan 2, 2023
An example mod that uses Vigilance and Essential with Java

Vigilance Example Mod (Java) I haven't really seen any mods that use Vigilance and Essential with Java, so here's a quick example mod I made. It's bas

null 6 Dec 25, 2022
A harness to build the source code from openjdk.java.net using Free Software tools and dependencies

A harness to build the source code from openjdk.java.net using Free Software tools and dependencies

IcedTea 2 Mar 5, 2022
Implementation of various design patterns in C++, Java and Python

DesignPatterns Implementation of various design patterns in C++, Java and Python. Strategy Pattern Description: Strategy Pattern in implemented in a p

Lakshmanan Meiyappan 12 Jul 20, 2022
Select USACO solutions in Java and C++

Select USACO solutions in Java and C++

Akshar Barot 2 Sep 14, 2022