Simple full text indexing and searching library for Java

Overview

Download Build Status codecov

indexer4j

Simple full text indexing and searching library for Java

Install

Gradle

repositories {
    jcenter()
}

dependencies {
   compile 'com.haeungun.indexer4j:indexer4j:<:the-latest-version>'
}

Maven

<dependency>
	<groupId>com.haeungun.indexer4j</groupId>
	<artifactId>indexer4j</artifactId>
	<version>the-latest-version</version>
	<type>pom</type>
</dependency>

Features

  • Support TF-IDF, BM25 score
  • Support Regex, N-gram tokenizer
  • Easy indexing with field annotation

TODO

  • Parrallel build and search
  • Support JDK 11 CI on travis CI (Jacoco does not support JDK11 yet)
  • Improve saving and loading features

Examples

@Document
public class ExampleDocument {

    @DocumentId
    private String id;

    @DocumentField
    private String title;

    @DocumentField
    private String contents;

    public ExampleDocument(String id, String title, String contents) {
        this.id = id;
        this.title = title;
        this.contents = contents;
    }

    public String getId() {
        return id;
    }

    public String getTitle() {
        return title;
    }

    public String getContents() {
        return contents;
    }
}

List<ExampleDocument> documents = Arrays.asList(
            new ExampleDocument("doc1", "First Document", "Lorem Lorem Lorem Lorem Lorem"),
            new ExampleDocument("doc2", "Third Document", "Lorem is hello java python"),
            new ExampleDocument("doc3", "Second Document", "Lorem ipsum dolor"),
            new ExampleDocument("doc4", "Forth Document", "Lorem"));

Indexer<ExampleDocument> index = new Indexer<>();
for (ExampleDocument document : documents) {
     index.add(document);
}

index.build();

List<SearchResult> result = index.search("First");
// output => [{docId="doc1", score="..."}]

List<SearchResult> result2 = index.search("Lorem ipsum");
// output = [{docId="doc3", score="..."}, {docId="doc1", score="..."}, ...] 
You might also like...

Minecraft mod to block NameMC indexing on servers.

Fuck NameMC A mod to block server status ping from NameMC. ๐Ÿ“– What's this mod? Let's say it outright, NameMC doesn't have any decency. It indexes ever

Dec 28, 2022

Apache Lucene is a high-performance, full featured text search engine library written in Java.

Apache Lucene is a high-performance, full featured text search engine library written in Java.

Apache Lucene is a high-performance, full featured text search engine library written in Java.

Jan 5, 2023

Search API with spelling correction using ngram-index algorithm: implementation using Java Spring-boot and MySQL ngram full text search index

Search API with spelling correction using ngram-index algorithm: implementation using Java Spring-boot and MySQL ngram full text search index

Search API to handle Spelling-Corrections Based on N-gram index algorithm: using MySQL Ngram Full-Text Parser Sample Screen-Recording Screen.Recording

Dec 4, 2021

A proof-of-concept serverless full-text search solution built with Apache Lucene and Quarkus framework.

Lucene Serverless This project demonstrates a proof-of-concept serverless full-text search solution built with Apache Lucene and Quarkus framework. โœ”๏ธ

Oct 29, 2022

Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.

twitter-text This repository is a collection of libraries and conformance tests to standardize parsing of Tweet text. It synchronizes development, tes

Jan 8, 2023

Extract text from a PDF (pdf to text). Api for PHP/JS/Python and others.

Extract text from a PDF (pdf to text). API in docker. Why did we create this project? In the Laravel project, it was necessary to extract texts from l

May 13, 2022

Facsimile - Copy Your Most Used Text to Clipboard Easily with Facsimile!. It Helps You to Store You Most Used Text as a Key, Value Pair and Copy it to Clipboard with a Shortcut.

Facsimile - Copy Your Most Used Text to Clipboard Easily with Facsimile!. It Helps You to Store You Most Used Text as a Key, Value Pair and Copy it to Clipboard with a Shortcut.

Facsimile An exact copy of Your Information ! Report Bug ยท Request Feature Table of Contents About The Project Built With Getting Started Installation

Sep 12, 2022

Text Object Java Objects (TOJOs): an object representation of a multi-line structured text file like CSV

It's a simple manager of "records" in a text file of CSV, JSON, etc. format. It's something you would use when you don't want to run a full database,

Dec 27, 2022

Open source Picture to text, text to Picture app

Open source Picture to text, text to Picture app

Pic SMS App Pic SMS is a free open source app. With Pic SMS, you can: convert pictures into text parts and send as SMS convert text parts into a pictu

Feb 8, 2022

RTL marquee text view android right to left moving text - persian - farsi - arabic - urdo

RTL marquee text view android right to left moving text - persian - farsi - arabic - urdo

RtlMarqueeView RTL marquee text view can hande the speed of moving text can jump to the specefic position of the text at start can loop the marquee te

Feb 14, 2022

A VisionCamera Frame Processor Plugin to preform text detection on images using MLKit Vision Text Recognition

A VisionCamera Frame Processor Plugin to preform text detection on images using MLKit Vision Text Recognition

vision-camera-ocr A VisionCamera Frame Processor Plugin to preform text detection on images using MLKit Vision Text Recognition. Installation yarn add

Dec 19, 2022

A simple live streaming mobile app with cool functionalities and time extension, and live chat. With a payment system integrated. Server is designed with socket.io to give you full flexibility.

A simple live streaming mobile app with cool functionalities and time extension, and live chat. With a payment system integrated. Server is designed with socket.io to give you full flexibility.

Video Live Streaming Platform Android A simple live streaming mobile app with cool functionalities and time extension, and live chat. With a payment s

Dec 16, 2022

Full-featured Socket.IO Client Library for Java, which is compatible with Socket.IO v1.0 and later.

Socket.IO-client Java This is the Socket.IO Client Library for Java, which is simply ported from the JavaScript client. See also: Android chat demo en

Jan 4, 2023

๐Ÿ‘„ The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

๐Ÿ‘„ The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Quick Info this library tries to solve language detection of very short words and phrases, even shorter than tweets makes use of both statistical and

Dec 28, 2022

Nightmare-text - This is a simple lib that help to create, titles, actionbars, hovers and click actions chat components.

Nightmare text This is a simple lib that help to create, titles, actionbars, hovers and click actions chat components. Setup public final class Testin

Mar 9, 2022

A complete and performing library to highlight text snippets (EditText, SpannableString and TextView) using Spannable with Regular Expressions (Regex) for Android.

A complete and performing library to highlight text snippets (EditText, SpannableString and TextView) using Spannable with Regular Expressions (Regex) for Android.

Highlight A complete and performing library to highlight text snippets (EditText/Editable and TextView) using Spannable with Regular Expressions (Rege

Dec 22, 2022

Java library for creating text-based GUIs

Java library for creating text-based GUIs

Lanterna Lanterna is a Java library allowing you to write easy semi-graphical user interfaces in a text-only environment, very similar to the C librar

Dec 31, 2022

A Java API for checking if text contains profanity via the alt-profanity-checker Python library.

ProfanityCheckerAPI A Java API for checking if text contains profanity via the alt-profanity-checker Python library. It uses jep to run and interpret

Feb 19, 2022
Comments
  • The document must have only one ID

    The document must have only one ID

    If multiple IDs are declared, an exception must be thrown. Make sure that multiple document IDs are not declared.

    https://github.com/haeungun/indexer4j/blob/319dda52a0213130b1574b35b129a2838a5e4c62/src/main/java/com/haeungun/indexer4j/core/DocumentExtractor.java#L51

    invalid 
    opened by haeungun 0
Releases(0.3.0)
  • 0.3.0(Jan 27, 2019)

    Changelog

    • Implement N-gram tokenizer
    • Add Indexer constructor for a custom tokenizer
    • Improve unit test coverage

    Maven

    <dependency>
    	<groupId>com.haeungun.indexer4j</groupId>
    	<artifactId>indexer4j</artifactId>
    	<version>0.3.0</version>
    	<type>pom</type>
    </dependency>
    

    Gradle

    compile 'com.haeungun.indexer4j:indexer4j:0.3.0'
    
    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Jan 21, 2019)

    Changelog

    • Support Object type for DocumentId
    • Add topK parameter for search API
    • Improve unittest coverage
    • Add java doc

    Maven

    <dependency>
    	<groupId>com.haeungun.indexer4j</groupId>
    	<artifactId>indexer4j</artifactId>
    	<version>0.2.0</version>
    	<type>pom</type>
    </dependency>
    

    Gradle

    compile 'com.haeungun.indexer4j:indexer4j:0.2.0'
    
    Source code(tar.gz)
    Source code(zip)
  • 0.1.0(Jan 20, 2019)

    Maven

    <dependency>
    	<groupId>com.haeungun.indexer4j</groupId>
    	<artifactId>indexer4j</artifactId>
    	<version>0.1.0</version>
    	<type>pom</type>
    </dependency>
    

    Gradle

    compile 'com.haeungun.indexer4j:indexer4j:0.1.0'
    

    Changelog

    • First release to jcenter
    Source code(tar.gz)
    Source code(zip)
Owner
Haeun Kim
Hello :D
Haeun Kim
A proof-of-concept serverless full-text search solution built with Apache Lucene and Quarkus framework.

Lucene Serverless This project demonstrates a proof-of-concept serverless full-text search solution built with Apache Lucene and Quarkus framework. โœ”๏ธ

Arseny Yankovsky 38 Oct 29, 2022
A simple fast search engine written in java with the help of the Collection API which takes in multiple queries and outputs results accordingly.

A simple fast search engine written in java with the help of the Collection API which takes in multiple queries and outputs results accordingly.

Adnan Hossain 6 Oct 24, 2022
filehunter - Simple, fast, open source file search engine

Simple, fast, open source file search engine. Designed to be local file search engine for places where multiple documents are stored on multiple hosts with multiple directories.

null 32 Sep 14, 2022
Apache Solr is an enterprise search platform written in Java and using Apache Lucene.

Apache Solr is an enterprise search platform written in Java and using Apache Lucene. Major features include full-text search, index replication and sharding, and result faceting and highlighting.

The Apache Software Foundation 630 Dec 28, 2022
Free and Open, Distributed, RESTful Search Engine

Elasticsearch A Distributed RESTful Search Engine https://www.elastic.co/products/elasticsearch Elasticsearch is a distributed RESTful search engine b

elastic 62.3k Dec 31, 2022
Apache Lucene and Solr open-source search software

Apache Lucene and Solr have separate repositories now! Solr has become a top-level Apache project and main line development for Lucene and Solr is hap

The Apache Software Foundation 4.3k Jan 7, 2023
GitHub Search Engine: Web Application used to retrieve, store and present projects from GitHub, as well as any statistics related to them.

GHSearch Platform This project is made of two subprojects: application: The main application has two main responsibilities: Crawling GitHub and retrie

SEART - SoftwarE Analytics Research Team 68 Nov 25, 2022
OpenSearch is an open source distributed and RESTful search engine.

OpenSearch is an open source search and analytics engine derived from Elasticsearch

null 6.2k Jan 1, 2023
Mp4grep - a CLI for transcribing and searching audio/video files

mp4grep mp4grep is a tool that transcribes and searches audio and video files for a regex pattern. mp4grep isn't just for mp4 files! It also supports

ooc 250 Dec 23, 2022