Overview

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit the Lucene website: https://lucene.apache.org/core/

Building with Gradle

Basic steps:

  1. Install OpenJDK 11 (or greater)
  2. Download Lucene from Apache and unpack it (or clone the git repository).
  3. Run the Gradle launcher script (gradlew).

Step 0) Set up your development environment (OpenJDK 11 or greater)

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README. Lucene runs with Java 11 or later.

Lucene uses Gradle for build control.

NOTE: Lucene changed from Ant to Gradle as of release 9.0. Prior releases still use Ant.

Step 1) Checkout/Download Lucene source code

You can clone the source code from GitHub:

https://github.com/apache/lucene

or get Lucene source archives for a particular release from:

https://lucene.apache.org/core/downloads.html

Download either a zip or a tarred/gzipped version of the archive, and uncompress it into a directory of your choice.

Step 2) Run Gradle

Run "./gradlew help", this will show the main tasks that can be executed to show help sub-topics.

If you want to build Lucene, type:

./gradlew assemble

NOTE: DO NOT use the gradle command that may already be installed on your machine (unless you know what you're doing). The Gradle wrapper (gradlew) does the job: it downloads the correct Gradle version and sets up the necessary configuration.

The first time you run Gradle, it will create a file "gradle.properties" that contains machine-specific settings. Normally you can use this file as-is, but it can be modified if necessary.

./gradlew check will assemble Lucene and run all validation tasks (including unit tests).

./gradlew help will print a list of help guides that explain how the build and typical workflows work.

If you want to build the documentation, type:

./gradlew documentation

Gradle build and IDE support

  • IntelliJ - IntelliJ IDEA can import the project out of the box.
  • Eclipse - Basic support (help/IDEs.txt).
  • NetBeans - Not tested.

Contributing

Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support

Comments
  • Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector [LUCENE-1483]

    This issue changes how an IndexSearcher searches over multiple segments. The current method of searching multiple segments is to use a MultiSegmentReader and treat all of the segments as one. This causes filters and FieldCaches to be keyed to the MultiReader and makes reopen expensive. If only a few segments change, the FieldCache is still loaded for all of them.

    This patch changes things by searching each individual segment one at a time, but sharing the HitCollector used across each segment. This allows FieldCaches and Filters to be keyed on individual SegmentReaders, making reopen much cheaper. FieldCache loading over multiple segments can be much faster as well - with the old method, all unique terms for every segment are enumerated against each segment; because of the likely logarithmic change in terms per segment, this can be very wasteful. Searching individual segments avoids this cost. The term/document statistics from the MultiReader are used to score results for each segment.

    When sorting, it's more difficult to use a single HitCollector for each sub-searcher, because ordinals are not comparable across segments. To account for this, a new field-sort-enabled HitCollector is introduced that is able to collect and sort across segments (because of its ability to compare ordinals across segments). This TopFieldCollector class will collect the values/ordinals for a given segment and, upon moving to the next segment, translate any ordinals/values so that they can be compared against the values for the new segment. This is done lazily.

    All in all, the switch seems to provide numerous performance benefits, in both sorted and non-sorted search. We were seeing a sizeable loss on indices with lots of segments (1000?) and certain queue sizes / queries, but the latest results seem to show that's been mostly taken care of (you shouldn't be using such a large queue on such a segmented index anyway).
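
    As a rough illustration of this shared, per-segment collection pattern, here is a minimal sketch written against the Collector/SimpleCollector API of current Lucene rather than the original 2.9 HitCollector/MultiReaderHitCollector classes; the counting logic is only illustrative.

    import java.io.IOException;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.search.ScoreMode;
    import org.apache.lucene.search.SimpleCollector;

    // The searcher drives ONE collector across all segments and notifies it each
    // time it moves to the next segment, so caches and ordinals can be keyed per
    // SegmentReader instead of per MultiReader.
    public class CountingCollector extends SimpleCollector {
      private int docBase;   // doc id offset of the current segment
      private int hits;

      @Override
      protected void doSetNextReader(LeafReaderContext context) throws IOException {
        docBase = context.docBase;   // called once per segment
      }

      @Override
      public void collect(int doc) throws IOException {
        // docBase + doc is the index-wide doc id, if a global id is needed
        hits++;
      }

      @Override
      public ScoreMode scoreMode() {
        return ScoreMode.COMPLETE_NO_SCORES;
      }

      public int hitCount() {
        return hits;
      }
    }

    Such a collector would be used as searcher.search(query, new CountingCollector()), assuming an IndexSearcher named searcher.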

    • Introduces
      • MultiReaderHitCollector - a HitCollector that can collect across multiple IndexReaders. Old HitCollectors are wrapped to support multiple IndexReaders.
      • TopFieldCollector - a HitCollector that can compare values/ordinals across IndexReaders and sort on fields.
      • FieldValueHitQueue - a Priority queue that is part of the TopFieldCollector implementation.
      • FieldComparator - a new Comparator class that works across IndexReaders. Part of the TopFieldCollector implementation.
      • FieldComparatorSource - new class to allow for custom Comparators.
    • Alters
      • IndexSearcher uses a single HitCollector to collect hits against each individual SegmentReader. All the other changes stem from this ;)
    • Deprecates
      • TopFieldDocCollector
      • FieldSortedHitQueue

    Migrated from LUCENE-1483 by Mark Miller (@markrmiller), 1 vote, resolved Feb 02 2009 Attachments: LUCENE-1483.patch (versions: 35), LUCENE-1483-backcompat.patch, LUCENE-1483-partial.patch, sortBench.py, sortCollate.py Linked issues:

    • #2381
    • #3793
    type:enhancement legacy-jira-resolution:Fixed legacy-jira-priority:Minor legacy-jira-fix-version:2.9 affects-version:2.9 
    opened by asfimport 319
  • Further steps towards flexible indexing [LUCENE-1458]

    I attached a very rough checkpoint of my current patch, to get early feedback. All tests pass, though back-compat tests don't pass due to changes to package-private APIs plus certain bugs in tests that happened to work (e.g. calling TermPositions.nextPosition() too many times, which the new API asserts against).

    [Aside: I think, when we commit changes to package-private APIs such that back-compat tests don't pass, we could go back, make a branch on the back-compat tag, commit changes to the tests to use the new package-private APIs on that branch, then fix the nightly build to use the tip of that branch?]

    There's still plenty to do before this is committable! This is a rather large change:

    • Switches to a new, more efficient terms dict format. This still uses tii/tis files, but the tii only stores term & long offset (not a TermInfo). At seek points, tis encodes term & freq/prox offsets absolutely instead of with deltas. Also, tis/tii are structured by field, so we don't have to record the field number in every term. On the first 1 M docs of Wikipedia, the tii file is 36% smaller (0.99 MB -> 0.64 MB) and the tis file is 9% smaller (75.5 MB -> 68.5 MB). RAM usage when loading the terms dict index is significantly less, since we only load an array of offsets and an array of String (no more TermInfo array). It should be faster to init too. This part is basically done.

    • Introduces a modular reader codec that strongly decouples the terms dict from the docs/positions readers. EG there is no more TermInfo used when reading the new format. There's nice symmetry now between reading & writing in the codec chain – the current docs/prox format is captured in:

      FormatPostingsTermsDictWriter/Reader
      FormatPostingsDocsWriter/Reader (.frq file) and
      FormatPostingsPositionsWriter/Reader (.prx file).
      
      This part is basically done.
      
    • Introduces a new "flex" API for iterating through the fields, terms, docs and positions:

      FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
      
      This replaces TermEnum/Docs/Positions.  SegmentReader emulates the
      old API on top of the new API to keep back-compat.
      

    Next steps:

    • Plug in new codecs (pulsing, pfor) to exercise the modularity / fix any hidden assumptions.

    • Expose new API out of IndexReader, deprecate old API but emulate old API on top of new one, switch all core/contrib users to the new API.

    • Maybe switch to AttributeSources as the base class for TermsEnum, DocsEnum, PostingsEnum – this would give readers API flexibility (not just index-file-format flexibility). EG if someone wanted to store payload at the term-doc level instead of term-doc-position level, you could just add a new attribute.

    • Test performance & iterate.


    Migrated from LUCENE-1458 by Michael McCandless (@mikemccand), 1 vote, resolved Dec 03 2009 Attachments: LUCENE-1458_rotate.patch, LUCENE-1458_sortorder_bwcompat.patch, LUCENE-1458_termenum_bwcompat.patch, LUCENE-1458.patch (versions: 13), LUCENE-1458.tar.bz2 (versions: 7), LUCENE-1458-back-compat.patch (versions: 6), LUCENE-1458-DocIdSetIterator.patch (versions: 2), LUCENE-1458-MTQ-BW.patch, LUCENE-1458-NRQ.patch, UnicodeTestCase.patch (versions: 2) Linked issues:

    • #3100
    type:enhancement legacy-jira-resolution:Fixed module:core/index legacy-jira-priority:Minor legacy-jira-fix-version:4.0-ALPHA affects-version:4.0-ALPHA 
    opened by asfimport 256
  • Per thread DocumentsWriters that write their own private segments [LUCENE-2324]

    See #3369 for motivation and more details.

    I'm copying here Mike's summary he posted on 2293:

    Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and "normal" segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
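
    To make the benefit concrete, here is a hedged sketch using the current IndexWriter API (the directory path, field name and thread/doc counts are made up): several threads share one IndexWriter, and with per-thread DocumentsWriters each thread fills its own in-memory segment, which can flush without blocking the others.

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public static void indexConcurrently() throws Exception {
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
      try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/tmp/idx")), config)) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
          pool.submit(() -> {
            for (int i = 0; i < 10_000; i++) {
              Document doc = new Document();
              doc.add(new TextField("body", "document text " + i, Field.Store.NO));
              writer.addDocument(doc);   // each thread feeds its own in-memory segment
            }
            return null;
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
      }
    }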


    Migrated from LUCENE-2324 by Michael Busch, 1 vote, resolved Apr 28 2011 Attachments: ASF.LICENSE.NOT.GRANTED--lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch (versions: 4), LUCENE-2324-SMALL.patch (versions: 5), test.out (versions: 4) Linked issues:

    • #3388
    • #4102
    • #4030
    • #3955
    • #3647
    • #3369
    type:enhancement legacy-jira-resolution:Fixed module:core/index legacy-jira-priority:Minor legacy-jira-fix-version:Realtime Branch 
    opened by asfimport 241
  • Automaton Query/Filter (scalable regex) [LUCENE-1606]

    Attached is a patch for an AutomatonQuery/Filter (the name can change if it's not suitable).

    Whereas the out-of-the-box contrib RegexQuery is nice, I have some very large indexes (100M+ unique tokens) where such queries are quite slow (2 minutes, etc.). Additionally, all of the existing RegexQuery implementations in Lucene are really slow if there is no constant prefix. This implementation does not depend upon a constant prefix, and runs the same query in 640ms.

    Some use cases I envision:

    1. lexicography/etc on large text corpora
    2. looking for things such as urls where the prefix is not constant (http:// or ftp://)

    The Filter uses the BRICS package (http://www.brics.dk/automaton/) to convert regular expressions into a DFA. Then, the filter "enumerates" terms in a special way, by using the underlying state machine. Here is my short description from the comments:

     The algorithm here is pretty basic. Enumerate terms but instead of a binary accept/reject do:
      
     1. Look at the portion that is OK (did not enter a reject state in the DFA)
     2. Generate the next possible String and seek to that.
    

    The Query simply wraps the filter with ConstantScoreQuery.

    I did not include the automaton.jar inside the patch but it can be downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
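
    For reference, a hedged sketch of how this looks with the AutomatonQuery/RegexpQuery that grew out of this work, using the automaton support that was later bundled as org.apache.lucene.util.automaton instead of the external BRICS jar; the field name and pattern are made up:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.AutomatonQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RegexpQuery;
    import org.apache.lucene.util.automaton.Automaton;
    import org.apache.lucene.util.automaton.RegExp;

    static Query urlQuery(String field) {
      String pattern = "(ht|f)tps?://.*";                      // no constant prefix required
      Automaton automaton = new RegExp(pattern).toAutomaton(); // regular expression -> automaton
      // Terms are enumerated against the automaton, seeking past rejected prefixes.
      Query viaAutomaton = new AutomatonQuery(new Term(field, pattern), automaton);
      // RegexpQuery is the convenience subclass that does the same two steps internally.
      Query viaRegexp = new RegexpQuery(new Term(field, pattern));
      return viaAutomaton;
    }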


    Migrated from LUCENE-1606 by Robert Muir (@rmuir), resolved Dec 09 2009 Attachments: automaton.patch, automatonMultiQuery.patch, automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch, automatonWithWildCard.patch, automatonWithWildCard2.patch, BenchWildcard.java, LUCENE-1606_nodep.patch, LUCENE-1606.patch (versions: 15), LUCENE-1606-flex.patch (versions: 12) Linked issues:

    • #3186
    • #3187
    • #3166
    type:enhancement legacy-jira-resolution:Fixed legacy-jira-priority:Minor module:core/search legacy-jira-fix-version:4.0-ALPHA 
    opened by asfimport 224
  • Integrate lat/lon BKD and spatial3d [LUCENE-6699]

    I'm opening this for discussion, because I'm not yet sure how to do this integration, because of my ignorance about spatial in general and spatial3d in particular :)

    Our BKD tree impl is very fast at doing lat/lon shape intersection (bbox, polygon, soon distance: LUCENE-6698) against previously indexed points.

    I think to integrate with spatial3d, we would first need to record lat/lon/z into doc values. Somewhere I saw discussion about how we could stuff all 3 into a single long value with acceptable precision loss? Or, we could use BinaryDocValues? We need all 3 dims available to do the fast per-hit query time filtering.

    But, second: what do we index into the BKD tree? Can we "just" index earth surface lat/lon, and then at query time is spatial3d able to give me an enclosing "surface lat/lon" bbox for a 3d shape? Or ... must we index all 3 dimensions into the BKD tree (seems like this could be somewhat wasteful)?
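
    On the "stuff the values into a single long" idea: a hedged sketch of the two-ints-in-one-long packing that plain lat/lon doc values later used, quantizing each coordinate to a 32-bit int via GeoEncodingUtils (these method names come from later Lucene releases, not from this issue); a third spatial3d dimension would not fit this particular scheme, which is part of the question above.

    import org.apache.lucene.geo.GeoEncodingUtils;

    static long packLatLon(double lat, double lon) {
      int latEnc = GeoEncodingUtils.encodeLatitude(lat);    // quantized to roughly 1 cm
      int lonEnc = GeoEncodingUtils.encodeLongitude(lon);
      return (((long) latEnc) << 32) | (lonEnc & 0xFFFFFFFFL);
    }

    static double unpackLat(long packed) {
      return GeoEncodingUtils.decodeLatitude((int) (packed >>> 32));
    }

    static double unpackLon(long packed) {
      return GeoEncodingUtils.decodeLongitude((int) (packed & 0xFFFFFFFFL));
    }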


    Migrated from LUCENE-6699 by Michael McCandless (@mikemccand), 1 vote, resolved Sep 02 2015 Attachments: Geo3DPacking.java, LUCENE-6699.patch (versions: 26) Linked issues:

    • #7539
    type:enhancement legacy-jira-priority:Major legacy-jira-resolution:Fixed legacy-jira-fix-version:6.0 legacy-jira-fix-version:5.4 
    opened by asfimport 220
  • if a filter can support random access API, we should use it [LUCENE-1536]

    I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API.

    This was inspired by #2550, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit.

    Some notes on the test:

    • Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.

    • I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. "u s" means "united states" (phrase search).

    • I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.99999 (filter is non-null but all bits are set), 100 (filter=null, control)).

    • Method high means I use the random-access filter API up in IndexSearcher's main loop; method low means I use it down in SegmentTermDocs (just like deleted docs today). A rough sketch of the two access patterns follows this list.

    • Baseline (QPS) is current trunk, where filter is applied as iterator up "high" (ie in IndexSearcher's search loop).
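
    The two access patterns being compared, as a rough sketch (a FixedBitSet stands in for whatever the Filter produces; the method names are illustrative, not from the patch):

    import java.io.IOException;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.BitSetIterator;
    import org.apache.lucene.util.FixedBitSet;

    // "Random access": the scorer proposes a doc and the filter answers yes/no.
    static boolean acceptRandomAccess(FixedBitSet filterBits, int doc) {
      return filterBits.get(doc);
    }

    // "Iterator": the filter is walked like another postings list and must be
    // advanced in lock-step with the scorer.
    static int countViaIterator(FixedBitSet filterBits) throws IOException {
      DocIdSetIterator it = new BitSetIterator(filterBits, filterBits.cardinality());
      int count = 0;
      while (it.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        count++;
      }
      return count;
    }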


    Migrated from LUCENE-1536 by Michael McCandless (@mikemccand), 2 votes, resolved Oct 25 2011 Attachments: CachedFilterIndexReader.java, changes-yonik-uwe.patch, LUCENE-1536_hack.patch, LUCENE-1536.patch (versions: 29), LUCENE-1536-rewrite.patch (versions: 8), luceneutil.patch Linked issues:

    type:enhancement legacy-jira-resolution:Fixed legacy-jira-priority:Minor module:core/search legacy-jira-fix-version:4.0-ALPHA affects-version:2.4 legacy-jira-label:mentor legacy-jira-label:gsoc2011 legacy-jira-label:lucene-gsoc-11 
    opened by asfimport 211
  • Allow Scorer to expose positions and payloads aka. nuke spans [LUCENE-2878]

    Currently we have two somewhat separate types of queries: those that can make use of positions (mainly spans) and those that use payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you cannot use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature, since they cannot score based on term proximity: scores don't expose any positional information.

    All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done the first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started by adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and the others simply return null instead. To show that the API really works, and that our BulkPostings work fine with positions too, I cut TermSpanQuery over to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, and it now :) works with positions, while payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec.

    So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute.

    I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk first but after that pain today I need a break first :).

    The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't look into the MemoryIndex BulkPostings API yet)


    Migrated from LUCENE-2878 by Simon Willnauer (@s1monw), 11 votes, resolved Apr 11 2018 Attachments: LUCENE-2878_trunk.patch (versions: 2), LUCENE-2878.patch (versions: 30), LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, PosHighlighter.patch (versions: 2) Linked issues:

    • #5590

    Sub-tasks:

    • #4391
    • #4392
    • #4393
    • #4394
    • #5617
    • #5618
    • #5621
    • #5638
    type:enhancement legacy-jira-priority:Major module:core/search legacy-jira-label:gsoc2014 legacy-jira-fix-version:8.0 legacy-jira-resolution:Implemented legacy-jira-fix-version:7.4 affects-version:Positions Branch 
    opened by asfimport 209
  • Separately specify a field's type [LUCENE-2308]

    This came up from discussions on IRC. I'm summarizing here...

    Today when you make a Field to add to a document, you can set things like indexed or not, stored or not, analyzed or not, and details like omitTfAP, omitNorms, indexing term vectors (separately controlling offsets/positions), etc.

    I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields.

    The Field instance would still hold the actual value.

    We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper).

    This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.

    This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
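
    A hedged sketch of what this refactoring looks like with the FieldType class that later landed in oal.document; the field name and settings are only examples:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.IndexOptions;

    static Document makeDoc(String body) {
      // One reusable FieldType describes how the field is indexed; in real code it
      // would be a shared constant so many Field instances (the values) can use it.
      FieldType bodyType = new FieldType();
      bodyType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
      bodyType.setTokenized(true);
      bodyType.setStored(false);
      bodyType.setStoreTermVectors(true);
      bodyType.freeze();   // immutable from here on - still not a schema

      Document doc = new Document();
      doc.add(new Field("body", body, bodyType));   // the Field holds only the value
      return doc;
    }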


    Migrated from LUCENE-2308 by Michael McCandless (@mikemccand), 2 votes, resolved Mar 18 2013 Attachments: LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch (versions: 5), LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-FT-interface.patch (versions: 4), LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch Linked issues:

    • #3393
    • #4250

    Sub-tasks:

    • #3386
    type:enhancement legacy-jira-priority:Major legacy-jira-resolution:Fixed module:core/index legacy-jira-label:mentor legacy-jira-label:gsoc2011 legacy-jira-label:lucene-gsoc-11 legacy-jira-fix-version:4.0 
    opened by asfimport 200
  • Make Luke a Lucene/Solr Module [LUCENE-2562]

    see "RE: Luke - in need of maintainer": http://markmail.org/message/m4gsto7giltvrpuf "Web-based Luke": http://markmail.org/message/4xwps7p7ifltme5q

    I think it would be great if there was a version of Luke that always worked with trunk - and it would also be great if it was easier to match Luke jars with Lucene versions.

    While I'd like to get GWT Luke into the mix as well, I think the easiest starting point is to straight port Luke to another UI toolkit before abstracting out DTO objects that both GWT Luke and Pivot Luke could share.

    I've started slowly converting Luke's use of Thinlet to Apache Pivot. I haven't had / don't have a lot of time for this at the moment, but I've plugged away here and there over the past week or two. There is still a lot to do.

    (Screenshots attached: luke1.jpg, luke2.jpg, luke3.jpg, Luke-ALE-1.png through Luke-ALE-5.png, lukeALE-documents.png)


    Migrated from LUCENE-2562 by Mark Miller (@markrmiller), 11 votes, resolved Apr 24 2019 Attachments: LUCENE-2562.patch (versions: 3), LUCENE-2562-ivy.patch, LUCENE-2562-Ivy.patch (versions: 3), luke1.jpg, luke2.jpg, luke3.jpg, Luke-ALE-1.png, Luke-ALE-2.png, Luke-ALE-3.png, Luke-ALE-4.png, Luke-ALE-5.png, lukeALE-documents.png, luke-javafx1.png, luke-javafx2.png, luke-javafx3.png, screenshot-1.png, スクリーンショット 2018-11-05 9.19.47.png Linked issues:

    Pull requests: https://github.com/apache/lucene-solr/pull/420, https://github.com/apache/lucene-solr/pull/490, https://github.com/apache/lucene-solr/pull/512

    type:task legacy-jira-priority:Major legacy-jira-resolution:Fixed legacy-jira-label:gsoc2014 module:luke legacy-jira-fix-version:8.1 legacy-jira-fix-version:9.0 
    opened by asfimport 196
  • Break out StorableField from IndexableField [LUCENE-3312]

    In the field type branch we have strongly decoupled the Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up to use their own "documents" instead of the "user-space" impls we provide in oal.document.

    Similarly, with #4382, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor.

    But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables – one for the IndexableFields and one for the StorableFields. Either can be null.

    One downside is possible perf hit for fields that are both indexed & stored (ie, we visit them twice, lookup their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in API....


    Migrated from LUCENE-3312 by Michael McCandless (@mikemccand), 2 votes, resolved Sep 02 2012 Attachments: LUCENE-3312-DocumentIterators-uwe.patch, lucene-3312-patch-01.patch, lucene-3312-patch-02.patch, lucene-3312-patch-03.patch, lucene-3312-patch-04.patch, lucene-3312-patch-05.patch, lucene-3312-patch-06.patch, lucene-3312-patch-07.patch, lucene-3312-patch-08.patch, lucene-3312-patch-09.patch, lucene-3312-patch-10.patch, lucene-3312-patch-11.patch, lucene-3312-patch-12.patch, lucene-3312-patch-12a.patch, lucene-3312-patch-13.patch, lucene-3312-patch-14.patch, LUCENE-3312-reintegration.patch (versions: 2) Linked issues:

    • #5702
    • #5413
    type:enhancement legacy-jira-priority:Major legacy-jira-resolution:Fixed module:core/index legacy-jira-fix-version:6.0 legacy-jira-label:lucene-gsoc-12 legacy-jira-label:gsoc2012 
    opened by asfimport 184
  • AttributeSource/TokenStream API improvements [LUCENE-1693]

    This patch makes the following improvements to AttributeSource and TokenStream/Filter:

    • introduces interfaces for all Attributes. The corresponding implementations have the postfix 'Impl', e.g. TermAttribute and TermAttributeImpl. AttributeSource now has a factory for creating the Attribute instances; the default implementation looks for implementing classes with the postfix 'Impl'. Token now implements all 6 TokenAttribute interfaces.

    • new method added to AttributeSource: addAttributeImpl(AttributeImpl). Using reflection it walks up in the class hierarchy of the passed in object and finds all interfaces that the class or superclasses implement and that extend the Attribute interface. It then adds the interface->instance mappings to the attribute map for each of the found interfaces.

    • removes the set/getUseNewAPI() methods (including the standard ones). Instead it is now enough to only implement the new API; if an old TokenStream still implements the old API (next()/next(Token)), it is wrapped automatically. The delegation path is determined via reflection (the patch determines which of the three methods was overridden).

    • Token is no longer deprecated, instead it implements all 6 standard token interfaces (see above). The wrapper for next() and next(Token) uses this to automatically map all attribute interfaces to one TokenWrapper instance (implementing all 6 interfaces) that contains a Token instance. next() and next(Token) exchange the inner Token instance as needed. For the new incrementToken(), only one TokenWrapper instance is visible, delegating to the correct reusable Token. This API also preserves custom Token subclasses that may be created by very special token streams (see example in Backwards-Test).

    • AttributeImpl now has a default implementation of toString that uses reflection to print out the values of the attributes in a default formatting. This makes it a bit easier to implement AttributeImpl, because toString() was declared abstract before.

    • Cloning is now done much more efficiently in captureState. The method figures out which unique AttributeImpl instances are contained as values in the attributes map, because those are the ones that need to be cloned. It creates a single linked list that supports deep cloning (in the inner class AttributeSource.State). AttributeSource keeps track of when this state changes, i.e. whenever new attributes are added to the AttributeSource. Only in that case will captureState recompute the state, otherwise it will simply clone the precomputed state and return the clone. restoreState(AttributeSource.State) walks the linked list and uses the copyTo() method of AttributeImpl to copy all values over into the attribute that the source stream (e.g. SinkTokenizer) uses.

    • Tee- and SinkTokenizer were deprecated, because they use Token instances for caching. This is not compatible with the new API using AttributeSource.State objects. You can still use the old deprecated ones, but new features provided by new Attribute types may get lost in the chain. A replacement is a new TeeSinkTokenFilter, which has a factory to create new Sink instances that have compatible attributes. Sink instances created by one Tee can also be added to another Tee, as long as the attribute implementations are compatible (it is not possible to add a sink from a tee using one Token instance to a tee using the six separate attribute impls). In this case UOE is thrown.
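
    A short usage sketch of the attribute-based consumption pattern these interfaces enable, shown with the CharTermAttribute name from later releases rather than the 2.9 TermAttribute; the analyzer, field and text are placeholders:

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

    static void printTokens(String text) throws Exception {
      Analyzer analyzer = new StandardAnalyzer();
      try (TokenStream ts = analyzer.tokenStream("body", text)) {
        // Attributes are requested once; incrementToken() refills them in place,
        // so no Token instance is passed around.
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        OffsetAttribute offsets = ts.addAttribute(OffsetAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
          System.out.println(term + " [" + offsets.startOffset() + "," + offsets.endOffset() + "]");
        }
        ts.end();
      }
    }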

    Cloning performance can be greatly improved if only a single AttributeImpl instance is used in one TokenStream. A user can e.g. simply add a Token instance to the stream instead of the individual attributes. Or the user could implement a subclass of AttributeImpl that implements exactly the Attribute interfaces needed. I think this should be considered an expert API (addAttributeImpl), as this manual optimization is only needed if cloning performance is crucial. I ran some quick performance tests using Tee/Sink tokenizers (which do cloning) and the performance was roughly 20% faster with the new API. I'll run some more performance tests and post more numbers then.

    Note also that when we add serialization to the Attributes, e.g. for supporting storing serialized TokenStreams in the index, then the serialization should benefit even significantly more from the new API than cloning.

    This issue contains one backwards-compatibility break: TokenStreams/Filters/Tokenizers should normally be final (see #2827 for the explanation). Some of these core classes are not final and so one could override the next() or next(Token) methods. In this case, the backwards-wrapper would automatically use incrementToken(), because it is implemented, so the overridden method is never called. To prevent users from errors not visible during compilation or testing (the streams just behave wrong), this patch makes all implementation methods final (next(), next(Token), incrementToken()) whenever the class itself is not final. This is a BW break, but users will clearly see that they have done something unsupported and should instead create a custom TokenFilter with their additional implementation (rather than extending a core implementation).

    For further changing contrib token streams, the following procedure should be used:

    • rewrite and replace next(Token)/next() implementations by new API

    • if the class is final, no next(Token)/next() methods needed (must be removed!!!)

    • if the class is non-final add the following methods to the class:

      /** @deprecated Will be removed in Lucene 3.0. This method is final, as it should
       *  not be overridden. Delegates to the backwards compatibility layer. */
      public final Token next(final Token reusableToken) throws java.io.IOException {
        return super.next(reusableToken);
      }

      /** @deprecated Will be removed in Lucene 3.0. This method is final, as it should
       *  not be overridden. Delegates to the backwards compatibility layer. */
      public final Token next() throws java.io.IOException {
        return super.next();
      }

      Also the incrementToken() method must be final in this case (and the new method end() of LUCENE-1448)


    Migrated from LUCENE-1693 by Michael Busch, resolved Jul 24 2009 Attachments: lucene-1693.patch (versions: 4), LUCENE-1693.patch (versions: 15), LUCENE-1693-TokenizerAttrFactory.patch, PerfTest3.java, TestAPIBackwardsCompatibility.java, TestCompatibility.java (versions: 4) Linked issues:

    • #2770
    • #2769
    • #2771
    • #2534
    type:enhancement legacy-jira-resolution:Fixed module:analysis legacy-jira-priority:Minor legacy-jira-fix-version:2.9 
    opened by asfimport 172
  • Getting exception on search after upgrading to Lucene 9.4

    Description

    After upgrading from Lucene 9.3.0 to Lucene 9.4.2 the index search with sorting by description throws the following exception:

    Caused by: java.lang.IllegalStateException: Term [77 73 64 66 6a 66 73 67 73 20 61 64 6b 66 64 6a 68 74 67 64 67 20 61 64 6b 66 64 6a 68 74 67 64 67 20 72 65 74 72 65 72 74 65 20] exists in doc values but not in the terms index
        at org.apache.lucene.search.comparators.TermOrdValComparator$CompetitiveIterator.init(TermOrdValComparator.java:582)
        at org.apache.lucene.search.comparators.TermOrdValComparator$CompetitiveIterator.update(TermOrdValComparator.java:553)
        at org.apache.lucene.search.comparators.TermOrdValComparator$TermOrdValLeafComparator.updateCompetitiveIterator(TermOrdValComparator.java:457)
        at org.apache.lucene.search.comparators.TermOrdValComparator$TermOrdValLeafComparator.setHitsThresholdReached(TermOrdValComparator.java:284)
        at org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector.countHit(TopFieldCollector.java:86)
        at org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1.collect(TopFieldCollector.java:202)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:305)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:247)
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:744)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:662)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:636)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:553)

    I wrote a small java function that reproduces the issue. Increasing maxRows value there to above 200 somehow resolves the problem.

    public static void test() throws Exception {
    	String name = "description";
    	String content = "content";
    	File dir = new File("C:\\test"); 
    	MMapDirectory directory = new MMapDirectory(dir.toPath());	
    	Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET);
    	
    	boolean createIndex = true; // must be changed to false after calling this function first time
    	if (createIndex) {
    		String[] values = {
    				"anshduvfv ",
    				"dhisdefihfhg ",
    				"Afasdfasdf ",
    				"Retrerte ",
    				"bnbssdfgfg ",
    				"wrgfhhjg ",
    				"jfhtvg ",
    				"fhdhfdsads ",
    				"Wsdfjfsgs ",
    				"adkfdjhtgdg ",
    		};
    
    		Random random = new Random();
    		IndexWriterConfig config = new IndexWriterConfig(analyzer);
    		config.setOpenMode(OpenMode.CREATE_OR_APPEND);
    		IndexWriter writer = new IndexWriter(directory, config);
    		for (int i = 0; i < 110; i++) {
    			for (int j = 0; j < 10; j++) {
    				String value = values[j] + values[random.nextInt(10)] + values[random.nextInt(10)] + values[random.nextInt(10)];
    				
    				Document doc = new Document();
    				doc.add(new TextField(content, value, Field.Store.NO));
    				doc.add(new StringField(name, value, Field.Store.YES));
    				doc.add(new SortedDocValuesField(name, new BytesRef(value.toLowerCase())));  // case-insensitive sorting
    				writer.addDocument(doc);
    			}
    		}
    		writer.close();
    	}
    	
    	int maxRows = 100;
    	String request = "*:*";
    	DirectoryReader reader = DirectoryReader.open(directory);
    	IndexSearcher searcher = new IndexSearcher(reader);
    	QueryParser parser = new QueryParser(content, analyzer);
    	parser.setSplitOnWhitespace(true);
    	parser.setAllowLeadingWildcard(false);
    	Query query = parser.parse(request);
    	Sort sort = new Sort(new SortField(name, Type.STRING, true));
    	TopDocs docs = searcher.search(query, maxRows, sort);
    	reader.close();
    }
    

    Version and environment details

    Upgraded from Lucene 9.3.0 to Lucene 9.4.2, using lucene-backward-codecs-9.4.2.jar.
    OS: MS Windows 11
    Java: jdk-11.0.14

    type:bug 
    opened by vstrout 0
  • Create new KnnByteVectorField and KnnVectorsReader#getByteVectorValues(String)

    This completes the refactoring as described in: https://github.com/apache/lucene/issues/11963

    This commit:

    • splits out ByteVectorValues from VectorValues.
    • Adds getByteVectorValues(String field) to KnnVectorsReader
    • Adds a new KnnByteVectorField and disallows BytesRef values in the KnnVectorField
    • No longer allows ByteVectorValues to be read from a KnnVectorField.

    These refactors are difficult to split up any further.
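
    A hedged usage sketch, assuming the API shape described above (the exact class and constructor signatures may differ slightly from what was merged):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.KnnByteVectorField;
    import org.apache.lucene.index.VectorSimilarityFunction;

    static Document docWithByteVector(byte[] vector) {
      Document doc = new Document();
      // Byte-valued vectors get their own field class instead of BytesRef values
      // passed through KnnVectorField.
      doc.add(new KnnByteVectorField("embedding", vector, VectorSimilarityFunction.DOT_PRODUCT));
      return doc;
    }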

    opened by benwtrent 2
  • org.apache.lucene.search.TestSimpleExplanationsWithFillerDocs#testMPQ3 fails reproducible

    Description

    Policeman Jenkins failed when executing this test on the OpenJ9 JVM, but it is actually reproducible:

    Java: 64bit/openj9/jdk-17.0.5 -XX:-UseCompressedOops -Xgcpolicy:gencon
    
    1 tests failed.
    FAILED:  org.apache.lucene.search.TestSimpleExplanationsWithFillerDocs.testMPQ3
    
    Error Message:
    java.lang.AssertionError: expected:<0.5828427076339722> but was:<0.5828428268432617>
    
    Stack Trace:
    java.lang.AssertionError: expected:<0.5828427076339722> but was:<0.5828428268432617>
        at __randomizedtesting.SeedInfo.seed([E853C78F22129ACC:708B13FE3EC459FF]:0)
        at app//org.junit.Assert.fail(Assert.java:89)
        at app//org.junit.Assert.failNotEquals(Assert.java:835)
        at app//org.junit.Assert.assertEquals(Assert.java:555)
        at app//org.junit.Assert.assertEquals(Assert.java:685)
        at app//org.apache.lucene.tests.search.CheckHits.verifyExplanation(CheckHits.java:503)
        at app//org.apache.lucene.tests.search.CheckHits.verifyExplanation(CheckHits.java:428)
        at app//org.apache.lucene.tests.search.CheckHits.verifyExplanation(CheckHits.java:428)
        at app//org.apache.lucene.tests.search.CheckHits$ExplanationAsserter.collect(CheckHits.java:623)
        at app//org.apache.lucene.tests.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:52)
        at app//org.apache.lucene.tests.search.AssertingCollector$1.collect(AssertingCollector.java:66)
        at app//org.apache.lucene.tests.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:52)
        at app//org.apache.lucene.tests.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:52)
        at app//org.apache.lucene.tests.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:52)
        at app//org.apache.lucene.tests.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:52)
        at app//org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:282)
        at app//org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:254)
        at app//org.apache.lucene.tests.search.AssertingBulkScorer.score(AssertingBulkScorer.java:101)
        at app//org.apache.lucene.search.ReqExclBulkScorer.score(ReqExclBulkScorer.java:46)
        at app//org.apache.lucene.tests.search.AssertingBulkScorer.score(AssertingBulkScorer.java:101)
        at app//org.apache.lucene.search.TimeLimitingBulkScorer.score(TimeLimitingBulkScorer.java:76)
        at app//org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
        at app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:776)
        at app//org.apache.lucene.tests.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:78)
        at app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:694)
        at app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:688)
        at app//org.apache.lucene.tests.search.CheckHits.checkExplanations(CheckHits.java:336)
        at app//org.apache.lucene.tests.search.QueryUtils.checkExplanations(QueryUtils.java:114)
        at app//org.apache.lucene.tests.search.QueryUtils.check(QueryUtils.java:144)
        at app//org.apache.lucene.tests.search.QueryUtils.check(QueryUtils.java:140)
        at app//org.apache.lucene.tests.search.QueryUtils.check(QueryUtils.java:129)
        at app//org.apache.lucene.tests.search.CheckHits.checkHitCollector(CheckHits.java:105)
        at app//org.apache.lucene.tests.search.BaseExplanationTestCase.qtest(BaseExplanationTestCase.java:110)
        at app//org.apache.lucene.search.TestSimpleExplanationsWithFillerDocs.qtest(TestSimpleExplanationsWithFillerDocs.java:116)
        at app//org.apache.lucene.search.TestSimpleExplanations.testMPQ3(TestSimpleExplanations.java:229)
        at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base@17.0.5/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base@17.0.5/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base@17.0.5/java.lang.reflect.Method.invoke(Method.java:568)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
        at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
        at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base@17.0.5/java.lang.Thread.run(Thread.java:857)
    

    Reproduce failure on Java 19 (Hotspot):

    org.apache.lucene.search.TestSimpleExplanationsWithFillerDocs > test suite's output saved to C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\build\test-results\test\outputs\OUTPUT-org.apache.lucene.search.TestSimpleExplanationsWithFillerDocs.txt, copied below:
      2> Jan. 03, 2023 6:30:02 PM org.apache.lucene.store.MemorySegmentIndexInputProvider <init>
      2> INFORMATION: Using MemorySegmentIndexInput with Java 19
       >     java.lang.AssertionError: expected:<0.5828427076339722> but was:<0.5828428268432617>
       >         at __randomizedtesting.SeedInfo.seed([E853C78F22129ACC:708B13FE3EC459FF]:0)
       >         at org.junit.Assert.fail(Assert.java:89)
       >         at org.junit.Assert.failNotEquals(Assert.java:835)
       >         at org.junit.Assert.assertEquals(Assert.java:555)
       >         at org.junit.Assert.assertEquals(Assert.java:685)
       >         at org.apache.lucene.tests.search.CheckHits.verifyExplanation(CheckHits.java:503)
    

    Version and environment details

    Main and 9.x branches

    Reproduce line

    gradlew test --tests TestSimpleExplanationsWithFillerDocs.testMPQ3 -Dtests.seed=E853C78F22129ACC -Dtests.multiplier=3 -Dtests.locale=wo -Dtests.timezone=EST5EDT -Dtests.asserts=true -Dtests.file.encoding=UTF-8

    type:bug legacy-jira-label:test-failure 
    opened by uschindler 0
  • ban finalizers in the build somehow (worst-case: use error-prone)

    Description

    I was looking at new error-prone checks in #12056 and there's a new check to ban finalizers.

    Because the method is in the built-in JDK deprecated list (e.g. https://github.com/policeman-tools/forbidden-apis/blob/main/src/main/resources/de/thetaphi/forbiddenapis/signatures/jdk-deprecated-11.txt#L195), I would expect the check to fail if I override finalize, but it doesn't, because in most cases a finalizer will not actually CALL Object.finalize.

    Let's ban finalizers completely, one way or another; we don't want them to sneak in. We can always enable the error-prone check for it as one solution.

    type:bug 
    opened by rmuir 6
  • Better skipping for multi-term queries with a FILTER rewrite.

    Currently multi-term queries with a filter rewrite internally rewrite to a disjunction if 16 terms or fewer match the query. Otherwise, the postings lists of matching terms are collected into a DocIdSetBuilder. This change replaces the latter with a mixed approach, where a disjunction is created between the 16 terms that have the highest document frequency and an iterator produced from the DocIdSetBuilder that collects all other terms. On fields that have a Zipfian distribution, it's quite likely that no high-frequency terms make it to the DocIdSetBuilder. This provides two main benefits:

    • Queries are less likely to allocate a FixedBitSet of size maxDoc.
    • Queries are better at skipping or early terminating.

    On the other hand, queries that need to consume most or all matching documents may get a slowdown.

    The slowdown is unfortunate, but my gut feeling is that this change still has more pros than cons.
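
    A rough sketch of the DocIdSetBuilder side of that mixed approach (not the actual patch; just the general pattern of collecting the postings of the remaining low-frequency terms into a builder, with illustrative names):

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.index.LeafReader;
    import org.apache.lucene.index.PostingsEnum;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.DocIdSetBuilder;

    static DocIdSetIterator collectLowFreqTerms(LeafReader reader, String field, List<BytesRef> lowFreqTerms)
        throws IOException {
      Terms terms = reader.terms(field);
      if (terms == null) {
        return DocIdSetIterator.empty();
      }
      DocIdSetBuilder builder = new DocIdSetBuilder(reader.maxDoc());
      TermsEnum termsEnum = terms.iterator();
      PostingsEnum postings = null;
      for (BytesRef term : lowFreqTerms) {
        if (termsEnum.seekExact(term)) {
          postings = termsEnum.postings(postings, PostingsEnum.NONE);
          builder.add(postings);           // union of all matching docs
        }
      }
      // This iterator would then be searched alongside the disjunction of the
      // 16 highest-frequency terms.
      return builder.build().iterator();
    }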

    opened by jpountz 6