
Overview

Apache Solr

Apache Solr is an enterprise search platform written in Java and using Apache Lucene. Major features include full-text search, index replication and sharding, and result faceting and highlighting.


Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit https://solr.apache.org/guide/

Building with Gradle

First, set up your development environment (OpenJDK 11 or greater).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README. Solr runs with Java 11 and later.
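Since Solr requires Java 11 or later, an early programmatic check of the JVM version can save a confusing build failure. This is a standalone sketch, not part of the Solr build; the parseFeature helper is our own illustration of how feature versions map from version strings.

```java
// Minimal sketch: verify the running JVM meets Solr's Java 11 requirement.
public class JavaVersionCheck {

    // Extracts the feature (major) version from a version string such as
    // "11.0.2" or "17". Legacy "1.8.0_292"-style strings map to 8.
    public static int parseFeature(String version) {
        String[] parts = version.split("\\.");
        int first = Integer.parseInt(parts[0].split("_")[0]);
        if (first == 1 && parts.length > 1) {        // legacy "1.x" scheme
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();   // available since Java 10
        System.out.println("Running on Java " + feature);
        if (feature < 11) {
            throw new IllegalStateException("Solr requires Java 11 or later");
        }
    }
}
```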

As of 9.0, Solr uses Gradle as the build system. Ant build support has been removed.

To build Solr, run (./ can be omitted on Windows):

./gradlew assemble

NOTE: Do not use a gradle command already installed on your machine (unless you know what you are doing). The "gradle wrapper" (gradlew) does the job: it downloads the correct Gradle version and sets up the necessary configuration.

The first time you run Gradle, it will create a file "gradle.properties" that contains machine-specific settings. Normally you can use this file as-is, but it can be modified if necessary.

The command above packages a full distribution of Solr server; the package can be located at:

solr/packaging/build/solr-*

Note that the Gradle build does not create or copy binaries throughout the source repository, so you need to switch to the packaging output folder above; the rest of the instructions below remain identical. The packaging directory is rewritten on each build.

For development, especially when you have created test indexes etc., use the ./gradlew dev task, which copies binaries to ./solr/packaging/build/dev but overwrites only the binaries, preserving your test setup.

If you want to build the documentation, type ./gradlew -p solr documentation.

Running Solr

After building Solr, the server can be started using the bin/solr control scripts. Solr can be run in either standalone or distributed (SolrCloud) mode.

To run Solr in standalone mode, run the following command from the solr/ directory:

bin/solr start

To run Solr in SolrCloud mode, run the following command from the solr/ directory:

bin/solr start -c

The bin/solr control script allows extensive customization of how Solr starts. Common options are described in some detail in solr/README.txt. For an exhaustive treatment of the options, run bin/solr start -h from the solr/ directory.

Gradle build and IDE support

  • IntelliJ - IntelliJ IDEA can import the project out of the box. Code formatting conventions should be manually adjusted.
  • Eclipse - Not tested.
  • Netbeans - Not tested.

Gradle build and tests

./gradlew assemble will build a runnable Solr as noted above.

./gradlew check will assemble Solr and run all validation tasks and unit tests.

./gradlew help will print a list of help commands for high-level tasks. One of these is helpAnt, which shows the Gradle tasks corresponding to the Ant targets you may be familiar with.

Contributing

Please review the Contributing to Solr Guide for information on contributing.

Discussion and Support

Comments
  • SOLR-15089: Allow backup/restoration to Amazon's S3 blobstore

    SOLR-15089: Allow backup/restoration to Amazon's S3 blobstore

    Description

    Solr provides a BackupRepository interface with which users can create backups to arbitrary backends. There is now a GCS implementation (see https://github.com/apache/solr/pull/39), but no S3 impl yet.

    Solution

    This PR adds a BackupRepository implementation for communicating with S3.

    Tests

    We've added new unit tests at the BackupRepository level as well as tests for the S3 interactions (using the S3Mock framework).

    Checklist

    Please review the following and check all that apply:

    • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [x] I have created a Jira issue and added the issue ID to my pull request title.
    • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [x] I have developed this patch against the main branch.
    • [x] I have run ./gradlew check.
    • [x] I have added tests for my changes.
    • [x] I have added documentation for the Reference Guide

    We have not yet done the work of adding license files for all the newly added libraries/dependencies. These will be added in a future commit.

    opened by athrog 42
  • Dynamically discover lucene version for use in build

    Dynamically discover lucene version for use in build

    The test :solr:validateConfigFileSanity checks that <luceneMatchVersion> is correct in various solrconfig files. Now that the Lucene version differs from the Solr version (e.g. on the main branch right now), the build needs to know which Lucene version we have.

    I tried using Lucene's Version.LATEST.toString(), but the build found an 8_10_0 version in my environment. So now I just pull it from versions.props, and that works.

    However, it may not be compatible with the local Lucene version override feature in lucene-dev-repo-composite.gradle.
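    The "pull it from versions.props" approach above can be sketched in plain Java. This is an illustrative stand-in, not the actual build code; it assumes versions.props lines of the form "group:artifact=version" (the format used by gradle-consistent-versions), and the parsing logic here is our own.

```java
import java.util.Optional;

// Sketch of pulling the Lucene version out of versions.props-style content.
public class LuceneVersionLookup {

    public static Optional<String> findVersion(String propsContent, String groupPrefix) {
        for (String line : propsContent.split("\\R")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;  // skip comments
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            String coordinate = line.substring(0, eq).trim();
            if (coordinate.startsWith(groupPrefix)) {
                return Optional.of(line.substring(eq + 1).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String props = "# dependency versions\norg.apache.lucene:*=9.1.0\norg.slf4j:*=2.0.3\n";
        System.out.println(findVersion(props, "org.apache.lucene").orElse("unknown"));
    }
}
```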

    opened by janhoy 39
  • SOLR-15955: Update Jetty dependency to 10

    SOLR-15955: Update Jetty dependency to 10

    https://issues.apache.org/jira/browse/SOLR-15955

    Summary:

    • Upgrades to Jetty 10.0.12
    • dropwizard metrics 4.2.12 for dropwizard-metrics9 -> dropwizard-metrics10
    • log4j 2.19.0 and slf4j 2.0.3
    • for s3mock specifically upgrade spring-boot 2.5.14 and spring 5.3.23 to handle Jetty 10
    opened by markrmiller 34
  • SOLR-15824 Improved Query Screen raw query parameters section

    SOLR-15824 Improved Query Screen raw query parameters section

    https://issues.apache.org/jira/browse/SOLR-15824

    Description

    While sending a query in Solr, it is very convenient that the q field is resizable (because it is in a textarea tag), while the fq and raw query parameters fields are not, which can cause difficulties with long query parameters. To solve this, I made improvements to query.html and query.css.

    Solution

    I put fq and raw query parameters inside textarea tags and, with CSS, made them vertically resizable. Because of the line between the query and result panels, resizing both vertically and horizontally (or only horizontally) would cause the appearance of the panel to deteriorate.

    Checklist

    Please review the following and check all that apply:

    • [X] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [X] I have created a Jira issue and added the issue ID to my pull request title.
    • [X] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [X] I have developed this patch against the main branch.
    • [X] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide

    EDIT

    I changed only the raw query parameters section because of the comments on the PR, and added +/- buttons for raw query parameters.

    opened by betulince 27
  • SOLR-16271: remove wildcard imports

    SOLR-16271: remove wildcard imports

    https://issues.apache.org/jira/browse/SOLR-16271

    Description

    Remove wildcard imports from the build. A later issue will add a Spotless-based check.

    Solution

    Made the changes.

    Tests

    Ran the tests manually.

    Checklist

    Please review the following and check all that apply:

    • [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [ ] I have created a Jira issue and added the issue ID to my pull request title.
    • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [ ] I have developed this patch against the main branch.
    • [ ] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide
    opened by epugh 24
  • SOLR-15982: Add end time value to backup response

    SOLR-15982: Add end time value to backup response

    https://issues.apache.org/jira/browse/SOLR-15982

    Description

    Adds a new field, endTime, to the collection backup response (the response is actually an aggregated result of responses from multiple nodes).

    Solution

    After the backup finishes (regardless of whether it is sync or async), Solr writes a backup.properties file to the repository; this file actually contains the content of org.apache.solr.core.backup.BackupProperties. Before the backup properties are written, the endTime value is filled in at org.apache.solr.core.backup.BackupProperties.store(Writer):

        public void store(Writer propsWriter) throws IOException {
            properties.put("indexSizeMB", String.valueOf(indexSizeMB));
            properties.put("indexFileCount", String.valueOf(indexFileCount));
            properties.put(BackupManager.END_TIME_PROP, Instant.now().toString());
            properties.store(propsWriter, "Backup properties file");
        }
    

    Immediately after the backup properties file is written, an additional field endTime is appended to the response within the org.apache.solr.cloud.api.collections.BackupCmd.call(ClusterState, ZkNodeProps, NamedList<Object>) call:

      public void call(ClusterState state, ZkNodeProps message, NamedList<Object> results) throws Exception {
        ...
        try (BackupRepository repository = cc.newBackupRepository(repo)) {
          ...
          backupMgr.writeBackupProperties(backupProperties);
    
          if(backupProperties != null) {
            NamedList<Object> response = (NamedList<Object>) results.get("response");
            response.add("endTime", backupProperties.getEndTime());
          }
          ...
        }
      }
    

    Tests

    None

    Checklist

    Please review the following and check all that apply:

    • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [x] I have created a Jira issue and added the issue ID to my pull request title.
    • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [x] I have developed this patch against the main branch.
    • [x] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [x] I have added documentation for the Reference Guide
    opened by ijioio 23
  • SOLR-15342: Separate out a SolrJ-Zookeeper module

    SOLR-15342: Separate out a SolrJ-Zookeeper module

    https://issues.apache.org/jira/browse/SOLR-15342

    I opened this PR as a draft to share the evolution of the work. I closed the old PR since it had been open for a while; there have been quite a few changes related to ZooKeeper since then, and some of the work was contributed in a separate PR. I resume that work here.

    opened by heythm 21
  • SOLR-15842: Fix async backup response

    SOLR-15842: Fix async backup response

    https://issues.apache.org/jira/browse/SOLR-15842

    Description

    Adds a new field to org.apache.solr.handler.admin.CoreAdminHandler.TaskObject to hold operation results.

    Solution

    I applied a simple change that adds an additional field, operationRspInfo, to org.apache.solr.handler.admin.CoreAdminHandler.TaskObject. It is filled with the operation results when the operation finishes successfully (exactly the same results used for a sync request). This way the results are stored within the TaskObject.

    Later, when the request status is fetched, besides the standard Response value, an additional response value is added to the request's response; it holds the operation results preserved in the TaskObject.
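    The mechanism described above can be modeled in a few lines. This is a simplified, hypothetical stand-in, not Solr's actual CoreAdminHandler.TaskObject; the class and key names below are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the idea in this PR: an async task records the
// operation's response so a later status request can return it.
public class AsyncTaskSketch {

    static class TaskObject {
        final String requestId;
        String status = "running";
        Map<String, Object> operationRspInfo;   // filled on success

        TaskObject(String requestId) { this.requestId = requestId; }
    }

    // Builds the status response; the stored operation results are exposed
    // under an extra "response" key, mirroring the PR's description.
    static Map<String, Object> requestStatus(TaskObject task) {
        Map<String, Object> rsp = new LinkedHashMap<>();
        rsp.put("Response", "TaskId: " + task.requestId + " status: " + task.status);
        if (task.operationRspInfo != null) {
            rsp.put("response", task.operationRspInfo);
        }
        return rsp;
    }

    public static void main(String[] args) {
        TaskObject task = new TaskObject("backup-42");
        task.status = "completed";
        task.operationRspInfo = Map.of("endTime", "2022-01-01T00:00:00Z");
        System.out.println(requestStatus(task));
    }
}
```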

    Tests

    None

    Checklist

    Please review the following and check all that apply:

    • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [x] I have created a Jira issue and added the issue ID to my pull request title.
    • [x] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [x] I have developed this patch against the main branch.
    • [ ] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide
    opened by ijioio 20
  • SOLR-10452: setQueryParams should be deprecated in favor of SolrClientBuilder methods

    SOLR-10452: setQueryParams should be deprecated in favor of SolrClientBuilder methods

    https://issues.apache.org/jira/browse/SOLR-10452

    Description

    setQueryParams should be deprecated in favor of SolrClientBuilder methods.

    Solution

    I moved setQueryParams over for Http2SolrClient; it had already been done for HttpSolrClient. I also noticed an addQueryParams method, which I marked deprecated and which we shouldn't use, as it goes against the idea of a SolrClient being immutable.

    One area I tried to fix and gave up on was DelegationTokenHttpSolrClient; I couldn't quite figure out what to do there and would love a suggestion or a fix ;-)
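    The immutability argument behind this deprecation can be illustrated generically: with a builder, default query parameters are fixed at construction, so the client never mutates afterwards. The names below are hypothetical, not the real SolrJ builder API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic sketch of the builder pattern for an immutable client:
// configuration is collected by the mutable Builder, then frozen at build().
public class ImmutableClientSketch {

    private final String baseUrl;
    private final Map<String, String> defaultParams;

    private ImmutableClientSketch(Builder b) {
        this.baseUrl = b.baseUrl;
        this.defaultParams = Map.copyOf(b.defaultParams);  // immutable snapshot
    }

    public String getBaseUrl() { return baseUrl; }
    public Map<String, String> getDefaultParams() { return defaultParams; }

    public static class Builder {
        private final String baseUrl;
        private final Map<String, String> defaultParams = new LinkedHashMap<>();

        public Builder(String baseUrl) { this.baseUrl = baseUrl; }

        public Builder withQueryParam(String name, String value) {
            defaultParams.put(name, value);
            return this;                                   // fluent chaining
        }

        public ImmutableClientSketch build() { return new ImmutableClientSketch(this); }
    }

    public static void main(String[] args) {
        ImmutableClientSketch client = new Builder("http://localhost:8983/solr")
                .withQueryParam("wt", "json")
                .build();
        System.out.println(client.getDefaultParams());
    }
}
```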

    Tests

    Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

    Checklist

    Please review the following and check all that apply:

    • [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [ ] I have created a Jira issue and added the issue ID to my pull request title.
    • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [ ] I have developed this patch against the main branch.
    • [ ] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide
    opened by epugh 19
  • SOLR-16574: Demonstrate Dense Vectors and KNN as part of the Films example

    SOLR-16574: Demonstrate Dense Vectors and KNN as part of the Films example

    https://issues.apache.org/jira/browse/SOLR-16574

    Description

    Enrich the films example to demonstrate how to use the Dense Vectors feature.

    Solution

    Added the field film_vector to the films dataset. This is an embedding vector created to represent each movie with 10 dimensions. The vector combines the first 5 dimensions of a pre-trained BERT sentence model applied to the movie's name plus the names of its genres, followed by a 5-dimensional item2vec model of genre co-occurrence across movies, for 10 dimensions in total. Even though similar movies are expected to be close to each other, this is just a "toy example" model to serve as the source for creating the film vectors.

    The README of the example was also updated to include the specification of the Dense Vector field in the schema, and a new section was created with examples showing how to make KNN queries with the vectors.
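    A KNN query for this example can be assembled as a string using Solr 9's {!knn} query parser syntax. The helper below is our own illustration (the film_vector field name comes from this PR; the toy vector values are made up).

```java
import java.util.StringJoiner;

// Sketch of building a Solr 9 KNN query string for a dense vector field,
// e.g. {!knn f=film_vector topK=10}[0.1, 0.2, ...]
public class KnnQueryBuilder {

    public static String knnQuery(String field, int topK, float[] vector) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (float v : vector) {
            joiner.add(Float.toString(v));
        }
        return String.format("{!knn f=%s topK=%d}%s", field, topK, joiner);
    }

    public static void main(String[] args) {
        float[] embedding = {-0.1f, 0.2f, 0.3f};   // toy 3-dimensional vector
        System.out.println(knnQuery("film_vector", 10, embedding));
    }
}
```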

    Tests

    • Added the new field film_vector to the 3 dataset formats (JSON, XML, CSV), making sure to preserve exactly the same data from the original datasets, so that the "diff" is only the addition of the new field.
    • Checked the creation of the collection for the 3 dataset formats. Regardless of the format, all 1100 films were indexed, and the film_vector field was correctly parsed and indexed as well.
    • Checked the KNN example queries for all the 3 dataset formats.

    Checklist

    Please review the following and check all that apply:

    • [X] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [X] I have created a Jira issue and added the issue ID to my pull request title.
    • [X] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [X] I have developed this patch against the main branch.
    • [ ] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide
    opened by gabrielmagno 19
  • SOLR-16368: Use Builder Pattern with Solr Clients

    SOLR-16368: Use Builder Pattern with Solr Clients

    https://issues.apache.org/jira/browse/SOLR-16368

    Description

    Part of working on reducing the use of the legacy HttpSolrClient in tests everywhere is seeing whether mutating the client can be reduced by embracing the Builder pattern.

    Solution

    Use the Builder where possible.

    I couldn't figure out how to untangle the logic in TestRandomFlRTGCloud and would love some eyes on that one!

    Tests

    Ran the tests.

    Checklist

    Please review the following and check all that apply:

    • [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [ ] I have created a Jira issue and added the issue ID to my pull request title.
    • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [ ] I have developed this patch against the main branch.
    • [ ] I have run ./gradlew check.
    • [ ] I have added tests for my changes.
    • [ ] I have added documentation for the Reference Guide
    opened by epugh 19
  • SOLR-16613: CryptoKeys should handle RSA padding for OpenJ9

    SOLR-16613: CryptoKeys should handle RSA padding for OpenJ9

    https://issues.apache.org/jira/browse/SOLR-16613

    Tested with:

    # With Temurin 17
    ./gradlew test --tests TestRSAKeyPair
    ./gradlew test --tests TestPKIAuthenticationPlugin
    
    # With OpenJ9 17
    RUNTIME_JAVA_HOME=/Library/Java/JavaVirtualMachines/ibm-semeru-open-17.jdk/Contents/Home ./gradlew test --tests TestRSAKeyPair
    RUNTIME_JAVA_HOME=/Library/Java/JavaVirtualMachines/ibm-semeru-open-17.jdk/Contents/Home ./gradlew test --tests TestPKIAuthenticationPlugin
    
    opened by risdenk 1
  • SOLR-16532 Further improvements to opentelemetry module

    SOLR-16532 Further improvements to opentelemetry module

    https://issues.apache.org/jira/browse/SOLR-16532

    I created a new PR where we can gather all follow-up cleanups that may arrive after the initial merge in #1168.

    opened by janhoy 1
  • SOLR-16610: Support Copy n Paste of Command Line commands in Ref Guide

    SOLR-16610: Support Copy n Paste of Command Line commands in Ref Guide

    https://issues.apache.org/jira/browse/SOLR-16610

    Description

    This is an example of the types of changes we would need to make to be more in line with Antora's handling of command-line examples, so that they can be copied and pasted.

    Solution

    Conversion

    Tests

    manual

    opened by epugh 5
  • SOLR-16608: Ability to compress the collection state

    SOLR-16608: Ability to compress the collection state

    https://issues.apache.org/jira/browse/SOLR-16608

    Description

    This PR provides the ability to configure a minimum size of state.json above which it will be compressed when written to ZooKeeper. Solr can handle compressing/decompressing in ZLib format in all areas where it reads state.json from ZooKeeper.

    Solution

    This uses ZLib compression to optionally compress state.json. The core of the logic is contained in a new class, CompressionUtil, which handles both compression and decompression, along with a very efficient check of whether bytes are compressed. It can read and write both compressed and uncompressed data, so changing the compression configuration is backward compatible and will not break reading state.json.
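    The round-trip this PR relies on can be sketched with java.util.zip alone: compress the state.json bytes, cheaply detect whether stored bytes are compressed, and decompress on read. This is our own illustration; the real CompressionUtil may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of ZLib compression/decompression for state.json-style bytes.
public class ZlibStateSketch {

    public static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // ZLib streams start with 0x78; checking the first byte is the kind of
    // cheap test the PR describes for telling compressed data from plain JSON
    // (which starts with '{', 0x7B).
    public static boolean looksCompressed(byte[] data) {
        return data.length > 0 && (data[0] & 0xFF) == 0x78;
    }

    public static byte[] decompress(byte[] data) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                out.write(buf, 0, inflater.inflate(buf));
            }
            inflater.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new RuntimeException("not a valid ZLib stream", e);
        }
    }

    public static void main(String[] args) {
        byte[] state = "{\"collection\":{\"shards\":{}}}".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(state);
        System.out.println("compressed=" + looksCompressed(packed)
                + " roundtrip=" + new String(decompress(packed), StandardCharsets.UTF_8));
    }
}
```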

    Tests

    Tests were added at various layers to ensure compression/decompression works, including unit tests for CompressionUtil and tests for ZkStateReader and ZkStateWriter.

    Checklist

    Please review the following and check all that apply:

    • [x] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
    • [x] I have created a Jira issue and added the issue ID to my pull request title.
    • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
    • [x] I have developed this patch against the main branch.
    • [x] I have run ./gradlew check.
    • [x] I have added tests for my changes.
    • [x] I have added documentation for the Reference Guide
    opened by justinrsweeney 4
  • SOLR-788: transfer mlt query for component right

    SOLR-788: transfer mlt query for component right

    https://issues.apache.org/jira/browse/SOLR-788 I think this is a somewhat reliable solution to the problem with the distributed MLT component. As a reminder, it currently relies naively on parsing booleanQuery.toString(), which is unacceptable.

    opened by mkhludnev 0
Owner
The Apache Software Foundation