Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text

Overview

Welcome to Apache OpenNLP!

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

This toolkit is written completely in Java and provides support for common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, coreference resolution, language detection and more!

These tasks are usually required to build more advanced text processing services.

The goal of the OpenNLP project is to be a mature toolkit for the above-mentioned tasks.

An additional goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources that those models are derived from.

Presently, OpenNLP includes common classifiers such as Maximum Entropy, Perceptron and Naive Bayes.

OpenNLP can be used either programmatically through its Java API or from a terminal through its CLI. The OpenNLP API can easily be plugged into distributed streaming data pipelines such as Apache Flink, Apache NiFi, and Apache Spark.

Useful Links

For additional information, visit the OpenNLP Home Page

You can use OpenNLP with any language; demo models are provided here.

The models are fully compatible with the latest release and can be used for testing or getting started.

Please train your own models for all other use cases.

Documentation, including JavaDocs, code usage, and command-line interface examples, is available here.

You can also follow our mailing lists for news and updates.

Overview

Currently the library has different packages:

  • opennlp-tools : The core toolkit.
  • opennlp-uima : A set of Apache UIMA annotators.
  • opennlp-brat-annotator : A set of annotators for BRAT.
  • opennlp-morfologik-addon : An addon for Morfologik.
  • opennlp-sandbox : Other projects in progress are found in the sandbox.

Getting Started

You can import the core toolkit directly from Maven, SBT or Gradle:

Maven

    <dependency>
        <groupId>org.apache.opennlp</groupId>
        <artifactId>opennlp-tools</artifactId>
        <version>${opennlp.version}</version>
    </dependency>

SBT

libraryDependencies += "org.apache.opennlp" % "opennlp-tools" % "${opennlp.version}"

Gradle

implementation group: "org.apache.opennlp", name: "opennlp-tools", version: "${opennlp.version}"

For more details, please check our documentation.

Building OpenNLP

At least JDK 8 and Maven 3.3.9 are required to build the library.

After cloning the repository, go into the destination directory and run:

mvn install

Contributing

The Apache OpenNLP project is developed by volunteers and is always looking for new contributors to work on all parts of the project. Every contribution is welcome and needed to make it better. A contribution can be anything from a small documentation typo fix to a new component.

If you would like to get involved please follow the instructions here

Comments
  • OPENNLP-1026: Replace references and usages of opennlp.tools.util.Heap with java.util.SortedSet

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [X] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [X] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [X] Is your initial contribution a single, squashed commit?

    For code changes:

    • [X] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by smarthi 13
  • OPENNLP-1050: [WIP] Add formats support for Irish Sentence Bank

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [X] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [X] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [X] Is your initial contribution a single, squashed commit?

    For code changes:

    • [X] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [X] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by jimregan 12
  • OPENNLP-880: Refactor data indexer

    Added a patch that changes multiple items (sorry that I did not fix one item at a time; I am a scientist, not a software engineer):

    1. Added a method to EventTrainer, train(DataIndexer indexer), and then added the same method to AbstractEventTrainer. No other code needed to be changed.
    2. Created a new class, PluggableParmeters. Previously only AbstractTrainers had access to getXXXParam(Value,Default), so I pulled this functionality out into a separate class. Now both AbstractTrainers and AbstractDataIndexers can hold parameters.
    3. Refactored DataIndexer. This touched a lot of code. Added an init(Map,Map) method and an index(ObjectStream) method, and changed the 1-pass and 2-pass DataIndexers. Everywhere 1-pass/2-pass indexers were created, I changed the constructor call and added calls to the init and index methods.
    4. Changed GIS.doTrain(indexer) to use the parameters passed in the init method.
    5. QNTrainer: created a working init method and changed the isValid method so that it is no longer the init method.
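    The two-step lifecycle described for the indexers (configure through init, then consume the data through index) can be sketched as follows. This is only an illustration under stated assumptions: CountingIndexerSketch, its "Cutoff" parameter handling, and the use of a plain Iterator instead of an ObjectStream are hypothetical simplifications, not the OpenNLP classes changed by this patch.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a data indexer; not the OpenNLP interface.
class CountingIndexerSketch {
    private int cutoff = 1; // default used when no "Cutoff" parameter is supplied
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Step 1: configure the indexer from a parameter map instead of constructor arguments.
    // The second map collects the settings that were actually applied.
    void init(Map<String, String> params, Map<String, String> report) {
        String value = params.get("Cutoff");
        if (value != null) {
            cutoff = Integer.parseInt(value);
        }
        report.put("Cutoff", Integer.toString(cutoff));
    }

    // Step 2: consume the events, count them, and drop those seen fewer than cutoff times.
    void index(Iterator<String> events) {
        while (events.hasNext()) {
            counts.merge(events.next(), 1, Integer::sum);
        }
        counts.values().removeIf(count -> count < cutoff);
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```

    Separating configuration from construction in this way is what lets a trainer accept an already prepared indexer, as in item 1 above.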

    #4 Make the GIS class work within the new training API

    opened by danielruss 12
  • OPENNLP-1350 Improve normaliser MAIL_REGEX

    Addresses OPENNLP-1350

    The MAIL_REGEX in UrlCharSequenceNormalizer causes replaceAll(...) to become extremely costly when given an input string with a long sequence of characters from the first character set in the regex, but which ultimately fails to match the whole regex. This pull request fixes that, and also another detail:

    Allow + in the local part, and disallow _ in the domain part. There are other characters that are allowed in the local part as well, but these are less common (https://en.wikipedia.org/wiki/Email_address).

    The speedup for unfortunate input is achieved by adding a negative lookbehind with a single character from the first character set. Currently, the replaceAll(" ") on a string of ~100K characters from the set [-_.0-9A-Za-z] runs in ~1 minute on modern hardware; adding a negative lookbehind with one of the characters from that set reduces this to a few milliseconds, and is functionally equivalent. (Consider the current pattern and a match from position i to k. If the character at i-1 is in the character set, there would also be a match from i-1 to k, which would already be replaced.)
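    The lookbehind idea can be sketched with a small, self-contained example. The pattern below is a hypothetical simplification written for illustration, not the MAIL_REGEX shipped with OpenNLP; the class name MailRegexSketch and the exact character sets are assumptions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative sketch only; the pattern is not the one used by OpenNLP.
class MailRegexSketch {
    // Characters accepted in the local part of this illustrative pattern.
    private static final String LOCAL = "[-+_.0-9A-Za-z]";

    // The negative lookbehind rejects any start position that is preceded by a
    // local-part character, so a long run of such characters is scanned from its
    // first character only instead of being retried from every position.
    static final Pattern GUARDED = Pattern.compile(
        "(?<!" + LOCAL + ")" + LOCAL + "+@[-0-9A-Za-z]+[-.0-9A-Za-z]+");

    // Replaces every match of the guarded pattern with a single space.
    static String normalize(String text) {
        return GUARDED.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("mail a.b+c@example.com now"));

        // Unfortunate input: a long run from the local-part set without any '@'.
        char[] run = new char[100_000];
        Arrays.fill(run, 'a');
        String input = new String(run);
        System.out.println(normalize(input).equals(input));
    }
}
```

    Removing the lookbehind from GUARDED makes the matcher retry the greedy local part from every position of the run, which is the quadratic behaviour described above.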

    opened by jonmv 10
  • OPENNLP-1091: Findbugs issues and IDE warnings

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [ ] Is your initial contribution a single, squashed commit?

      • Not yet, intentionally, in case we decide to trim some of the changes

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by kinow 10
  • OPENNLP-1398 - Fix Temporary File Information Disclosure Vulnerability

    Security Vulnerability Fix

    This pull request fixes a Temporary File Information Disclosure Vulnerability, which existed in this project.

    Preamble

    The system temporary directory is shared between all users on most Unix-like systems (not macOS or Windows). Thus, code interacting with the system temporary directory must be careful about file interactions in this directory, and must ensure that the correct POSIX file permissions are set.

    This PR was generated because a call to File.createTempFile(..) was detected in this repository in a way that makes this project vulnerable to local information disclosure. With the default umask configuration, File.createTempFile(..) creates a file with the permissions -rw-r--r--. This means that any other user on the system can read the contents of this file.

    Impact

    Information in this file is visible to other local users, allowing a malicious actor co-resident on the same machine to view potentially sensitive files.

    Other Examples

    The Fix

    The fix has been to convert the logic above to use the following API that was introduced in Java 1.7.

    File tmpDir = Files.createTempFile("temp dir", null).toFile();

    The API creates the file securely, i.e. with a random, non-conflicting name and with file permissions that only allow the currently executing user to read or write the contents of this file. By default, Files.createTempFile(..) will create a file with the permissions -rw-------, which only allows the user that created the file to view/write the file contents.
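    A minimal sketch of the secure variant, assuming nothing beyond the JDK: it creates a temporary file through java.nio.file.Files and reports the resulting POSIX permissions. The class name SecureTempFileSketch, the helper names, and the "sketch-" prefix are hypothetical and chosen for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Illustrative sketch; not code from the OpenNLP repository.
class SecureTempFileSketch {

    // Creates a temporary file through the NIO API with its default attributes.
    static Path createPrivateTempFile() {
        try {
            return Files.createTempFile("sketch-", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the permissions in "rwxrwxrwx" notation, or "unsupported" when the
    // default file system has no POSIX attribute view (for example on Windows).
    static String describePermissions(Path file) {
        if (!FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
            return "unsupported";
        }
        try {
            return PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = createPrivateTempFile();
        try {
            System.out.println(tmp + " " + describePermissions(tmp));
        } finally {
            Files.deleteIfExists(tmp); // clean up the demo file
        }
    }
}
```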

    Vulnerability Disclosure

    Vulnerability disclosure is a super important part of the vulnerability handling process and should not be skipped! This may be completely new to you, and that's okay, I'm here to assist!

    First question, do we need to perform vulnerability disclosure? It depends!

    1. Is the vulnerable code only in tests or example code? No disclosure required!
    2. Is the vulnerable code in code shipped to your end users? Vulnerability disclosure is probably required!

    Vulnerability Disclosure How-To

    You have a few options to perform vulnerability disclosure. However, I'd like to suggest the following two options:

    1. Request a CVE number from GitHub by creating a repository-level GitHub Security Advisory. This has the advantage that, if you provide sufficient information, GitHub will automatically generate Dependabot alerts for your downstream consumers, resolving this vulnerability more quickly.
    2. Reach out to the team at Snyk to assist with CVE issuance. They can be reached at Snyk's Disclosure Email.

    Detecting this and Future Vulnerabilities

    This vulnerability was automatically detected by GitHub's CodeQL using this CodeQL Query.

    You can automatically detect future vulnerabilities like this by enabling the free (for open-source) GitHub Action.

    I'm not an employee of GitHub, I'm simply an open-source security researcher.

    Source

    This contribution was automatically generated with an OpenRewrite refactoring recipe, which was lovingly hand crafted to bring this security fix to your repository.

    The source code that generated this PR can be found here: SecureTempFileCreation

    Opting-Out

    If you'd like to opt-out of future automated security vulnerability fixes like this, please consider adding a file called .github/GH-ROBOTS.txt to your repository with the line:

    User-agent: JLLeitschuh/security-research
    Disallow: *
    

    This bot will respect the ROBOTS.txt format for future contributions.

    Alternatively, if this project is no longer actively maintained, consider archiving the repository.

    CLA Requirements

    This section is only relevant if your project requires contributors to sign a Contributor License Agreement (CLA) for external contributions.

    It is unlikely that I'll be able to directly sign CLAs. However, all contributed commits are already automatically signed-off.

    The meaning of a signoff depends on the project, but it typically certifies that the committer has the rights to submit this work under the same license and agrees to a Developer Certificate of Origin (see https://developercertificate.org/ for more information).

    - Git Commit Signoff documentation

    If signing your organization's CLA is a strict-requirement for merging this contribution, please feel free to close this PR.

    Sponsorship & Support

    This contribution is sponsored by HUMAN Security Inc. and the new Dan Kaminsky Fellowship, a fellowship created to celebrate Dan's memory and legacy by funding open-source work that makes the world a better (and more secure) place.

    This PR was generated by Moderne, a free-for-open source SaaS offering that uses format-preserving AST transformations to fix bugs, standardize code style, apply best practices, migrate library versions, and fix common security vulnerabilities at scale.

    Tracking

    All PRs generated as part of this fix are tracked here: https://github.com/JLLeitschuh/security-research/issues/18

    opened by JLLeitschuh 9
  • OPENNLP-994: Remove deprecated methods from the Document Categorizer

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [X] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [X] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [X] Is your initial contribution a single, squashed commit?

    For code changes:

    • [X] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [X] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by smarthi 9
  • OPENNLP-198: Added link for demo website (WIP) for Sentence Detector

    I have been working on a personal tutorial site for OpenNLP, and so far I have tried to showcase a sample for running the Sentence Detector API. I hope it will be helpful for the community.

    You can have a look at the initial demo: http://my-ai-launcher.appspot.com/#/visualnlp/opennlp

    Also refer to https://issues.apache.org/jira/secure/attachment/12872265/sentdetect.png

    I am working on adding more APIs and an option for loading custom models. Later I will add support for training custom models.

    Please share your valuable feedback.

    opened by MrIndiaDev 8
  • OPENNLP-958: Add POS Name Finder feature generator

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [X] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [X] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [X] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [ ] Is your initial contribution a single, squashed commit?

    For code changes:

    • [X] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [X] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by wcolen 8
  • OPENNLP-1366: Fix Training of MaxEnt Model with large corpora fails with java.io.UTFDataFormatException

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [x] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [x] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

    opened by mawiesne 7
  • OPENNLP-1302: add missing param tags

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [*] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [*] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [*] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [*] Is your initial contribution a single, squashed commit?

    For code changes:

    • [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [*] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by jongwoojeff 7
  • OPENNLP-1431 Enhance JavaDoc in opennlp.tools.dictionary and opennlp.tools.entitylinker packages

    Change

    • adds missing JavaDoc
    • improves existing documentation for clarity
    • removes superfluous text
    • adds 'final' modifier where useful and applicable
    • adds 'Override' annotation where useful and applicable
    • adds package-info.java file to entitylinker package
    • fixes some typos

    Tasks

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically main)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [x] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

    opened by mawiesne 0
  • OPENNLP-1358 - Failing tests on Windows 11 and Java 17

    What does this PR do?

    Tests are not failing on Windows anymore. This PR adds build support for Windows (latest = Server 2022) and adjusts test cases (if needed) to be run on GH actions CI.

    • Adds CI for Windows (latest = Server 2022, similar to Windows 11) to be able to build / test with Windows
    • Unifies how line endings are handled by git via .gitattributes as we have a checkstyle rule in place which prohibits the use of CRLF (checkstyle will fail a build on Windows if git isn't configured accordingly)
    • Converts line endings from CRLF to LF (according to supplied .gitattributes, see https://docs.github.com/en/get-started/getting-started-with-git/configuring-git-to-handle-line-endings) via git add --renormalize .
    • Works around https://github.com/junit-team/junit5/issues/2811 (folders created by @TempDir cannot be deleted on Windows Server 2022)
    • Removes Coveralls as there is no Jacoco support at the moment (so the step isn't useful anymore): https://github.com/coverallsapp/github-action/issues/22

    Matrix build is now happy: https://github.com/rzo1/opennlp/actions/runs/3809019320

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically main)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [x] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

    opened by rzo1 0
  • OPENNLP-1429 Add the Internal annotation to interfaces and classes documented accordingly

    Change

    • adds Internal annotation to code which has existing hints/notes on internal use
    • improves some JavaDoc by linking to existing classes or interfaces
    • fixes inconsistent JavaDoc in SentenceModelLoader
    • adds 'final' modifier where useful and applicable
    • adds 'Override' annotation where useful and applicable
    • fixes some typos

    Tasks

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically main)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [x] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

    opened by mawiesne 0
  • OPENNLP-1428 - Enhance DownloadUtil to avoid the use of hard-coded model urls

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically main)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [x] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

    opened by rzo1 0
  • OPENNLP-1369: Fixing NPE when serializing a model which depends of an…

    …other one

    Thank you for contributing to Apache OpenNLP.

    In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

    For all changes:

    • [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [ ] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [ ] Is your initial contribution a single, squashed commit?

    For code changes:

    • [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [ ] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    opened by avanco 0
  • OPENNLP-912: Rule based sentence detector

    For all changes:

    • [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

    • [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

    • [x] Has your PR been rebased against the latest commit within the target branch (typically master)?

    • [x] Is your initial contribution a single, squashed commit?

    For code changes:

    • [x] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
    • [x] Have you written or updated unit tests to verify your changes?
    • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
    • [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
    • [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

    For documentation related changes:

    • [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

    Note:

    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.

    opened by Alanscut 3
Owner
The Apache Software Foundation
Serverless Reference Architecture for Real-time File Processing

Serverless Reference Architecture: Real-time File Processing The Real-time File Processing reference architecture is a general-purpose, event-driven,

AWS Samples 436 Oct 7, 2022
Gleam is a simple Scheme language interpreter written in Java.

Gleam Scheme Interpreter (c) 2001-2020 Guglielmo Nigri (guglielmonigri at yahoo.it, googlielmo at gmail.com) Gleam comes with ABSOLUTELY NO WARRANTY.

Guglielmo Nigri 2 Jun 6, 2022
FundurASM - An Assembly-like language interpreter

FundurASM - An Assembly-like language interpreter This interpreter was written by LordBurtz Licensed under GPLv2, all rights reserved Running it Downlo

null 2 Jan 31, 2022
Spine - a language created for the purpose of writing HTML with C styled syntax

Spine is a language created for the purpose of writing HTML with C styled syntax. Although this is a pretty useless project, it will still be very fun to make and maybe, just maybe, remove the back pain from normal HTML.

Spine 3 Mar 19, 2022
Support alternative markup for Apache Maven POM files

Overview Polyglot for Maven is a set of extensions for Maven 3.3.1+ that allows the POM model to be written in dialects other than XML. Several of the

null 828 Dec 17, 2022
Apache FOP is a print formatter driven by XSL formatting objects

The Apache Software Foundation 149 Jan 2, 2023
Cluster manager for Apache Doris

Apache Doris (incubating) Manager The repository contains Manager for Apache Doris (incubating) License Apache License, Version 2.0 Report issues or s

The Apache Software Foundation 96 Jan 4, 2023
Flink/Spark Connectors for Apache Doris

The Apache Software Foundation 30 Dec 7, 2022
BitBase is a Client-Server based Crypto trading platform which offers live pricing, dynamic charts, user portfolio, account settings... and much more!

BitBase-Crypto-Trading-Platform BitBase is a Client-Server based Crypto trading platform which offers live pricing, dynamic charts, user portfolio, ac

null 4 Feb 11, 2022
Prism (Refracted) is a change-tracking plugin for Bukkit-based servers

Prism (Refracted) is a change-tracking plugin for Bukkit-based servers. Supports rollbacks, restores, previews, wands, and so much more. Tracking so good, the NSA stole our name.

Darkhelmet 29 Dec 30, 2022
Automatically discover and tag PII data across BigQuery tables and apply column-level access controls based on confidentiality level.

Google Cloud Platform 18 Dec 29, 2022
Spigot plugin featuring a wide variety of features for a server based on modules.

CTSNC, standing for Custom Chat, Tablist, Scoreboard, NameTag & Chat, is an all-round solution based on multiple modules, each featuring a dedicated function, while CTSNC acts as the core. All configuration files are housed here for easy management and customization.

null 2 Dec 30, 2022
Java XML library. A really cool one. Obviously.

XMLBeam This is a Java XML library with an extraordinary expressive API. By using XPath for read and write operations, many operations take only one l

Sven Ewald 70 Aug 25, 2022
icecream-java is a Java port of the icecream library for Python.

Akshay Thakare 20 Apr 7, 2022
JPassport works like Java Native Access (JNA) but uses the Foreign Linker API instead of JNI. Similar to JNA, you declare a Java interface that is bound to the external C library using method names.

JPassport works like Java Native Access (JNA) but uses the Foreign Linker API instead of JNI. Similar to JNA, you declare a Java interface t

null 28 Dec 30, 2022
This library provides facilities to match an input string against a collection of regex patterns.

This library provides facilities to match an input string against a collection of regex patterns. This library acts as a wrapper around the popular Chimera library, which allows it to be used in Java.

Sahab 5 Oct 26, 2022
Java serialization library, proto compiler, code generator

A Java serialization library with built-in support for forward-backward compatibility (schema evolution) and validation. Efficient, both in speed and

protostuff 1.9k Dec 23, 2022
Discord IPC - Pure Java 16 library

Pure Java 16 library for interacting with a locally running Discord instance without the use of JNI.

Meteor Development 8 Nov 14, 2022
Scaffolding is a library for Minestom that allows you to load and place schematics.

This library is very early in development and has too many bugs to count. For your own safety, you should not use it in a production environment.

Crystal Games 18 Nov 29, 2022