CogComp's Natural Language Processing libraries and Demos:

Overview

CogCompNLP

This project collects a number of core libraries for Natural Language Processing (NLP) developed by the Cognitive Computation Group.

How to use it?

Depending on what you are after, follow one of the items below:

  • If you want to annotate raw text (i.e., you do not need to open the annotator boxes to retrain them), look into the pipeline.
  • If you want to train and test an NLP annotator (i.e., you want to open an annotator box), see the list of components below and choose the one you need. We recommend JDK 8, as no other version is officially supported or tested.
  • If you want to read a corpus, look into the corpus-readers module.
  • If you want to do feature extraction, look into the edison module.

CogComp's main NLP libraries

Each library contains a detailed readme and instructions on how to use it. In addition, the javadoc of the whole project is available here.

Module Description
nlp-pipeline Provides an end-to-end NLP processing application that runs a variety of NLP tools on input text.
core-utilities Provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc.
corpusreaders Provides classes to read documents from corpora into core-utilities data structures.
curator Supports use of CogComp NLP Curator, a tool to run NLP applications as services.
edison A library for feature extraction from core-utilities data structures.
lemmatizer An application that uses WordNet and simple rules to find the root forms of words in plain text.
tokenizer An application that identifies sentence and word boundaries in plain text.
transliteration An application that transliterates names between different scripts.
pos An application that identifies the part of speech (e.g. verb + tense, noun + number) of each word in plain text.
ner An application that identifies named entities in plain text according to two different sets of categories.
md An application that identifies entity mentions in plain text.
relation-extraction An application that identifies entity mentions, then identifies relation pairs among the detected mentions.
quantifier A tool that detects mentions of quantities in text and normalizes them to a standard form.
inference A suite of unified wrappers to a set of optimization libraries, as well as some basic approximate solvers.
depparse An application that identifies the dependency parse tree of a sentence.
verbsense This system addresses the verb sense disambiguation (VSD) problem for English.
prepsrl An application that identifies semantic relations expressed by prepositions and develops statistical learning models for predicting the relations.
commasrl This software extracts relations that commas participate in.
similarity Software that compares objects (especially Strings) and returns a score indicating how similar they are.
temporal-normalizer A temporal extractor and normalizer.
dataless-classifier Classifies text into a user-specified label hierarchy from just the textual label descriptions.
external-annotators A collection of useful external annotators.
  • Questions? Have a look at our FAQs.

Using each library programmatically

To include one of the modules in your Maven project, add the following snippet with the #modulename# and #version# entries replaced with the relevant module name and the version listed in this project's pom.xml file. Note that you also need to add the <repository> element for the CogComp Maven repository to the <repositories> element.

    <dependencies>
        ...
        <dependency>
            <groupId>edu.illinois.cs.cogcomp</groupId>
            <artifactId>#modulename#</artifactId>
            <version>#version#</version>
        </dependency>
        ...
    </dependencies>
    ...
    <repositories>
        <repository>
            <id>CogCompSoftware</id>
            <name>CogCompSoftware</name>
            <url>http://cogcomp.org/m2repo/</url>
        </repository>
    </repositories>

Citing

If you are using the framework, please cite our paper:

@inproceedings{2018_lrec_cogcompnlp,
    author = {Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev Ratinov and Guanheng Luo and Quang Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhili Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and Naveen Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
    title = {CogCompNLP: Your Swiss Army Knife for NLP},
    booktitle = {11th Language Resources and Evaluation Conference},
    year = {2018},
    url = "http://cogcomp.org/papers/2018_lrec_cogcompnlp.pdf",
}
Comments
  • break apart illinois-common-resources

    ...into several smaller pieces. Gazetteers can go in one; brown clusters (and maybe other clusters) in another. This may help reduce the problems with large dependencies in CI.

    opened by cogcomp-dev 34
  • DVector and dependencies

    The pom.xml for ner specifies a dependency on LBJava 1.2.14, but that version contains DVector, which has since moved to core-utilities. Should that be changed to version 1.2.24? It seems the learners in 1.2.14 will use the version in that jar rather than the one in core-utilities.
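
    To confirm which jar actually supplies a class at run time, one option is to ask the JVM for the class's code source. The following is only a sketch: `ClassOrigin` and `describe` are hypothetical names, and the fully qualified name of DVector is not given here, so it has to be passed in as an argument.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical helper: reports where the JVM loaded a class from. */
final class ClassOrigin {

    /** Returns the jar or directory that supplied the class, or a marker if none is recorded. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes (and some in-memory classes) carry no code-source location.
            return "(no code source recorded)";
        }
        URL location = source.getLocation();
        return location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name as the first argument,
        // for example the name of the DVector class on your classpath.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + describe(Class.forName(name)));
    }
}
```

    If two jars on the classpath contain the same class, the printed location shows which one was loaded.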

    opened by cowchipkid 29
  • Add in-house dependency parser

    Since it will be helpful for many projects, it might be a good idea to incorporate Mengxiong's dependency parser (based on @IllinoisCogComp/illinois-sl) into the main set of annotators. That way we can get rid of the Stanford NLP dependency (at least if constituency parsing is not needed).

    The latest code should be in Mengxiong's Gitlab repo, but maybe Mengxiong can point you to a more recent version if he's still working with the group.

    Initial assignment is just for registering the issue, please re-assign accordingly.

    Pipeline Dependency Parsing Stanford-nlp 
    opened by christos-c 23
  • Pipeline SRL Verb Fails Weirdly

    I'm trying to use the pipeline to process some raw text and output a TextAnnotation with verb SRL. This is kind of urgent since I'm deploying my package as an online demo and the demo paper is due on 6/1. Originally I was using curator, but the curator failed frequently (not always). Then I switched to the pipeline, but my pipeline always fails. Any suggestions to work around this would be great. @danyaljj @mssammon @HornHehhf All I need is verb SRL.

    I checked out the latest version of cogcomp-nlp. Here's the main function I use:

    public static void main(String[] args) throws Exception{
            String text = "Helicopters patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace.";
            ResourceManager userConfig = new ResourceManager("pipeline/config/pipeline-config.properties");
            AnnotatorService pipeline = PipelineFactory.buildPipeline(userConfig);
            TextAnnotation ta = pipeline.createAnnotatedTextAnnotation( "", "", text );
            System.out.println();
        }
    

    I got this error:

    Connected to the target VM, address: '127.0.0.1:39575', transport: 'socket'
    14:11:02 INFO  DepAnnotator:66 - Loading struct-perceptron-auto-20iter.model into temp file: tmp345673.model
    14:11:03 INFO  SLModel:88 - Load trained Models.....
    14:11:05 INFO  SLModel:97 - Load Model complete!
    14:11:05 INFO  LabeledChuLiuEdmondsDecoder:72 - Loading cached PoS-to-dep dictionary from deprels.dict
    14:11:06 ERROR BasicAnnotatorService:403 - The annotator for view SRL_VERB failed. Skipping the view . . . 
    14:11:06 ERROR BasicAnnotatorService:403 - The annotator for view DEPENDENCY failed. Skipping the view . . . 
    edu.illinois.cs.cogcomp.annotation.AnnotatorException: View 'NER_CONLL' cannot be provided by this AnnotatorService.
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addView(BasicAnnotatorService.java:308)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addView(BasicAnnotatorService.java:313)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addViewsAndCache(BasicAnnotatorService.java:400)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.createAnnotatedTextAnnotation(BasicAnnotatorService.java:378)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.createAnnotatedTextAnnotation(BasicAnnotatorService.java:193)
    	at edu.illinois.cs.cogcomp.pipeline.main.test.main(test.java:12)
    edu.illinois.cs.cogcomp.annotation.AnnotatorException: View 'SHALLOW_PARSE' cannot be provided by this AnnotatorService.
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addView(BasicAnnotatorService.java:308)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addView(BasicAnnotatorService.java:313)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.addViewsAndCache(BasicAnnotatorService.java:400)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.createAnnotatedTextAnnotation(BasicAnnotatorService.java:378)
    	at edu.illinois.cs.cogcomp.annotation.BasicAnnotatorService.createAnnotatedTextAnnotation(BasicAnnotatorService.java:193)
    	at edu.illinois.cs.cogcomp.pipeline.main.test.main(test.java:12)
    
    Disconnected from the target VM, address: '127.0.0.1:39575', transport: 'socket'
    
    Process finished with exit code 0
    
    
    opened by qiangning 22
  • New NerBenchmark features and ere-reader

    Added an ERE reader, but also added a -release command line arg to NerBenchmark to have it build a final release using Test and Train data to train against, and Dev data for automatic convergence.

    opened by cowchipkid 19
  • Lazyannotator

    Changed Annotator to allow lazy initialization. The key point is that all users must call "getView()" and not "addView()", because "getView()" is the public, generic way to get the view and "addView()" is the annotator-specific implementation, and I wanted a uniform mechanism. I changed a number of Annotators (POS, Chunker, NER, Lemmatizer, SimpleGazetteerAnnotator) to use lazy init by default, and added some basic tests.
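
    The getView()/addView() split described above can be sketched roughly as follows. This is not the actual Annotator class: `LazyAnnotatorSketch` and `UppercaseAnnotator` are hypothetical stand-ins, and a view is reduced to a plain String to keep the example self-contained.

```java
import java.util.Locale;

/** Hypothetical stand-in for an annotator that defers expensive setup until first use. */
abstract class LazyAnnotatorSketch {
    private boolean initialized = false;

    /** Expensive setup such as model loading; runs at most once. */
    protected abstract void initialize();

    /** Annotator-specific work; callers are not meant to invoke this directly. */
    protected abstract String addView(String text);

    /** Public, uniform entry point: initializes on first call, then delegates to addView. */
    public final synchronized String getView(String text) {
        if (!initialized) {
            initialize();
            initialized = true;
        }
        return addView(text);
    }
}

/** Toy annotator that counts how often its setup runs. */
final class UppercaseAnnotator extends LazyAnnotatorSketch {
    int initCalls = 0;

    @Override
    protected void initialize() {
        initCalls++;
    }

    @Override
    protected String addView(String text) {
        return text.toUpperCase(Locale.ROOT);
    }
}
```

    Constructing `UppercaseAnnotator` does no setup work; the first `getView()` call triggers `initialize()`, and later calls reuse the initialized state.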

    opened by mssammon 19
  • Documentation for Java API

    Hi,

    I downloaded the latest (stable?) version 3.0.23 from:

    https://cogcomp.cs.illinois.edu/page/download_view/NETagger

    I cannot find the documentation, and the Read Me file is too thin on details. I want to call the NER on a string, have entities tagged, and get the entities as well as their positions in the text. I use Java.

    I noticed the Github Read Me is more detailed, but cannot be used with the 3.0.23 version.

    I know I can probably download the latest version from Github and build a .jar (that is, if the Github version is tested), but I thought I would ask to see if there is documentation for the 3.0.23 version or a dist jar for the latest version already.

    Thank you.

    opened by mortezakz 18
  • Release updated chunker

    i.e. the download here: http://cogcomp.cs.illinois.edu/page/software_view/Chunker

    Make sure the correct models are deployed (deploy new ones to Maven if necessary), then run the local test to make sure it behaves as expected.

    opened by mssammon 18
  • Added illinois-depparse

    Taking over the project from @mliu-dark-knight and creating the first attempt at merging with the main repo (this will close #178)

    This code will not work out of the box since the training of the models hasn't finished yet.

    Just wanted to have this as a statement of intent.

    opened by christos-c 17
  • NER gazetteer

    I'm trying to run your latest version of NER, 3.1.23, and this is the error I'm getting:

    Downloading the folder from datastore . . .
    GroupId: readonly.org.cogcomp.gazetteers
    ArtifactId: 1.5\gazetteers.zip
    The target C:\Users\Morteza.cogcomp-datastore\readonly.org.cogcomp.gazetteers\1.5\gazetteers already exists. Skipping download from the datastore . . .
    java.io.FileNotFoundException: C:\Users\Morteza.cogcomp-datastore\readonly.org.cogcomp.gazetteers\1.5\gazetteers\gazetteers\gazetteers-list.txt (The system cannot find the path specified)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:67)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.init(GazetteersFactory.java:54)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:312)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:57)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:101)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:125)
        ....
    Model file E:\Codes\ner\models\CoNLL_enron.model.level1 located in a jar file
    Model file E:\Codes\ner\models\CoNLL_enron.model.level2 located in a jar file
    22:53:33.653 [main] ERROR e.i.cs.cogcomp.ner.NERAnnotator - Cannot annotate the text, the exception was:
    java.lang.NullPointerException: null
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.ExpressiveFeaturesAnnotator.annotate(ExpressiveFeaturesAnnotator.java:73) ~[illinois-ner-3.1.23.jar:na]
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.addView(NERAnnotator.java:152) ~[illinois-ner-3.1.23.jar:na]
        ..
    Exception in thread "main" java.lang.IllegalArgumentException: View NER_CONLL not found
        at edu.illinois.cs.cogcomp.core.datastructures.textannotation.AbstractTextAnnotation.getView(AbstractTextAnnotation.java:134)
        ..

    opened by mortezakz 16
  • How to pass run-time parameters to Annotators?

    @mssammon any thoughts on the best way to pass a run-time parameter to an Annotator? @666666fzl has an annotator for the temporal package that could [optionally] accept an extra parameter indicating "document creation time". Is there a clean way to pass parameters to Annotators? (For initialization, I know we can use ResourceManager.)

    How about we create an addView function (optional to implement) that also accepts a ResourceManager object?
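
    As a rough illustration of that proposal (not existing API), an optional overload could be expressed as a default interface method. Everything here is hypothetical: `ConfigurableAnnotatorSketch`, `TemporalSketch`, and the `documentCreationTime` key are made up for the example, and `java.util.Properties` stands in for ResourceManager so the sketch compiles on its own.

```java
import java.util.Properties;

/** Hypothetical annotator interface with an optional run-time configuration overload. */
interface ConfigurableAnnotatorSketch {
    String addView(String text);

    /** Optional to implement: by default, run-time settings are ignored. */
    default String addView(String text, Properties runtimeConfig) {
        return addView(text);
    }
}

/** Toy temporal annotator that reads an optional document creation time. */
final class TemporalSketch implements ConfigurableAnnotatorSketch {
    // Hypothetical key name, chosen only for this example.
    static final String DCT_KEY = "documentCreationTime";

    @Override
    public String addView(String text) {
        // No run-time settings supplied: fall back to an empty configuration.
        return addView(text, new Properties());
    }

    @Override
    public String addView(String text, Properties runtimeConfig) {
        String dct = runtimeConfig.getProperty(DCT_KEY, "unknown");
        return "dct=" + dct + "; text=" + text;
    }
}
```

    Annotators that do not need run-time parameters would implement only the single-argument method and inherit the default overload unchanged.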

    opened by danyaljj 15
  • Bump protobuf-java from 3.16.1 to 3.16.3 in /core-utilities

    Bumps protobuf-java from 3.16.1 to 3.16.3.

    Release notes

    Sourced from protobuf-java's releases.

    Protobuf Release v3.16.3

    Java

    • Refactoring java full runtime to reuse sub-message builders and prepare to migrate parsing logic from parse constructor to builder.
    • Move proto wireformat parsing functionality from the private "parsing constructor" to the Builder class.
    • Change the Lite runtime to prefer merging from the wireformat into mutable messages rather than building up a new immutable object before merging. This way results in fewer allocations and copy operations.
    • Make message-type extensions merge from wire-format instead of building up instances and merging afterwards. This has much better performance.
    • Fix TextFormat parser to build up recurring (but supposedly not repeated) sub-messages directly from text rather than building a new sub-message and merging the fully formed message into the existing field.
    • This release addresses a Security Advisory for Java users
    Commits
    • b8c2488 Updating version.json and repo version numbers to: 16.3
    • 42e47e5 Refactoring Java parsing (3.16.x) (#10668)
    • 98884a8 Merge pull request #10556 from deannagarcia/3.16.x
    • 450b648 Cherrypick ruby fixes for monterey
    • b17bb39 Merge pull request #10548 from protocolbuffers/3.16.x-202209131829
    • c18f5e7 Updating changelog
    • 6f4e817 Updating version.json and repo version numbers to: 16.2
    • a7d4e94 Merge pull request #10547 from deannagarcia/3.16.x
    • 55815e4 Apply patch
    • 152d7bf Update version.json with "lts": true (#10535)
    • Additional commits viewable in compare view

    Dependency Parsing 
    opened by dependabot[bot] 0
  • Bump uimaj-core from 2.8.1 to 2.10.2 in /temporal-normalizer

    Bumps uimaj-core from 2.8.1 to 2.10.2.

    Dependency Parsing 
    opened by dependabot[bot] 0
  • Problem Running NER/Downloading Gazetteer

    Hi! I am trying to use the runNER.sh script to annotate data, and the runBenchmark.sh script in the downloaded version of illinois-ner to evaluate the model on the CoNLL-2003 test set. However, both scripts fail, apparently because the gazetteer cannot be downloaded. The error log from running runNER.sh is pasted below:

    log4j:WARN No appenders could be found for logger (edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Downloading the folder from datastore . . .
    GroupId: readonly.org.cogcomp.gazetteers
    ArtifactId: 1.6/gazetteers.zip
    augmentedGroupId: readonly.org.cogcomp.gazetteers
    versionedFileName: 1.6/gazetteers.zip
    zippedFileName: ?/.cogcomp-datastore/tmp/1.6/gazetteers.zip
    java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
        at java.net.Socket.connect(Socket.java:606)
        at com.squareup.okhttp.internal.Platform.connectSocket(Platform.java:101)
        at com.squareup.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:137)
        at com.squareup.okhttp.internal.io.RealConnection.connect(RealConnection.java:108)
        at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
        at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
        at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
        at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
        at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
        at com.squareup.okhttp.Call.getResponse(Call.java:286)
        at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
        at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
        at com.squareup.okhttp.Call.execute(Call.java:80)
        at io.minio.MinioClient.execute(MinioClient.java:826)
        at io.minio.MinioClient.executeHead(MinioClient.java:1018)
        at io.minio.MinioClient.statObject(MinioClient.java:1154)
        at io.minio.MinioClient.getObject(MinioClient.java:1343)
        at org.cogcomp.Datastore.getDirectory(Datastore.java:556)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:71)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:56)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:119)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
        at edu.illinois.cs.cogcomp.annotation.Annotator.lazyAddView(Annotator.java:201)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:167)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)
    java.io.FileNotFoundException: ?/.cogcomp-datastore/tmp/1.6/gazetteers.zip (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.cogcomp.ZipHelper.unZipIt(ZipHelper.java:71)
        at org.cogcomp.Datastore.getDirectory(Datastore.java:585)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:71)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:56)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:119)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
        at edu.illinois.cs.cogcomp.annotation.Annotator.lazyAddView(Annotator.java:201)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:167)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)
    zippedFileName: ?/.cogcomp-datastore/tmp/1.6/gazetteers.zip
    path: ?/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.6/gazetteers
    artifactId: gazetteers
    java.io.FileNotFoundException: ?/.cogcomp-datastore/readonly.org.cogcomp.gazetteers/1.6/gazetteers/gazetteers/gazetteers-list.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.init(TreeGazetteers.java:72)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.TreeGazetteers.<init>(TreeGazetteers.java:50)
        at edu.illinois.cs.cogcomp.ner.ExpressiveFeatures.GazetteersFactory.get(GazetteersFactory.java:50)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readAndLoadConfig(Parameters.java:265)
        at edu.illinois.cs.cogcomp.ner.LbjTagger.Parameters.readConfigAndLoadExternalData(Parameters.java:56)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:119)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
        at edu.illinois.cs.cogcomp.annotation.Annotator.lazyAddView(Annotator.java:201)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:167)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)
    Downloading the folder from datastore . . .
    GroupId: readonly.edu.illinois.cs.cogcomp.ner
    ArtifactId: 4.0/ner-model-enron-conll-all-data.zip
    augmentedGroupId: readonly.edu.illinois.cs.cogcomp.ner
    versionedFileName: 4.0/ner-model-enron-conll-all-data.zip
    zippedFileName: ?/.cogcomp-datastore/tmp/4.0/ner-model-enron-conll-all-data.zip
    java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
        at java.net.Socket.connect(Socket.java:606)
        at com.squareup.okhttp.internal.Platform.connectSocket(Platform.java:101)
        at com.squareup.okhttp.internal.io.RealConnection.connectSocket(RealConnection.java:137)
        at com.squareup.okhttp.internal.io.RealConnection.connect(RealConnection.java:108)
        at com.squareup.okhttp.internal.http.StreamAllocation.findConnection(StreamAllocation.java:184)
        at com.squareup.okhttp.internal.http.StreamAllocation.findHealthyConnection(StreamAllocation.java:126)
        at com.squareup.okhttp.internal.http.StreamAllocation.newStream(StreamAllocation.java:95)
        at com.squareup.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:281)
        at com.squareup.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:224)
        at com.squareup.okhttp.Call.getResponse(Call.java:286)
        at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
        at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
        at com.squareup.okhttp.Call.execute(Call.java:80)
        at io.minio.MinioClient.execute(MinioClient.java:826)
        at io.minio.MinioClient.executeHead(MinioClient.java:1018)
        at io.minio.MinioClient.statObject(MinioClient.java:1154)
        at io.minio.MinioClient.getObject(MinioClient.java:1343)
        at org.cogcomp.Datastore.getDirectory(Datastore.java:556)
        at edu.illinois.cs.cogcomp.ner.ModelLoader.load(ModelLoader.java:104)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:123)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
        at edu.illinois.cs.cogcomp.annotation.Annotator.lazyAddView(Annotator.java:201)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:167)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)
    java.io.FileNotFoundException: ?/.cogcomp-datastore/tmp/4.0/ner-model-enron-conll-all-data.zip (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.cogcomp.ZipHelper.unZipIt(ZipHelper.java:71)
        at org.cogcomp.Datastore.getDirectory(Datastore.java:585)
        at edu.illinois.cs.cogcomp.ner.ModelLoader.load(ModelLoader.java:104)
        at edu.illinois.cs.cogcomp.ner.NERAnnotator.initialize(NERAnnotator.java:123)
        at edu.illinois.cs.cogcomp.annotation.Annotator.doInitialize(Annotator.java:126)
        at edu.illinois.cs.cogcomp.annotation.Annotator.lazyAddView(Annotator.java:201)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:167)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)
    zippedFileName: ?/.cogcomp-datastore/tmp/4.0/ner-model-enron-conll-all-data.zip
    path: ?/.cogcomp-datastore/readonly.edu.illinois.cs.cogcomp.ner/4.0/ner-model-enron-conll-all-data
    artifactId: ner-model-enron-conll-all-data
    java.lang.IllegalArgumentException: View NER_CONLL not found
        at edu.illinois.cs.cogcomp.core.datastructures.textannotation.AbstractTextAnnotation.getView(AbstractTextAnnotation.java:134)
        at edu.illinois.cs.cogcomp.annotation.Annotator.getView(Annotator.java:168)
        at edu.illinois.cs.cogcomp.ner.Main.processInputFile(Main.java:544)
        at edu.illinois.cs.cogcomp.ner.Main.execute(Main.java:392)
        at edu.illinois.cs.cogcomp.ner.Main.processCommand(Main.java:168)
        at edu.illinois.cs.cogcomp.ner.AbstractMain.run(AbstractMain.java:97)

    I have already compiled the code as described in README and everything was built successfully. May I ask how this issue can be resolved, or if this is due to an expired link somewhere? Thank you!
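    One thing that may be worth ruling out, judging only from the paths in the trace: they begin with `?/.cogcomp-datastore`, and a literal `?` is what some JVMs report for the `user.home` system property when the home directory cannot be resolved. The frames before the `FileNotFoundException` also pass through the MinIO client, so the download itself may have failed first. The sketch below is a hypothetical standalone check, not part of CogCompNLP; the class name `DatastorePathCheck` and the method `diagnose` are made up for illustration, and the directory layout is copied from the trace. Note that it creates the `tmp` directory if it does not exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of the CogCompNLP code base.
final class DatastorePathCheck {

    /** Returns a one-line diagnosis for a candidate home directory string. */
    static String diagnose(String home) {
        // Treat a missing, empty, or "?" value as an unresolved home directory.
        if (home == null || home.isEmpty() || home.equals("?")) {
            return "user.home is unresolved: " + home;
        }
        // Directory layout assumed from the paths printed in the stack trace.
        Path tmp = Paths.get(home, ".cogcomp-datastore", "tmp");
        try {
            Files.createDirectories(tmp); // side effect: creates the directory if absent
        } catch (IOException e) {
            return "cannot create " + tmp + ": " + e;
        }
        return Files.isWritable(tmp) ? "ok: " + tmp : "not writable: " + tmp;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(System.getProperty("user.home")));
    }
}
```

    If the check reports an unresolved home directory, starting the JVM with an explicit `-Duser.home=/some/writable/dir` is a standard way to override the property; whether that resolves this particular failure is an assumption that would need to be confirmed.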

    opened by ShuhengL 0
  • Bump gson from 2.3.1 to 2.8.9 in /big-data-utils

    Bump gson from 2.3.1 to 2.8.9 in /big-data-utils

    Bumps gson from 2.3.1 to 2.8.9.

    Release notes

    Sourced from gson's releases.

    Gson 2.8.9

    • Make OSGi bundle's dependency on sun.misc optional (#1993).
    • Deprecate Gson.excluder() exposing internal Excluder class (#1986).
    • Prevent Java deserialization of internal classes (#1991).
    • Improve number strategy implementation (#1987).
    • Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990).
    • Support arbitrary Number implementation for Object and Number deserialization (#1290).
    • Bump proguard-maven-plugin from 2.4.0 to 2.5.1 (#1980).
    • Don't exclude static local classes (#1969).
    • Fix RuntimeTypeAdapterFactory depending on internal Streams class (#1959).
    • Improve Maven build (#1964).
    • Make dependency on java.sql optional (#1707).

    Gson 2.8.8

    • Fixed issue with recursive types (#1390).
    • Better behaviour with Java 9+ and Unsafe if there is a security manager (#1712).
    • EnumTypeAdapter now works better when ProGuard has obfuscated enum fields (#1495).

    Changelog

    Sourced from gson's changelog.

    Version 2.8.9

    • Make OSGi bundle's dependency on sun.misc optional (#1993).
    • Deprecate Gson.excluder() exposing internal Excluder class (#1986).
    • Prevent Java deserialization of internal classes (#1991).
    • Improve number strategy implementation (#1987).
    • Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990).
    • Support arbitrary Number implementation for Object and Number deserialization (#1290).
    • Bump proguard-maven-plugin from 2.4.0 to 2.5.1 (#1980).
    • Don't exclude static local classes (#1969).
    • Fix RuntimeTypeAdapterFactory depending on internal Streams class (#1959).
    • Improve Maven build (#1964).
    • Make dependency on java.sql optional (#1707).

    Version 2.8.8

    • Fixed issue with recursive types (#1390).
    • Better behaviour with Java 9+ and Unsafe if there is a security manager (#1712).
    • EnumTypeAdapter now works better when ProGuard has obfuscated enum fields (#1495).

    Version 2.8.7

    • Fixed ISO8601UtilsTest failing on systems with UTC+X.
    • Improved javadoc for JsonStreamParser.
    • Updated proguard.cfg (#1693).
    • Fixed IllegalStateException in JsonTreeWriter (#1592).
    • Added JsonArray.isEmpty() (#1640).
    • Added new test cases (#1638).
    • Fixed OSGi metadata generation to work on JavaSE < 9 (#1603).

    Version 2.8.6

    2019-10-04 GitHub Diff

    • Added static methods JsonParser.parseString and JsonParser.parseReader and deprecated instance method JsonParser.parse
    • Java 9 module-info support

    Version 2.8.5

    2018-05-21 GitHub Diff

    • Print Gson version while throwing AssertionError and IllegalArgumentException
    • Moved utils.VersionUtils class to internal.JavaVersion. This is a potential backward incompatible change from 2.8.4
    • Fixed issue google/gson#1310 by supporting Debian Java 9

    Version 2.8.4

    2018-05-01 GitHub Diff

    • Added a new FieldNamingPolicy, LOWER_CASE_WITH_DOTS, that maps JSON name someFieldName to some.field.name
    • Fixed issue google/gson#1305 by removing compile/runtime dependency on sun.misc.Unsafe

    Version 2.8.3

    2018-04-27 GitHub Diff

    • Added a new API, GsonBuilder.newBuilder() that clones the current builder
    • Preserving DateFormatter behavior on JDK 9

    ... (truncated)

    Commits
    • 6a368d8 [maven-release-plugin] prepare release gson-parent-2.8.9
    • ba96d53 Fix missing bounds checks for JsonTreeReader.getPath() (#2001)
    • ca1df7f #1981: Optional OSGi bundle's dependency on sun.misc package (#1993)
    • c54caf3 Deprecate Gson.excluder() exposing internal Excluder class (#1986)
    • e6fae59 Prevent Java deserialization of internal classes (#1991)
    • bda2e3d Improve number strategy implementation (#1987)
    • cd748df Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990)
    • fe30b85 Support arbitrary Number implementation for Object and Number deserialization...
    • 1cc1627 Fix incorrect feature request template label (#1982)
    • 7b9a283 Bump bnd-maven-plugin from 5.3.0 to 6.0.0 (#1985)
    • Additional commits viewable in compare view

    Dependency Parsing 
    opened by dependabot[bot] 0
  • Bump gson from 2.3.1 to 2.8.9 in /core-utilities

    Bump gson from 2.3.1 to 2.8.9 in /core-utilities

    Bumps gson from 2.3.1 to 2.8.9.

    Release notes

    Sourced from gson's releases.

    Gson 2.8.9

    • Make OSGi bundle's dependency on sun.misc optional (#1993).
    • Deprecate Gson.excluder() exposing internal Excluder class (#1986).
    • Prevent Java deserialization of internal classes (#1991).
    • Improve number strategy implementation (#1987).
    • Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990).
    • Support arbitrary Number implementation for Object and Number deserialization (#1290).
    • Bump proguard-maven-plugin from 2.4.0 to 2.5.1 (#1980).
    • Don't exclude static local classes (#1969).
    • Fix RuntimeTypeAdapterFactory depending on internal Streams class (#1959).
    • Improve Maven build (#1964).
    • Make dependency on java.sql optional (#1707).

    Gson 2.8.8

    • Fixed issue with recursive types (#1390).
    • Better behaviour with Java 9+ and Unsafe if there is a security manager (#1712).
    • EnumTypeAdapter now works better when ProGuard has obfuscated enum fields (#1495).

    Changelog

    Sourced from gson's changelog.

    Version 2.8.9

    • Make OSGi bundle's dependency on sun.misc optional (#1993).
    • Deprecate Gson.excluder() exposing internal Excluder class (#1986).
    • Prevent Java deserialization of internal classes (#1991).
    • Improve number strategy implementation (#1987).
    • Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990).
    • Support arbitrary Number implementation for Object and Number deserialization (#1290).
    • Bump proguard-maven-plugin from 2.4.0 to 2.5.1 (#1980).
    • Don't exclude static local classes (#1969).
    • Fix RuntimeTypeAdapterFactory depending on internal Streams class (#1959).
    • Improve Maven build (#1964).
    • Make dependency on java.sql optional (#1707).

    Version 2.8.8

    • Fixed issue with recursive types (#1390).
    • Better behaviour with Java 9+ and Unsafe if there is a security manager (#1712).
    • EnumTypeAdapter now works better when ProGuard has obfuscated enum fields (#1495).

    Version 2.8.7

    • Fixed ISO8601UtilsTest failing on systems with UTC+X.
    • Improved javadoc for JsonStreamParser.
    • Updated proguard.cfg (#1693).
    • Fixed IllegalStateException in JsonTreeWriter (#1592).
    • Added JsonArray.isEmpty() (#1640).
    • Added new test cases (#1638).
    • Fixed OSGi metadata generation to work on JavaSE < 9 (#1603).

    Version 2.8.6

    2019-10-04 GitHub Diff

    • Added static methods JsonParser.parseString and JsonParser.parseReader and deprecated instance method JsonParser.parse
    • Java 9 module-info support

    Version 2.8.5

    2018-05-21 GitHub Diff

    • Print Gson version while throwing AssertionError and IllegalArgumentException
    • Moved utils.VersionUtils class to internal.JavaVersion. This is a potential backward incompatible change from 2.8.4
    • Fixed issue google/gson#1310 by supporting Debian Java 9

    Version 2.8.4

    2018-05-01 GitHub Diff

    • Added a new FieldNamingPolicy, LOWER_CASE_WITH_DOTS, that maps JSON name someFieldName to some.field.name
    • Fixed issue google/gson#1305 by removing compile/runtime dependency on sun.misc.Unsafe

    Version 2.8.3

    2018-04-27 GitHub Diff

    • Added a new API, GsonBuilder.newBuilder() that clones the current builder
    • Preserving DateFormatter behavior on JDK 9

    ... (truncated)

    Commits
    • 6a368d8 [maven-release-plugin] prepare release gson-parent-2.8.9
    • ba96d53 Fix missing bounds checks for JsonTreeReader.getPath() (#2001)
    • ca1df7f #1981: Optional OSGi bundle's dependency on sun.misc package (#1993)
    • c54caf3 Deprecate Gson.excluder() exposing internal Excluder class (#1986)
    • e6fae59 Prevent Java deserialization of internal classes (#1991)
    • bda2e3d Improve number strategy implementation (#1987)
    • cd748df Fix LongSerializationPolicy null handling being inconsistent with Gson (#1990)
    • fe30b85 Support arbitrary Number implementation for Object and Number deserialization...
    • 1cc1627 Fix incorrect feature request template label (#1982)
    • 7b9a283 Bump bnd-maven-plugin from 5.3.0 to 6.0.0 (#1985)
    • Additional commits viewable in compare view

    Dependency Parsing 
    opened by dependabot[bot] 0
Releases(4.0.13)
  • 4.0.13(Sep 10, 2018)

    NER now supports use of multiple models within same VM context. Improved documentation for configuration in pipeline and in core-utilities. Various minor fixes to improve performance (core-utilities, Chunker). Minor improvements to StatefulTokenizer.

    Source code(tar.gz)
    Source code(zip)
  • 4.0.12(Aug 3, 2018)

    Changes:

    • Added the ability to use JSON Serialized Format with NerTagger #676
    • bugs with multiple NERAnnotators per process space #675
    • Revert "Incremental Training" #672
    Source code(tar.gz)
    Source code(zip)
  • 4.0.10(Jul 25, 2018)

    1. Fix the timex bug when "may" and "sat" appear as verbs instead of timexes #663
    2. Incremental Training #667
    3. Added get and post functions for adding Views to JsonStr serialized TA #671
    Source code(tar.gz)
    Source code(zip)
  • 4.0.9(Jul 22, 2018)

    CoreUtils:

    • BasicTextAnnotationBuilder.java now accepts list of list of tokens #670
    • fixed a bug in json serializer. Also updated to explicitly store and… #662
    • fix a TextAnnotation builder bug on Windows #639
    • deleted duplicated DBHelper.java #632

    NER:

    • NER training #666
    • NER Model Loading #654

    CorpusReaders:

    • Add MascXCESReaderTest corpus to resources #650
    • Ontonotes 5 readers #627
    • TACReader #615

    Chunker:

    • Chunker training data fix #627

    Similarity:

    • Allow user to provide types for one or both names in NESim.compare() #625

    Tokenizer:

    • Fix common dates like "10/14/2016" are not parsed to a single token #654
    • Add option to split on multiple newlines, capture emails as single token. #647
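
    To make the two tokenizer changes above concrete, here is a minimal regex-based sketch of keeping a numeric date or an email address as a single token. It is an illustration only and does not reflect how CogComp's tokenizer is implemented; the class `DateAwareTokenizerSketch`, the method `tokenize`, and the patterns themselves are assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the CogComp tokenizer.
final class DateAwareTokenizerSketch {

    // Alternatives are tried left to right, so dates and emails win over plain words.
    private static final Pattern TOKEN = Pattern.compile(
            "\\d{1,2}/\\d{1,2}/\\d{2,4}"              // numeric date such as 10/14/2016
            + "|[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"     // simplistic email address
            + "|\\w+"                                 // word or number
            + "|[^\\s\\w]");                          // any single punctuation character

    /** Splits text into tokens, skipping whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Due 10/14/2016."));
    }
}
```

    With these patterns, `tokenize("Due 10/14/2016.")` yields the three tokens `Due`, `10/14/2016`, and `.`, whereas a tokenizer that splits on every `/` would break the date into five pieces.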

    Temporal normalizer:

    • Bug fixes #646
    • Fix temporal component cannot capture obvious timexes #636
    • Populate timex normalization type to the TIMEX View in TextAnnotation #630
    Source code(tar.gz)
    Source code(zip)
  • 4.0.2(Feb 17, 2018)

  • 4.0.1(Dec 17, 2017)

  • 4.0.0(Nov 10, 2017)

    • Fixes in the readmes #585
    • Clean up old dependencies in transliterator #585
    • Double to Float when loading models #583
    • An ACE reader with TrueCaser #581
    Source code(tar.gz)
    Source code(zip)
  • 3.1.35(Oct 30, 2017)

    • Added relation extraction #572
    • Added transliteration models #577
    • Extend the list of languages and add ISO 639-3 standard three-letter ids, at @mayhewsw's suggestion. #576
    • AnnotatorService can now receive parameters. #576
    • Ignore a few external tests to make CI faster #577
    Source code(tar.gz)
    Source code(zip)
  • 3.1.34(Oct 24, 2017)

    • Adding Transliteration #563
    • Limit testing logs #564
    • Propbank readers for Ontonotes 5 #569
    • Improvements to MD and NER #570
    • support for initializing MD with local model #571
    Source code(tar.gz)
    Source code(zip)
  • 3.1.22(Oct 16, 2017)

  • 3.1.33(Oct 16, 2017)

Owner
CogComp
Cognitive Computation Group, led by Prof. Dan Roth
Twitter Text Libraries. This code is used at Twitter to tokenize and parse text to meet the expectations for what can be used on the platform.

twitter-text This repository is a collection of libraries and conformance tests to standardize parsing of Tweet text. It synchronizes development, tes

Twitter 2.9k Jan 8, 2023
An efficient and flexible token-based regular expression language and engine.

OpenRegex OpenRegex is written by Michael Schmitz at the Turing Center http://turing.cs.washington.edu/. It is licensed under the lesser GPL. Please s

KnowItAll 74 Jul 12, 2022
A fast and accurate POS and morphological tagging toolkit (EACL 2014)

RDRPOSTagger RDRPOSTagger is a robust and easy-to-use toolkit for POS and morphological tagging. It employs an error-driven approach to automatically

Dat Quoc Nguyen 137 Sep 9, 2022
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

null 900 Jan 2, 2023
Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text

Welcome to Apache OpenNLP! The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. This toolkit is

The Apache Software Foundation 1.2k Dec 29, 2022
👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike

Quick Info this library tries to solve language detection of very short words and phrases, even shorter than tweets makes use of both statistical and

Peter M. Stahl 532 Dec 28, 2022
For English vocabulary analysis and sentence analysis in natural language, model trainin

Sword Come For English vocabulary analysis and sentence analysis in natural language, model training, intelligent response and emotion analysis rea

James Zow 2 Apr 9, 2022
Language-Natural Persistence Layer for Java

Permazen is a better persistence layer for Java Persistence is central to most applications. But there are many challenges involved in persistence pro

Permazen 322 Dec 12, 2022
lazy-language-loader improves loading times when changing your language by only reloading the language instead of all the game resources!

lazy-language-loader lazy-language-loader improves loading times when changing your language by only reloading the language instead of all the game re

Shalom Ademuwagun 7 Sep 7, 2022
Stream Processing and Complex Event Processing Engine

Siddhi Core Libraries Siddhi is a cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to captur

Siddhi - Cloud Native Stream Processor 1.4k Jan 6, 2023
esProc SPL is a scripting language for data processing, with well-designed rich library functions and powerful syntax, which can be executed in a Java program through JDBC interface and computing independently.

esProc esProc is the unique name for esProc SPL package. esProc SPL is an open-source programming language for data processing, which can perform comp

null 990 Dec 27, 2022
Makes fire created by natural lightning cosmetic, meaning no blocks are destroyed from bad weather

Lightning Podoboo Makes fire created by natural lightning cosmetic, meaning no blocks are destroyed from bad weather. Keep the doFireTick gamerule ena

Lilly Rose Berner 10 Dec 15, 2022
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

ANTLR v4 Build status ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating

Antlr Project 13.6k Dec 28, 2022
This application can recognize the sign language alphabets and help people who do not understand sign language to communicate with the speech and hearing impaired.

Sign Language Recognition App This application can recognize the sign language alphabets and help people who do not understand sign language to commun

Mihir Gandhi 12 Oct 7, 2021
Kotlin-decompiled - (Almost) every single language construct of the Kotlin programming language compiled to JVM bytecode and then decompiled to Java again for better readability

Kotlin: Decompiled (Almost) every single language construct of the Kotlin programming language compiled to JVM bytecode and then decompiled to Java ag

The Self-Taught Software Engineer 27 Dec 14, 2022
Jamal is a macro language (JAmal MAcro Language)

Jamal Macro Language Jamal is a complex text processor with a wide variety of possible use. The first version of Jamal was developed 20 years ago in P

Peter Verhas 29 Dec 20, 2022
For Jack language. Most of codes were commented with their usage, which can be useful for beginner to realize the running principle of a compiler for object-oriented programming language.

Instructions: Download the Java source codes Store these codes into a local folder and open this folder Click the right key of mouse and click ‘Open i

gooooooood 1.1k Jan 5, 2023
Netflix, Inc. 23.1k Jan 5, 2023