CMU ARK Twitter Part-of-Speech Tagger v0.3.2
http://www.ark.cs.cmu.edu/TweetNLP/

Basic usage for released version
================================

Requires Java 6. To run the tagger on example data, try:

    java -Xmx500m -jar ark-tweet-nlp-0.3.2.jar examples/example_tweets.txt

where the jar file is the one included in the release download.

The tagger outputs tokens, predicted part-of-speech tags, and confidences.
Use the "--help" flag for more information.

On Unix systems, "./runTagger.sh" invokes the tagger; e.g.

    ./runTagger.sh examples/example_tweets.txt
    ./runTagger.sh --help

We also include a script that invokes just the tokenizer:

    ./twokenize.sh examples/example_tweets.txt

You may have to adjust the parameters to "java" depending on your system.

If instead you are using a source checkout, see docs/hacking.txt for info.

Information
===========

Version 0.3 of the tagger is much faster and more accurate. Please see the
tech report on the website for details.

For the Java API, see src/cmu/arktweetnlp; especially Tagger.java.
See also documentation in docs/ and src/cmu/arktweetnlp/package.html.

This tagger is described in the following two papers, available at the
website. Please cite these if you write a research paper using this software.

Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills,
Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and
Noah A. Smith
In Proceedings of the Annual Meeting of the Association for Computational
Linguistics, companion volume, Portland, OR, June 2011.
http://www.ark.cs.cmu.edu/TweetNLP/gimpel+etal.acl11.pdf

Part-of-Speech Tagging for Twitter: Word Clusters and Other Advances
Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, and
Nathan Schneider.
Technical Report, Machine Learning Department. CMU-ML-12-107. September 2012.

Contact
=======

Please contact Brendan O'Connor ([email protected]) and Kevin Gimpel
([email protected]) if you encounter any problems.
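The README says the tagger outputs tokens, predicted tags, and confidences. A minimal sketch of reading that output back into a program follows. It assumes one token per line with three tab-separated fields (token, tag, confidence) and blank lines between tweets; that layout is an assumption, not something the README states, so check "--help" for the output format your run actually produces. The class name TaggedLine is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** One tagged token: surface form, predicted tag, and model confidence. */
final class TaggedLine {
    final String token;
    final String tag;
    final double confidence;

    TaggedLine(String token, String tag, double confidence) {
        this.token = token;
        this.tag = tag;
        this.confidence = confidence;
    }

    /**
     * Parses one "token<TAB>tag<TAB>confidence" line.
     * The three-column layout is an assumption, not taken from the README.
     */
    static TaggedLine parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        }
        return new TaggedLine(fields[0], fields[1], Double.parseDouble(fields[2]));
    }

    /** Parses every non-blank line; blank lines (assumed tweet separators) are skipped. */
    static List<TaggedLine> parseAll(List<String> lines) {
        List<TaggedLine> result = new ArrayList<TaggedLine>();
        for (String line : lines) {
            if (line.trim().length() > 0) {
                result.add(parse(line));
            }
        }
        return result;
    }
}
```

Failing loudly on a line that does not have three fields makes a mismatch between the assumed and the actual output format visible immediately.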
Comments
-
Project refactoring
As discussed in #15, I tried to refactor the project so that it is a little easier to use. I removed the jargs dependency from the Maven project because it is not used at all. The pom.xml is simpler now. I also replaced the add-jars-plugin with a shell script that does the same job; the plugin manipulated the POM, which I consider evil. Since there are no tests, I cannot check that everything works as expected, but I successfully trained a model and tagged the example tweets with the trained model. It all worked.
-
Mavenize ark-tweet-nlp
Some time ago I took some steps towards making ark-tweet-nlp easy to use as a library.
The first part of this was to provide a proper Maven build that produces a self-contained jar (no custom zsh build script or hardcoded paths anymore; all dependencies are cleanly specified, and I've turned all required external files into resources that get bundled with the jar).
I got diverted onto other things before I could do part two (don't always output stuff to stdout etc.) but you might find the mavenization already useful in itself, so I decided to quickly merge in the last master and send you a pull request before my branch bitrots.
I'm happy to clean up things more if required, but first I wanted to see if there is interest in this patch.
-
Setup doesn't work
Hey folks,
I just wanted to try out your tagger, but I can't get it to run. First off, I tried following your hacking.txt, but without success.
Also, the project structure is unusual for a Java project. So I have some questions about this project:
- Why are you providing the jargs jar? Did you change something in it so you cannot use the standard version that is accessible through maven?
- The same goes for the gnu trove jar that you provide. Any changes made to the library?
- Why are you separating the actual src files into the separate src folder in the root of the project while maintaining the resources in the ark-tweet-nlp folder?
- Are metaphone-map2.txt and ptb_ordered_metaphone.txt, which are contained in the lib directory, external resources, or were they created by you? If so, why are they in the lib directory?
- Where is posBerkeley.jar from? Is it available to the public (e.g. from here)?
Since I want to use/try/evaluate it, I'm very interested in your project. I'm also experienced with maven, java, eclipse so I could help you with restructuring this stuff.
-
Changed the URL regex to use the "web-only" version of Gruber's URL regex...
I happened to have an implementation of Gruber's URL regex lying around in a different program, and agreed with the comment in the twokenizer about the Larry David stare. :-) I don't know what sorts of unit tests you guys have for the twokenizer, but this seems to work for me, and this particular regex has worked very well for me in other projects.
-
Word Cluster
How do I run the algorithm with word clusters? The help text in the source reads:
"\n --word-clusters Alternate word clusters file (see FeatureExtractor)" +
Which filename do we have to pass here? Thanks in advance.
-
Bump jackson-databind from 2.0.0 to 2.9.10.7 in /ark-tweet-nlp
Bumps jackson-databind from 2.0.0 to 2.9.10.7.
-
LICENSE Issue GPLv2 compatibility with GPLv3
Hi Brendan,
We are using your library for a Twitter application that we plan to release under GPLv3. However, we cannot release your code together with our GPLv3 code, because it doesn't specify that your software is licensed under GPLv2 or later versions.
So if you can change your license to "GPLv2 or later", it will be easier to use your code in GPLv3-released code.
This compatibility matrix shows that "GPLv2 or later" is compatible with GPLv3, but GPLv2 alone is not.
I look forward to your response.
-
Missing default model.20120919 after building from source code
After I use mvn package to build ark-tweet-nlp-0.3.2.jar, it reports an IOException when I run ./runTagger.sh examples/example_tweets.txt. Details:

Exception in thread "main" java.io.IOException: Neither file nor resource found for: /cmu/arktweetnlp/model.20120919
    at cmu.arktweetnlp.util.BasicFileIO.openFileOrResource(BasicFileIO.java:250)
    at cmu.arktweetnlp.impl.Model.loadModelFromText(Model.java:409)
    at cmu.arktweetnlp.Tagger.loadModel(Tagger.java:40)
    at cmu.arktweetnlp.RunTagger.runTagger(RunTagger.java:85)
    at cmu.arktweetnlp.RunTagger.main(RunTagger.java:373)
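The error message suggests that the loader first treats the name as a file path and then falls back to a classpath resource, so a jar built without the model bundled as a resource would fail in exactly this way. The sketch below illustrates that lookup order. It is a guess based on the message text, not the project's BasicFileIO code, and the class name FileOrResource is hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class FileOrResource {
    /**
     * Opens name as a regular file if one exists at that path, otherwise as a
     * classpath resource. Throws IOException when neither is available.
     */
    static InputStream open(String name) throws IOException {
        File file = new File(name);
        if (file.isFile()) {
            return new FileInputStream(file);
        }
        // Falls back to the classpath, e.g. a model bundled inside the jar.
        InputStream resource = FileOrResource.class.getResourceAsStream(name);
        if (resource != null) {
            return resource;
        }
        throw new IOException("Neither file nor resource found for: " + name);
    }
}
```

Under this reading, listing the jar contents and checking whether cmu/arktweetnlp/model.20120919 is present would confirm or rule out a packaging problem.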
-
some config files in scripts/train.sh
When I try to use the semi-CRF tools to train a CRF model, I use the script scripts/train.sh. It has the option --noahsFeaturesFile noah.feats.
I have no idea what noah.feats looks like. Can you give me an example?
-
Tweetnlp crashed on this input, note that there are lines with no words at all...
A musician must make music, an artist must paint.., to be ultimately at peace with himself. What a man can be, he must be ~Maslow A small body of determined spirits fired by an unquenchable faith in their mission can alter the course of history.~Gandhi #quote Never for the sake of peace and quiet deny your convictions ~ Dag Hammarskjold #quote 5020
@kz713twt Amanpour 2
Oh my god ! It was an wedding anniversary today, but I stayed at home unfortunately.
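One possible workaround, assuming the crash really is triggered by the lines that contain no words (the report suggests this but does not confirm it), is to drop blank lines before handing the input to the tagger. The helper below is a hypothetical pre-processing step and not part of the tagger.

```java
import java.util.ArrayList;
import java.util.List;

final class BlankLineFilter {
    /** Returns the input lines that contain at least one non-whitespace character. */
    static List<String> nonBlank(List<String> lines) {
        List<String> kept = new ArrayList<String>();
        for (String line : lines) {
            if (line != null && line.trim().length() > 0) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Dropping lines changes the correspondence between input and output line numbers, so keep the original indices alongside the text if the results need to be joined back to the source data.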
-
Cannot execute runTagger.sh script from other directories
For example, from the parent directory of ark-tweet-nlp, I get:

Noahs feature file:null
File with initial transition probs:null
Reading embeddings file...
java.io.FileNotFoundException: lib/embeddings.txt (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:137)
    at java.io.FileInputStream.<init>(FileInputStream.java:96)
    at java.io.FileReader.<init>(FileReader.java:58)
    at edu.cmu.cs.lti.ark.ssl.util.BasicFileIO.openFileToRead(BasicFileIO.java:39)
    at edu.cmu.cs.lti.ark.ssl.pos.SemiSupervisedPOSTagger.readDistSim(SemiSupervisedPOSTagger.java:630)
    at edu.cmu.cs.lti.ark.ssl.pos.SemiSupervisedPOSTagger.setVariousOptions(SemiSupervisedPOSTagger.java:531)
    at edu.cmu.cs.lti.ark.ssl.pos.SemiSupervisedPOSTagger.<init>(SemiSupervisedPOSTagger.java:178)
    at edu.cmu.cs.lti.ark.tweetnlp.TweetTaggerInstance.<init>(TweetTaggerInstance.java:62)
    at edu.cmu.cs.lti.ark.tweetnlp.TweetTaggerInstance.getInstance(TweetTaggerInstance.java:24)
    at edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger.tweetTagging(RunPOSTagger.java:43)
    at edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger.doPOSTagging(RunPOSTagger.java:39)
    at edu.cmu.cs.lti.ark.tweetnlp.RunPOSTagger.main(RunPOSTagger.java:63)
12-Sep-2011 15:25:03 edu.cmu.cs.lti.ark.ssl.util.BasicFileIO openFileToRead
SEVERE: Could not open file:lib/embeddings.txt
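The trace shows lib/embeddings.txt being opened as a relative path, which the JVM resolves against the current working directory rather than the install directory. A minimal sketch of the usual remedy follows: resolve data files against a known base directory. The class InstallPaths and its installDir parameter are hypothetical; how the tagger would learn its own install location is left open here.

```java
import java.io.File;

final class InstallPaths {
    /**
     * Resolves a data file against the install directory instead of the
     * current working directory. Absolute paths are returned unchanged.
     */
    static File resolve(File installDir, String dataFile) {
        File candidate = new File(dataFile);
        if (candidate.isAbsolute()) {
            return candidate;
        }
        return new File(installDir, dataFile);
    }
}
```

For example, InstallPaths.resolve(new File("ark-tweet-nlp"), "lib/embeddings.txt") points at the file inside the checkout no matter where the command is run from, provided the base directory itself is given correctly.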
-
Bump jackson-databind from 2.0.0 to 2.12.7.1 in /ark-tweet-nlp
Bumps jackson-databind from 2.0.0 to 2.12.7.1.
-
Trying to get in touch regarding a security issue
Hey there!
I'd like to report a security issue but cannot find contact instructions on your repository.
If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.
Thank you for your consideration, and I look forward to hearing from you!
(cc @huntr-helper)
-
Bump junit from 4.8.2 to 4.13.1 in /ark-tweet-nlp
Bumps junit from 4.8.2 to 4.13.1.
Changelog
Sourced from junit's changelog.
Summary of changes in version 4.13.1
Rules
Security fix: TemporaryFolder now limits access to temporary folders on Java 1.7 or later
A local information disclosure vulnerability in TemporaryFolder has been fixed. See the published security advisory for details.
Test Runners
Pull request #1669: Make FrameworkField constructor public
Prior to this change, custom runners could make FrameworkMethod instances, but not FrameworkField instances. This small change allows for both now, because FrameworkField's constructor has been promoted from package-private to public.
Commits
- 1b683f4 [maven-release-plugin] prepare release r4.13.1
- ce6ce3a Draft 4.13.1 release notes
- c29dd82 Change version to 4.13.1-SNAPSHOT
- 1d17486 Add a link to assertThrows in exception testing
- 543905d Use separate line for annotation in Javadoc
- 510e906 Add sub headlines to class Javadoc
- 610155b Merge pull request from GHSA-269g-pwp5-87pp
- b6cfd1e Explicitly wrap float parameter for consistency (#1671)
- a5d205c Fix GitHub link in FAQ (#1672)
- 3a5c6b4 Deprecated since jdk9 replacing constructor instance of Double and Float (#1660)
- Additional commits viewable in compare view
-
[SECURITY] Use HTTPS to resolve dependencies in Maven Build
- Want to take over the Java ecosystem? All you need is a MITM!
- Update: Want to take over the Java ecosystem? All you need is a MITM!
This is a security fix for a vulnerability in your Apache Maven pom.xml file(s).
The build files indicate that this project is resolving dependencies over HTTP instead of HTTPS. This leaves your build vulnerable to Man in the Middle (MITM) attackers, who could execute arbitrary code on your computer or CI/CD system.
This vulnerability has a CVSS v3.0 Base Score of 8.1/10.
POC code has existed since 2014 to maliciously compromise a JAR file in-flight. MITM attacks against HTTP are increasingly common, for example Comcast is known to have done it to their own users.
This contribution is a part of a submission to the GitHub Security Lab Bug Bounty program.
Detecting this and Future Vulnerabilities
This vulnerability was automatically detected by LGTM.com using this CodeQL Query.
As of September 2019 LGTM.com and Semmle are officially a part of GitHub.
You can automatically detect future vulnerabilities like this by enabling the free (for open-source) LGTM App.
I'm not an employee of GitHub nor of Semmle, I'm simply a user of LGTM.com and an open-source security researcher.
Source
Yes, this contribution was automatically generated, however, the code to generate this PR was lovingly hand crafted to bring this security fix to your repository.
The source code that generated and submitted this PR can be found here: JLLeitschuh/bulk-security-pr-generator
Opting-Out
If you'd like to opt out of future automated security vulnerability fixes like this, please consider adding a file called .github/GH-ROBOTS.txt to your repository with the lines:

    User-agent: JLLeitschuh/bulk-security-pr-generator
    Disallow: *
This bot will respect the ROBOTS.txt format for future contributions.
Alternatively, if this project is no longer actively maintained, consider archiving the repository.
CLA Requirements
This section is only relevant if your project requires contributors to sign a Contributor License Agreement (CLA) for external contributions.
It is unlikely that I'll be able to directly sign CLAs. However, all contributed commits are already automatically signed-off.
The meaning of a signoff depends on the project, but it typically certifies that committer has the rights to submit this work under the same license and agrees to a Developer Certificate of Origin (see https://developercertificate.org/ for more information).
If signing your organization's CLA is a strict-requirement for merging this contribution, please feel free to close this PR.
Tracking
All PR's generated as part of this fix are tracked here: https://github.com/JLLeitschuh/bulk-security-pr-generator/issues/2
-
Bump lucene-core from 3.0.3 to 7.1.0 in /ark-tweet-nlp
Bumps lucene-core from 3.0.3 to 7.1.0.
-
GPL
Hi,
Has anyone managed to remove or replace the components that make the library GPL, and lived to tell about it?
In particular for the POS tagging only, but also in general?
Thanks!