GPT2 Tokenizer Java

Java implementation of GPT2 tokenizer

Requirements

Please install the following dependencies to use the library.

implementation 'com.google.api-client:google-api-client:1.32.2'
implementation 'org.apache.commons:commons-lang3:3.12.0'
implementation 'org.springframework.boot:spring-boot-starter-web'

testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'

Add tokenizer files to resources directory

Please add encoder.json and vocab.bpe files to your project resources directory. these files can be found here.

Usage

The following are simple examples of this library. To check test code for this, refer to here.

Encoding text to tokens

import ai.tunib.tokenizer.GPT2Tokenizer;
import java.util.List;

GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
List<Integer> result = tokenizer.encode("Hello my name is Kevin.");

[15496, 616, 1438, 318, 7939, 13]

Decoding tokens to text

import ai.tunib.tokenizer.GPT2Tokenizer;

GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
String result = tokenizer.decode(List.of(15496, 616, 1438, 318, 7939, 13));

"Hello my name is Kevin."

License

This project is licensed under the terms of the Apache License 2.0.

A implementation of shadowsocks that base on java's netty framework

Shadowsocks shadowsocks is a fast tunnel proxy that helps you bypass firewalls. shadowsocks-java is a implementation of shadowsocks protocol that base

Oct 17, 2022

Budget Proof Key for Code Exchange (PKCE) implementation using Java Spring-boot

Low Budget Proof Key for Code Exchange (PKCE) Implementation using Java Spring-boot Just for fun, low budget implementation of PKCE Auth Flow using a

Dec 11, 2022

The Java implementation of "A Survey of Trajectory Distance Measures and Performance Evaluation". VLDBJ 2020

A Survey of Trajectory Distance Measures and Performance Evaluation The Java implementation of the following paper: Han Su, Shuncheng Liu, Bolong Zhen

Oct 19, 2022

JSON Web Token implementation for Java according to RFC 7519. Easily create, parse and validate JSON Web Tokens using a fluent API.

JWT-Java JSON Web Token library for Java according to RFC 7519. Table of Contents What are JSON Web Tokens? Header Payload Signature Features Supporte

Jul 10, 2022