Java implementation of GPT2 tokenizer.

Overview

GPT2 Tokenizer Java


Requirements

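Add the following dependencies to your Gradle build script to use the library. The Spring Boot starter is declared without a version, so it expects the Spring Boot plugin (or another source of dependency management) to supply one.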

dependencies {
    implementation 'com.google.api-client:google-api-client:1.32.2'
    implementation 'org.apache.commons:commons-lang3:3.12.0'
    implementation 'org.springframework.boot:spring-boot-starter-web'

    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.3.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.3.1'
}

Add tokenizer files to resources directory

Please add the encoder.json and vocab.bpe files to your project's resources directory. These files can be found here.
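How GPT2Tokenizer reads these files internally is not described here. As a rough illustration only, the sketch below reads vocab.bpe from the classpath and turns it into merge ranks. It assumes the common GPT-2 layout (a "#version" header line, then one space-separated merge pair per line, with earlier lines merged first) and that the file sits at the root of the resources directory. BpeMerges, readResourceLines and parseMergeRanks are hypothetical names, not part of ai.tunib.tokenizer.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not the library's own loader.
public class BpeMerges {

    // Reads a UTF-8 text resource from the classpath, e.g. "vocab.bpe"
    // placed at the root of src/main/resources (an assumed location).
    public static List<String> readResourceLines(String name) throws IOException {
        InputStream in = BpeMerges.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException("Not found on classpath: " + name);
        }
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Maps each "left right" merge pair to its rank; a lower rank means
    // the pair is merged earlier. Assumes the usual GPT-2 file layout.
    public static Map<String, Integer> parseMergeRanks(List<String> lines) {
        Map<String, Integer> ranks = new HashMap<>();
        int rank = 0;
        for (String line : lines) {
            if (line.startsWith("#version") || line.trim().isEmpty()) {
                continue; // skip the header and blank lines
            }
            String[] parts = line.split(" ");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed merge line: " + line);
            }
            ranks.put(parts[0] + " " + parts[1], rank++);
        }
        return ranks;
    }
}
```

A missing file surfaces as a FileNotFoundException, which is a quick way to check that the resources were copied to the expected place.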

Usage

The following are simple examples of using this library. For the corresponding test code, see here.

Encoding text to tokens

import ai.tunib.tokenizer.GPT2Tokenizer;
import java.util.List;

// "PATH/IN/RESOURCES" is the location of encoder.json and vocab.bpe in your resources
GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
List<Integer> result = tokenizer.encode("Hello my name is Kevin.");
// result: [15496, 616, 1438, 318, 7939, 13]
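GPT-2 style tokenizers split the text into pre-tokens, map their bytes to printable characters, and then apply byte-pair merges before looking each resulting piece up in encoder.json. The sketch below illustrates only the merge step, assuming a rank map like the one built from vocab.bpe (lower rank merges first). BpeSketch and bpe are hypothetical names, and this is not the library's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the BPE merge loop; not ai.tunib.tokenizer code.
public class BpeSketch {

    // Starts from single characters and repeatedly merges the adjacent pair
    // with the lowest rank until no pair in the rank map remains.
    public static List<String> bpe(String word, Map<String, Integer> ranks) {
        List<String> symbols = new ArrayList<>();
        word.codePoints().forEach(cp -> symbols.add(new String(Character.toChars(cp))));

        while (symbols.size() > 1) {
            // Find the lowest-ranked adjacent pair
            int bestIndex = -1;
            int bestRank = Integer.MAX_VALUE;
            for (int i = 0; i < symbols.size() - 1; i++) {
                Integer rank = ranks.get(symbols.get(i) + " " + symbols.get(i + 1));
                if (rank != null && rank < bestRank) {
                    bestRank = rank;
                    bestIndex = i;
                }
            }
            if (bestIndex < 0) {
                break; // no mergeable pair left
            }

            // Merge every occurrence of that pair in one left-to-right pass
            String left = symbols.get(bestIndex);
            String right = symbols.get(bestIndex + 1);
            List<String> merged = new ArrayList<>();
            int i = 0;
            while (i < symbols.size()) {
                if (i < symbols.size() - 1
                        && symbols.get(i).equals(left)
                        && symbols.get(i + 1).equals(right)) {
                    merged.add(left + right);
                    i += 2;
                } else {
                    merged.add(symbols.get(i));
                    i++;
                }
            }
            symbols.clear();
            symbols.addAll(merged);
        }
        return symbols;
    }
}
```

For example, with the single rank "l l" -> 0, bpe("hello", ...) yields [h, e, ll, o]; a word with no matching pairs stays split into single characters.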

Decoding tokens to text

import ai.tunib.tokenizer.GPT2Tokenizer;
import java.util.List;

GPT2Tokenizer tokenizer = GPT2Tokenizer.fromPretrained("PATH/IN/RESOURCES");
String result = tokenizer.decode(List.of(15496, 616, 1438, 318, 7939, 13));
// result: "Hello my name is Kevin."
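In the original GPT-2 encoder, decoding reverses the byte-level step: token ids are looked up in encoder.json, the resulting strings are concatenated, each character is mapped back to a byte, and the bytes are decoded as UTF-8. The sketch below rebuilds that byte-to-unicode table and performs the last step. ByteLevelDecoder and its methods are hypothetical names, and the id-to-string lookup is left out because it depends on how encoder.json is loaded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of GPT-2 byte-level decoding; not ai.tunib.tokenizer code.
public class ByteLevelDecoder {

    // Printable Latin-1 bytes map to themselves; every other byte is
    // assigned a code point from 256 upward, in byte order.
    public static Map<Integer, Integer> bytesToUnicode() {
        List<Integer> bs = new ArrayList<>();
        for (int b = '!'; b <= '~'; b++) bs.add(b);
        for (int b = 0xA1; b <= 0xAC; b++) bs.add(b);
        for (int b = 0xAE; b <= 0xFF; b++) bs.add(b);
        List<Integer> cs = new ArrayList<>(bs);
        int n = 0;
        for (int b = 0; b < 256; b++) {
            if (!bs.contains(b)) {
                bs.add(b);
                cs.add(256 + n);
                n++;
            }
        }
        Map<Integer, Integer> table = new HashMap<>();
        for (int i = 0; i < bs.size(); i++) {
            table.put(bs.get(i), cs.get(i));
        }
        return table;
    }

    // Maps each character of the concatenated token strings back to its byte,
    // then decodes the byte sequence as UTF-8.
    public static String decodeTokenText(String joinedTokens) {
        Map<Integer, Integer> inverse = new HashMap<>();
        bytesToUnicode().forEach((b, cp) -> inverse.put(cp, b));
        int[] codePoints = joinedTokens.codePoints().toArray();
        byte[] bytes = new byte[codePoints.length];
        for (int i = 0; i < codePoints.length; i++) {
            Integer b = inverse.get(codePoints[i]);
            if (b == null) {
                throw new IllegalArgumentException("Not a byte-level symbol: " + codePoints[i]);
            }
            bytes[i] = (byte) b.intValue();
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the space byte maps to U+0120 (Ġ), decodeTokenText("HelloĠmy") returns "Hello my". Multi-byte characters only come out correctly once all of their bytes are joined, which is why the token strings are concatenated before this step.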

License

This project is licensed under the terms of the Apache License 2.0.

Copyright 2022 Hyunwoong Ko. All Rights Reserved.

Owner: Kevin Ko (Large-scale modeling & MLOps)