Extract text from a PDF (PDF to text). API for PHP, JS, Python, and other languages.

dotcode.moscow

Last update: May 13, 2022

Overview

Extract text from a PDF (PDF to text). The API runs in Docker.

Why did we create this project?

In a Laravel project, we needed to extract text from large files. The existing packages do not work with files larger than 50 megabytes.
Text extraction is an expensive operation. Running it on a separate server reduces the load on the main application.
We also needed to generate a cover image for the source document.

Installation

Install Docker and Docker Compose

git clone https://github.com/dotcode-moscow/pdf-api.git
cd pdf-api
docker-compose up -d pdf-api

Method /api/extractText

Extracts text from a PDF file. Pass the URL of the file in the "url" parameter.

Method /api/pdf/ping

Ping-pong method for checking that the service is reachable.

Method /api/imageToPDF

Converts an image to PDF.

Basic example

curl -d "url=https://trove.nla.gov.au/newspaper/rendition/nla.news-page29291123.pdf" "http://localhost:8080/api/extractText"
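The same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the project: the function names and the timeout value are hypothetical, and the base URL is assumed from the curl example above.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL, taken from the curl example; adjust for your deployment.
BASE_URL = "http://localhost:8080"


def build_extract_request(pdf_url, base_url=BASE_URL):
    """Build a form-encoded POST to /api/extractText, like `curl -d "url=..."`."""
    body = urllib.parse.urlencode({"url": pdf_url}).encode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/extractText", data=body, method="POST"
    )


def extract_text(pdf_url, base_url=BASE_URL, timeout=300):
    """Send the request and decode the JSON response (needs a running service)."""
    request = build_extract_request(pdf_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `extract_text("https://trove.nla.gov.au/newspaper/rendition/nla.news-page29291123.pdf")` should return the parsed JSON object. The generous timeout is a guess, chosen because large files can take a while to process.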

POST(HTTP) example

http://localhost:8080/api/extractText?url=https://trove.nla.gov.au/newspaper/rendition/nla.news-page29291123.pdf

Response (JSON) example

Each key is a page number (keys are not sorted) and each value is the text extracted from that page.
"img" is a base64-encoded JPEG of the front page cover.

{
  "1":"National Library of Australia...",
  "img": "data:image/jpeg;base64..."
}
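Because the page keys arrive unsorted, a client will usually want to order them numerically and decode the cover separately. The sketch below shows one way to do that. The function name and the sample data are hypothetical, and it assumes "img" is a complete data URI of the form "data:image/jpeg;base64,<data>".

```python
import base64


def split_response(payload):
    """Return (pages sorted by page number, cover JPEG bytes or None)."""
    pages = sorted(
        ((int(key), text) for key, text in payload.items() if key.isdigit()),
        key=lambda item: item[0],
    )
    cover = None
    img = payload.get("img")
    if img and "," in img:
        # Assumption: everything after the first comma is the base64 payload.
        cover = base64.b64decode(img.split(",", 1)[1])
    return pages, cover


# Hypothetical sample data, not real service output.
sample = {
    "2": "second page",
    "10": "tenth page",
    "1": "first page",
    "img": "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("ascii"),
}
pages, cover = split_response(sample)
print([number for number, _ in pages])  # [1, 2, 10]
```

Converting the keys to integers before sorting matters: sorted as strings, "10" would come before "2".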

Production mode

network_mode: "host"

Credit

PDFBox

Contributing

Pull requests are welcome.
