Apache POI - A Java library for reading and writing Microsoft Office binary and OOXML file formats.

Overview

The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, you can read and write MS Excel, MS Word, and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.

OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT as well as MFC serialization API based file formats. The project provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document Properties (HPSF).

Office Open XML (OOXML) is the standards-based XML file format introduced with Microsoft Office 2007 and 2008. It includes XLSX, DOCX, and PPTX. The project provides a low-level API to support the Open Packaging Conventions using openxml4j.
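
The two container families described above can be told apart from their first bytes. The following is a minimal sketch, not POI code: it assumes the well-known 8-byte OLE2 compound document signature and the ZIP local file header signature used by OOXML packages. The class and enum names are hypothetical. A ZIP signature only shows that the file is a ZIP archive, not that it is a valid OOXML package.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not part of the POI API.
final class ContainerSniffer {
    enum Container { OLE2, ZIP_BASED, UNKNOWN }

    // Signature at the start of every OLE2 compound document (XLS, DOC, PPT, ...).
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; OOXML files (XLSX, DOCX, PPTX) are ZIP packages.
    private static final byte[] ZIP_SIGNATURE = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a file by the leading bytes passed in {@code head}. */
    static Container sniff(byte[] head) {
        if (startsWith(head, OLE2_SIGNATURE)) {
            return Container.OLE2;
        }
        if (startsWith(head, ZIP_SIGNATURE)) {
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] zipHead = { 'P', 'K', 0x03, 0x04, 0x14, 0x00 };
        System.out.println(sniff(zipHead));
    }
}
```

In practice the first eight bytes of a file are enough input for `sniff`; anything shorter than a signature is reported as `UNKNOWN`.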

For each MS Office application there is a component module that attempts to provide a common high-level Java API for both OLE2 and OOXML document formats. This is most developed for Excel workbooks (SS=HSSF+XSSF). Work is progressing for Word documents (WP=HWPF+XWPF) and PowerPoint presentations (SL=HSLF+XSLF).

The project has some support for Outlook (HSMF). Microsoft opened the specifications for this format in October 2007. We would welcome contributions.

There are also projects for Visio (HDGF and XDGF), TNEF (HMEF), and Publisher (HPBF).

This library includes the following components, roughly in descending order of maturity:

  • Excel spreadsheets (Common SS = HSSF, XSSF, and SXSSF)
  • PowerPoint slideshows (Common SL = HSLF and XSLF)
  • Word processing documents (Common WP = HWPF and XWPF)
  • Outlook email (HSMF and HMEF)
  • Visio diagrams (HDGF and XDGF)
  • Publisher (HPBF)

And lower-level, supporting components:

  • OLE2 Filesystem (POIFS)
  • OLE2 Document Properties (HPSF)
  • TNEF (HMEF) for Outlook winmail.dat files
  • OpenXML4J (OOXML)

Components named H??F are for reading or writing OLE2 binary formats.
Components named X??F are for reading or writing Office Open XML (OOXML) formats.

Getting started

Website: https://poi.apache.org/

Mailing lists:

Bug tracker:

Source code:

Requires Java 1.8 or later.

Contributing

  • Download and install svn or git, Java JDK 1.8+, and Apache Ant 1.8+ or Gradle
  • Check out the code from svn or git
  • Import the project into Eclipse or your favorite IDE
  • Write a unit test:
    • Binary formats and Common APIs: poi/src/test/java/org/apache/poi/
    • OOXML APIs only: poi-ooxml/src/test/java/org/apache/poi/
    • Scratchpad (Binary formats): poi-scratchpad/src/test/java/org/apache/poi/
    • Test files: test-data/
  • Navigate the source, make changes, and run unit tests to verify
    • Binary formats and Common APIs: poi/src/main/java/org/apache/poi/
    • OOXML APIs only: poi-ooxml/src/main/java/org/apache/poi/
    • Scratchpad (Binary formats): poi-scratchpad/src/main/java/org/apache/poi/
    • Examples: poi-examples/src/main/java/org/apache/poi/
  • More info: How To Build page at apache.org

Building jar files

To build the jar files for poi, poi-ooxml, poi-ooxml-lite, poi-ooxml-full and poi-examples:

On Linux or macOS:

./gradlew jar

On Windows:

gradlew jar

Comments
  • prototype for low memory footprint shared string table

    https://bz.apache.org/bugzilla/show_bug.cgi?id=61832

    Early days.

    • Need to make the new temp file shared string table an optional behavior
    • Need to write the shared string data to a temp file - I'm proposing to use http://www.h2database.com/html/mvstore.html (making h2 a dependency for poi-ooxml, or at least an optional dependency)
      • already the code uses less memory because it does not have the XMLBeans SST wrapper doc stored in memory
    • Need to support optional encryption of the temp files (H2 MVStore supports this)
    • much more test coverage
    opened by pjfanning 22
  • Formula adjusting in context of column shifting.

    The FormulaShifter class has been upgraded so that it can also adjust cell references in formulas when columns are shifted. The approach is simple:

    • transpose the cell reference;
    • adjust it using the existing algorithm for adjusting references when rows are shifted;
    • transpose the result back.

    XSSFSheet has been upgraded as well: it now has a shiftColumns() method that works similarly to shiftRows(). Besides formula adjustment, column shifting also covers related updates such as hyperlinks and named ranges. Most of the functionality of the XSSFRowShifter class has moved to a new class, XSSFShiftingManager, which is used by both shiftRows() and shiftColumns().
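
The transpose trick described in this pull request can be illustrated with a minimal sketch. It is not the FormulaShifter implementation: the `Ref` type and both method names are hypothetical, and the row-shifting rule is reduced to moving single-cell references that fall inside the shifted range. It ignores area references, deleted targets, and sheet names.

```java
// Illustrative sketch only: a toy model of "shift columns by transposing".
final class ColumnShiftSketch {
    /** Zero-based cell address (hypothetical helper type). */
    static final class Ref {
        final int row;
        final int col;

        Ref(int row, int col) {
            this.row = row;
            this.col = col;
        }

        /** Swaps row and column. */
        Ref transpose() {
            return new Ref(col, row);
        }

        @Override
        public String toString() {
            return "Ref(row=" + row + ", col=" + col + ")";
        }
    }

    /** Simplified row-shifting rule: move refs whose row lies in [firstRow, lastRow] by n. */
    static Ref shiftRows(Ref ref, int firstRow, int lastRow, int n) {
        boolean inRange = ref.row >= firstRow && ref.row <= lastRow;
        return inRange ? new Ref(ref.row + n, ref.col) : ref;
    }

    /** Column shifting expressed as: transpose, shift rows, transpose back. */
    static Ref shiftColumns(Ref ref, int firstCol, int lastCol, int n) {
        return shiftRows(ref.transpose(), firstCol, lastCol, n).transpose();
    }

    public static void main(String[] args) {
        // Cell B5 (row 4, col 1) with columns B..D shifted right by two.
        System.out.println(shiftColumns(new Ref(4, 1), 1, 3, 2));
    }
}
```

The appeal of this design is that only one adjustment algorithm has to be maintained; the column case reuses it unchanged.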

    opened by zmau 18
  • Fix some LGTM alerts

    Some time ago I started to fix the LGTM alerts, grouping them by their type.

    https://lgtm.com/projects/g/apache/poi/alerts/?mode=tree

    Many groups still remain to be handled, but these few fixes may already be worth pushing to trunk.

    opened by Alain-Bearez 17
  • Super-streaming SXSSF

    https://bz.apache.org/bugzilla/show_bug.cgi?id=63100

    SXSSF works as designed and manages a small memory footprint when generating large files from a database. But it only writes data to an output stream once everything has been written to SXSSF. This is problematic when used in web applications:

    In our use case (our website allows users to generate Excel files from the database), generating the SXSSF workbook on the server takes about 5 minutes. Most clients give up within a minute (or the browser does so automatically), or the proxy times out because no data is being sent. Some users also retry the download, which starts a new request while the server is still busy generating the workbook for a client that has already given up. This can potentially lead to denial of service (DoS).

    To work around this issue, I've implemented a super-streaming version of SXSSF, SuperSXSSF, that relies on a rowWriter callback to generate row data.

    With this approach our service can stream the generated Excel file directly to the client and, best of all, generation is terminated if the user cancels the download request.

    SuperSXSSF prevents both download timeouts and potential DoS, while still allowing developers all other XSSF actions (e.g. defining styles) that don't take much processing time.
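
The callback idea behind this pull request can be sketched with the standard library alone. This is not the SuperSXSSF code: `RowProducer`, `RowSink`, and `write` are hypothetical names, and the output is simplified XML-like markup rather than a valid worksheet part. The point is only that each row is written and flushed as soon as the callback produces it, so a closed connection surfaces as an `IOException` and stops generation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: all names here are hypothetical.
final class StreamingRowsSketch {
    /** Callback that pushes rows into the sink one at a time. */
    interface RowProducer {
        void produce(RowSink sink) throws IOException;
    }

    /** Receives one row of cell values. */
    interface RowSink {
        void row(String... cells) throws IOException;
    }

    /** Streams rows to {@code out} as they are produced instead of buffering them. */
    static void write(OutputStream out, RowProducer producer) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<sheetData>");
        producer.produce(cells -> {
            w.write("<row>");
            for (String cell : cells) {
                w.write("<c>" + escape(cell) + "</c>");
            }
            w.write("</row>");
            // Flushing per row lets a broken connection fail fast with an IOException.
            w.flush();
        });
        w.write("</sheetData>");
        w.flush();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        write(System.out, sink -> {
            sink.row("id", "name");
            sink.row("1", "A&B");
        });
    }
}
```

In a web application, `out` would be the response stream, and the producer would iterate over a database cursor.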

    opened by mobreza 14
  • Add support for new java.time API, deprecate HSSFDateUtil, move test cases to org.apache.poi.ss.usermodel

    While preparing to add support for the new Java date/time API, I noticed that DateUtil is extended by HSSFDateUtil, which itself doesn't add any functionality. Its sole purpose seems to be making the protected method absoluteDay() accessible for unit testing in the package org.apache.poi.hssf.usermodel. This should not be necessary, because test code should live in the same package as the methods it tests, and the extra class makes maintenance harder.

    I left the TestHSSFDateUtil with onARealFile(), the only method that really belongs there because it has a dependency on HSSF.

    Instead of completely removing HSSFDateUtil, I marked it as deprecated and changed static access to DateUtil throughout the codebase. Since HSSFDateUtil is public, it should not be removed before the next major version.
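
For context on what java.time support involves, here is a minimal sketch of converting a `LocalDateTime` to an Excel serial date number. It is not the DateUtil implementation and not the API this pull request proposes: the class and method names are hypothetical, and it assumes the 1900 date system, in which 1 January 1900 is day 1 and Excel counts a non-existent 29 February 1900.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Illustrative sketch only: assumes the 1900 date system; names are hypothetical.
final class ExcelSerialDateSketch {
    // Day 0 of the 1900 date system, so that 1900-01-01 becomes serial 1.
    private static final LocalDate DAY_ZERO = LocalDate.of(1899, 12, 31);
    // From this date on, serials are one higher because Excel counts 1900-02-29.
    private static final LocalDate AFTER_PHANTOM_LEAP_DAY = LocalDate.of(1900, 3, 1);
    private static final long NANOS_PER_DAY = 86_400L * 1_000_000_000L;

    /** Whole days since day zero plus the time of day as a fraction of 24 hours. */
    static double toExcelSerial(LocalDateTime dateTime) {
        LocalDate date = dateTime.toLocalDate();
        long wholeDays = ChronoUnit.DAYS.between(DAY_ZERO, date);
        if (!date.isBefore(AFTER_PHANTOM_LEAP_DAY)) {
            wholeDays++;
        }
        double fraction = dateTime.toLocalTime().toNanoOfDay() / (double) NANOS_PER_DAY;
        return wholeDays + fraction;
    }

    public static void main(String[] args) {
        System.out.println(toExcelSerial(LocalDateTime.of(2000, 1, 1, 12, 0)));
    }
}
```

With these assumptions, noon on 1 January 2000 maps to 36526.5. Dates before 1900 are not meaningful in this date system and are not validated here.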

    Please review and comment. If possible, I'd like to have this applied to trunk soon so that I can focus on the real task, namely adding the new date/time API.

    opened by xzel23 12
  • [bug-57342] Excel compatible Zip64 implementation

    https://bz.apache.org/bugzilla/show_bug.cgi?id=57342

    I did an in-depth analysis of this issue. It turns out the problem is not with the OOXML data generated by POI but with the ZIP format, specifically the ZIP64 extension. That's why everything is fine until sheet1.xml grows beyond 4GB (uncompressed). I have all the details written up in a blog post: https://rzymek.github.io/post/excel-zip64/ In short: Excel will want to repair the file if the uncompressed size of a zip entry exceeds 4GB and the ZIP Local File Header (LFH) does not specify zip spec version 4.5.

    This pull request uses a custom (Excel-compatible) Zip64 implementation when Zip64Mode is set to Always.
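
The rule described above can be written down as a small decision function. This is a sketch, not the pull request's code: the names are hypothetical, and the constants reflect a reading of the ZIP format in which a 32-bit size field of 0xFFFFFFFF is reserved as a marker for Zip64, "version needed to extract" is 20 for a plain deflated entry, and 45 stands for spec version 4.5.

```java
// Illustrative sketch only: names are hypothetical; constants are assumptions
// based on the ZIP format's 32-bit size fields and version numbering.
final class Zip64VersionSketch {
    // Largest value of a 32-bit size field; it doubles as the "see Zip64 extra field" marker.
    static final long ZIP32_SIZE_LIMIT = 0xFFFFFFFFL;
    static final int VERSION_DEFLATE = 20; // spec version 2.0
    static final int VERSION_ZIP64 = 45;   // spec version 4.5

    /** True when an entry's size cannot be stored in a 32-bit header field. */
    static boolean needsZip64(long uncompressedSize, long compressedSize) {
        return uncompressedSize >= ZIP32_SIZE_LIMIT || compressedSize >= ZIP32_SIZE_LIMIT;
    }

    /** The "version needed to extract" value the local file header should carry. */
    static int versionNeededToExtract(long uncompressedSize, long compressedSize) {
        return needsZip64(uncompressedSize, compressedSize) ? VERSION_ZIP64 : VERSION_DEFLATE;
    }

    public static void main(String[] args) {
        long fiveGiB = 5L * 1024 * 1024 * 1024;
        // XML compresses well, so a huge sheet can still have a small compressed size.
        System.out.println(versionNeededToExtract(fiveGiB, 200L * 1024 * 1024));
    }
}
```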

    opened by rzymek 11
  • fix bug: when transfer .ppt(which include the object such as excel...…

    Converting a .ppt file to .pdf works well. However, converting a .ppt file that includes an embedded object (such as an Excel sheet) to .pdf causes an infinite loop.

    I found one way to solve this by calling graphics.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON); in HemfImageRenderer.java#drawImage()

    opened by wanglunhui2012 10
  • Bugzilla 59623: code inspired from Axel Richter on StackOverflow

    opened by Alain-Bearez 10
  • Xmlbeans 2.6.5

    I'm suggesting that we use the xmlbeans fork in POI 4.0.0 while we consider the options for replacing xmlbeans or coming up with a better way of maintaining it.

    opened by pjfanning 10
  • Seven more XDDF charts and bug fixes

    The code in this pull request builds on code from Sandeep Tiwari in https://github.com/apache/poi/pull/144

    Some code formatting and code patterns have been improved, and there are some light improvements to overall functionality.

    opened by Alain-Bearez 9
  • hsmf: support writing properties, fix reading embedded message properties, support getting actual type of property value

    Hello!

    I have found a problem reading the properties chunks of embedded/attached MSGs inside an MSG file: their header contains 8 fewer reserved bytes.

    I have also added support for writing message properties chunks and for getting the actual property type (not only the usual type) and the raw bytes from a property value.

    This can be used, for example, when extracting embedded/attached MSGs from an MSG file.

    • MessagePropertiesChunk: Fix reading properties from embedded/attached MSG
    • Add support for writing MessagePropertiesChunk
    • PropertyValue: Add support for getting the actual property type and getting the raw bytes
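
The header difference described in this pull request can be sketched as follows. This is not the HSMF code: the names are hypothetical, and the concrete sizes (a 32-byte header for a top-level message, 24 bytes for an embedded one, and 16 bytes per property entry) are assumptions based on a reading of the MSG file format. Only the 8-byte difference comes from the description above.

```java
// Illustrative sketch only: names are hypothetical and the byte counts are
// assumptions, except for the 8-byte difference stated in the pull request.
final class MsgPropertiesHeaderSketch {
    static final int TOP_LEVEL_HEADER_BYTES = 32;
    // Embedded messages lack the trailing 8 reserved bytes.
    static final int EMBEDDED_HEADER_BYTES = TOP_LEVEL_HEADER_BYTES - 8;
    static final int PROPERTY_ENTRY_BYTES = 16;

    static int headerLength(boolean embedded) {
        return embedded ? EMBEDDED_HEADER_BYTES : TOP_LEVEL_HEADER_BYTES;
    }

    /** Number of fixed-size property entries that follow the header. */
    static int entryCount(int streamLength, boolean embedded) {
        int payload = streamLength - headerLength(embedded);
        if (payload < 0 || payload % PROPERTY_ENTRY_BYTES != 0) {
            throw new IllegalArgumentException("Unexpected properties stream length: " + streamLength);
        }
        return payload / PROPERTY_ENTRY_BYTES;
    }

    public static void main(String[] args) {
        // The same 56-byte stream holds a different number of entries
        // depending on which header layout is assumed.
        System.out.println(entryCount(56, true));
    }
}
```

Reading an embedded chunk with the top-level header length would skip 8 bytes too many and misalign every entry after it, which matches the kind of bug described here.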

    Regards, Dominik

    opened by dhoelzl 9
  • Implement specificWorkbookSheet method

    specificWorkbookSheet() takes a workbook and a sheetNumber as arguments and creates an HTML rendering of that specific sheet using the predefined method processSheet().

    Previously this was not possible, because processSheet() is protected and could not be called directly.

    opened by HasaanA16 0
  • poi-ooxml: add new API to add text transparency on XLSX

    This API uses an XSSFColor built with an alpha channel to set the right value. Note that Excel expresses this value in thousandths of a percent and as a positive number.
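
The unit conversion mentioned here can be sketched as below. This is not the pull request's code: the method name is hypothetical, and it assumes that the input is an 8-bit alpha channel (0 to 255) and that 100000 thousandths of a percent corresponds to fully opaque.

```java
// Illustrative sketch only: the name and the mapping direction are assumptions.
final class AlphaUnitsSketch {
    static final int FULL_SCALE = 100_000; // 100 % in thousandths of a percent

    /** Maps an 8-bit alpha value (0..255) to thousandths of a percent (0..100000). */
    static int alphaToThousandthsOfPercent(int alpha) {
        if (alpha < 0 || alpha > 255) {
            throw new IllegalArgumentException("alpha must be in 0..255 but was " + alpha);
        }
        return (int) Math.round(alpha * (double) FULL_SCALE / 255.0);
    }

    public static void main(String[] args) {
        System.out.println(alphaToThousandthsOfPercent(128));
    }
}
```

Rejecting out-of-range input keeps the result non-negative, which matters given the note above about positive numbers.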

    opened by artragis 0
Owner
The Apache Software Foundation