OpenRefine is a free, open source power tool for working with messy data and improving it

Overview

OpenRefine

Join the chat at https://gitter.im/OpenRefine/OpenRefine

OpenRefine is a Java-based power tool that allows you to load data, understand it, clean it up, reconcile it, and augment it with data coming from the web. All from a web browser and the comfort and privacy of your own computer.

Download

Run from source

If you have cloned this repository to your computer, you can run OpenRefine with:

  • ./refine on Mac OS and Linux
  • refine.bat on Windows

This requires a JDK (Java 11 or later starting with version 3.6) and Apache Maven.

Documentation and Videos

Contributing to the project

Contact us

Licensing and legal issues

OpenRefine is open source software and is licensed under the BSD license, found in LICENSE.txt. See the licenses folder for information on the open source libraries that OpenRefine depends on.

Credits

This software was created by Metaweb Technologies, Inc. and originally written and conceived by David Huynh. Metaweb Technologies, Inc. was acquired by Google, Inc. in July 2010 and the product was renamed Google Refine. In October 2012, it was renamed OpenRefine as it transitioned to a community-supported product.

See AUTHORS.md for the list of OpenRefine contributors and CONTRIBUTING.md for instructions on how to contribute yourself.

Comments
  • Enhancement : Add fields for projects metadata

    I have close to 300 projects in my workspace directory. Some are several years old. The problem is that I don't know where the data came from, when exactly I processed it (two years ago, OK, but what month and day?), or sometimes simply what the projects correspond to. This should be recorded somewhere, for example in metadata fields beside the project name.

    I do not know if it's difficult to implement. If someone else is interested, maybe we could team up to fund a decent bounty? I have already put 150 dollars on BountySource.

    enhancement priority: High metadata 
    opened by ettorerizza 81
  • Wikidata extension

    This adds an overlay model to OpenRefine which lets users translate their projects to Wikidata edits. It is inspired by the RDF and Freebase extensions. It works like this:

    • Users can create a schema, pretty much in the same way they would do manual edits on Wikidata, except that they can drag and drop OpenRefine columns in place of the values of their statements, qualifiers, references and terms. These columns will act as variables: for each row, they will be replaced by their value, generating statements and terms.
    • They get feedback about the issues their edits may have (before the edits are performed). This largely relies on Wikidata's constraint system, but not all constraint violations will be reported. Some of the issues that are reported do not correspond to any constraint either (for instance, issues with the labels, descriptions and aliases).
    • They can export their project to the QuickStatements format (https://www.wikidata.org/wiki/Help:QuickStatements) and perform their edits with that tool.
    • They can also perform the edits directly from OpenRefine. This counts as an operation in OpenRefine, so using the JSON history of operations it is effectively possible to write simple bots as OpenRefine scripts, with no knowledge of programming.
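
The schema mechanism described above can be pictured with a small sketch. This is not the extension's code: the dictionary-based schema, the `{"column": ...}` marker, the `resolve` and `expand_schema` helpers and the sample identifiers are all hypothetical, and the tab-separated output only approximates the QuickStatements format.

```python
from typing import Dict, List, Optional, Union

# A schema part is either a constant string or a reference to a column.
Part = Union[str, Dict[str, str]]

# Hypothetical schema: two statements whose subject comes from the "item"
# column; the second statement also takes its value from a column.
SCHEMA: List[Dict[str, Part]] = [
    {"subject": {"column": "item"}, "property": "P31", "value": "Q5"},
    {"subject": {"column": "item"}, "property": "P569", "value": {"column": "birth"}},
]


def resolve(part: Part, row: Dict[str, str]) -> Optional[str]:
    """Return the constant itself, or the row's value for a column reference."""
    if isinstance(part, dict):
        return row.get(part["column"])
    return part


def expand_schema(schema: List[Dict[str, Part]], rows: List[Dict[str, str]]) -> List[str]:
    """Instantiate every statement of the schema once per row."""
    lines = []
    for row in rows:
        for statement in schema:
            parts = [resolve(statement[key], row) for key in ("subject", "property", "value")]
            # Skip statements for which a referenced cell is missing or empty.
            if any(p is None or p == "" for p in parts):
                continue
            lines.append("\t".join(parts))
    return lines


# Sample rows, chosen for illustration only.
ROWS = [
    {"item": "Q42", "birth": "+1952-03-11T00:00:00Z/11"},
    {"item": "Q4115189", "birth": ""},
]

if __name__ == "__main__":
    for line in expand_schema(SCHEMA, ROWS):
        print(line)
```

Each row yields one line per statement whose cells are all filled in, which is the sense in which columns "act as variables" in the schema.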

    In both export modes, new items are supported (using the existing "reconcile to new" option). New items are not created at reconciliation time, only once the rest of the edits are performed.

    Important note regarding testing this extension

    As this extension introduces new operations, switching back and forth between versions of OpenRefine which contain the extension and those which do not can cause loss of the content of these operations (which makes it impossible to extract JSON from these operations, for instance). Make sure you back up your workspace before testing the extension on any important project.

    Reviewing this PR on GitHub will probably be quite tedious because of the large number of added files - I guess the simplest way is just to check out the branch directly. I am looking for seasoned Wikidata editors to give it a spin. Documentation also needs to get written somewhere.

    Test coverage can be found here: http://pintoch.ulminfo.fr/wikidata-extension-test-coverage/ (and generated with ./refine test as always).

    This closes #1213.

    enhancement reconciliation export 
    opened by wetneb 69
  • Nothing is on the Web Page

    Describe the bug: When OpenRefine is running, my browser opens http://127.0.0.1:3333/. However, nothing is on the web page except an icon.

    Desktop:

    • OS: Windows
    • Browser: Chrome

    OpenRefine:

    • Version: OpenRefine 3.0
    bug 
    opened by wentianq 67
  • OpenRefine on Spark

    Hello,

    I've seen a repo that gets OpenRefine running on Spark. Is it something you could somehow integrate?

    https://github.com/andreybratus/RefineOnSpark

    Thanks.

    Regards, Yann

    enhancement large project support 
    opened by YannBrrd 66
  • data package metadata

    • The import and export for data packages are done, the JSON editor works well, and there is validation from the front end.
    • Data schema alignment is based on the "table schema". For validation, there is ValidateOperationTests.java. You can run it with JUnit, not TestNG.
    • There is also a validation command that returns a JSON response listing all the problems.
    • CSVW can be added easily with the current infrastructure.

    Next steps:

    • Create the "data validation" UI
    • Add CSVW metadata
    metadata 
    opened by jackyq2015 50
  • Inconsistency with handling null and empty string - (blank)

    Google Refine has severe problems with handling nulls (https://github.com/OpenRefine/OpenRefine/issues/332, https://github.com/OpenRefine/OpenRefine/issues/252).

    My problem was that I could not concatenate column values when one cell was null. The solution is to "blank" cells not with null but with an empty string. But there is no such option!

    1) When you use Edit cells -> Common transformations -> Blank out cells, Google Refine sets all cells to null.
    2) When you use Edit cells -> Transform and enter "", Google Refine sets all cells to null as well.
    3) Edit cells -> Transform with " ".trim() doesn't do anything.
    4) Luckily, Edit cells -> Transform with " ", followed by Edit cells -> Common transformations -> Trim leading and trailing whitespace, does the job.

    Conclusion

    1) Items 2 and 3 above should behave exactly like item 4.
    2) There should be two options: "Blank out cells" and "Null out cells".
    3) Google Refine should allow concatenating nulls, treating them as empty strings.
    4) "Text facet" should not treat nulls the same as empty strings: currently both are displayed as (blank) and you cannot distinguish one from the other.
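
The distinction at stake can be illustrated outside of OpenRefine. In the Python sketch below, `None` stands in for a null cell and `""` for an empty string; `concat_cells` and `facet_label` are hypothetical helpers that show the requested behaviour, not OpenRefine or GREL functions.

```python
from typing import Optional, Sequence


def concat_cells(cells: Sequence[Optional[str]], sep: str = "") -> str:
    """Concatenate cell values, treating a null cell (None) as an empty string."""
    return sep.join("" if cell is None else cell for cell in cells)


def facet_label(cell: Optional[str]) -> str:
    """Give null and empty-string cells distinct labels instead of one '(blank)'."""
    if cell is None:
        return "(null)"
    if cell == "":
        return "(empty string)"
    return cell


if __name__ == "__main__":
    print(concat_cells(["John", None, "Smith"], sep=" "))
    print(facet_label(None), facet_label(""))
```

With these helpers, a null cell no longer prevents concatenation, and a facet built on `facet_label` would keep the two kinds of blank cells apart.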

    Kind regards

    bug enhancement logic good first issue grel 
    opened by eximius313 47
  • Non printable characters

    Signed-off-by: Agha Saad Fraz

    I have added a feature for displaying non-printable characters. It is a fix for #1286.

    Steps:

    • Import the messy data
    • Create the project
    • Toggling the non-printable character checkbox would show/hide the non-printable characters

    Demo:

    Data grid without non-printable characters:

    image

    Data grid with non-printable characters:

    image

    opened by AghaSaad04 44
  • Migrate to a new documentation platform

    We want to get ourselves better docs than those we have on the GitHub wiki at the moment.

    We need to decide which documentation system to use, where to host it, and so on. Let's use this issue to map the possibilities we are aware of. I will start:

    • Sphinx: documentation system based on reStructuredText or Markdown, which can be hosted for free on ReadTheDocs. Example docs: https://editgroups.readthedocs.io/en/latest/
    • Gitbook: based on Markdown, it looks like it can be hosted on gitbook.com. Example docs: https://devdocs.foodsharing.network/
    • I really like Django's docs - not sure if the framework is reusable though: https://docs.djangoproject.com/en/3.0/topics/

    I think our docs should be:

    • [ ] versioned: one sub-site per version, ideally without duplicating content too much to keep things maintainable?
    • [ ] translated: it should be easy for people who currently contribute on Weblate to also translate the docs.

    Any other requirements we should have? Which existing docs should we take inspiration from?

    enhancement documentation 
    opened by wetneb 43
  • Why not a column variable ?

    I see several questions on StackOverflow that involve iterating over each cell of a column. Since OpenRefine has no "column" variable, I have to advise each time to create a single fake record spanning the entire dataset, and then use row.record.etc.

    Would it be a good idea to create a column variable that allows treating each cell of a column as an element of an array?
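
To make the request concrete, here is a minimal Python sketch of what a "column" variable could offer. The row representation (a list of dictionaries) and the `column_values` helper are assumptions made for illustration; they do not correspond to any existing OpenRefine or GREL feature.

```python
from typing import Dict, List, Optional


def column_values(rows: List[Dict[str, str]], name: str) -> List[Optional[str]]:
    """Return every cell of one column as a list, in row order."""
    return [row.get(name) for row in rows]


ROWS = [{"city": "Paris"}, {"city": "Lyon"}, {"city": "Paris"}]

# With the whole column available, a per-row expression can look at other rows,
# for instance to count how often the current value occurs in the column.
CITIES = column_values(ROWS, "city")
COUNTS = [CITIES.count(row["city"]) for row in ROWS]

if __name__ == "__main__":
    print(COUNTS)
```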

    enhancement design discussions 
    opened by ettorerizza 37
  • Open XML file from URL generates lots of empty lines

    In the 2.6RC2 installed on Linux from https://github.com/OpenRefine/OpenRefine/releases/download/2.6-rc.2 I get blank lines displayed when importing XML from a URL.

    For example, importing XML from http://api.worldbank.org/countries/all/indicators/SP.POP.TOTL?date=2000:2001 produces empty lines.

    If the parsing of this can't be fixed, an "ignore empty rows" setting would be useful.
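
The requested setting amounts to a filter applied after parsing. A minimal sketch, assuming rows are lists of cell values and that a row counts as empty when every cell is null or whitespace; `drop_empty_rows` is a hypothetical helper, not part of the importer:

```python
from typing import List, Optional


def is_blank(cell: Optional[str]) -> bool:
    """A cell is blank when it is null or contains only whitespace."""
    return cell is None or str(cell).strip() == ""


def drop_empty_rows(rows: List[List[Optional[str]]]) -> List[List[Optional[str]]]:
    """Keep only the rows that contain at least one non-blank cell."""
    return [row for row in rows if not all(is_blank(cell) for cell in row)]


if __name__ == "__main__":
    parsed = [["World", "2000"], [None, ""], ["  ", None], ["World", "2001"]]
    print(drop_empty_rows(parsed))
```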

    bug import XML 
    opened by psychemedia 37
  • OAuth support for Wikidata extension

    See #1612

    Update:

    As discussed below, three-legged OAuth support for the Wikidata extension is not useful now or in the near future, since it requires multi-user support in OpenRefine, which still has a long way to go.

    This PR doesn't achieve all the goals of #1612; it just adds two-legged OAuth support (it enables logging in with an owner-only consumer).

    opened by afkbrb 36
  • header: move "OpenRefine" header text from image

    So that the background and foreground text can be styled.

    side effects:

    • minor style changes (acceptable in my opinion)
    • accessibility improvements
    • dropped some absolute positioning
    • fixed a bug causing the slogan to shift from the left during page load

    Part of #3017

    opened by Abbe98 2
  • The database extension overrides CSS styles from core

    The database extension loads Yahoo's pure.css library, which styles elements regardless of classes. As a result, one must always ensure that core has styles which take priority, to avoid side effects.

    To Reproduce

    Inspect any element on either the index or project page and check for styles coming from pure.css.

    bug UI maintainability extension 
    opened by Abbe98 2
  • Show installed extensions in the GUI

    When using a hosted or managed OpenRefine instance it isn't necessarily obvious which extensions are installed, and thus it can be tricky to know the available features or where to find relevant documentation.

    Proposed solution

    It would be nice if the about page listed all installed extensions.

    Alternatives considered

    Maybe there could be cases where a host doesn't want to show the installed extensions?

    #3223 would cover this but it's a much longer way there.

    enhancement UI extension 
    opened by Abbe98 0
  • Bump cypress from 11.2.0 to 12.3.0 in /main/tests/cypress

    Bumps cypress from 11.2.0 to 12.3.0.

    Release notes

    Sourced from cypress's releases.

    v12.3.0

    Changelog: https://docs.cypress.io/guides/references/changelog#12-3-0

    v12.2.0

    Changelog: https://docs.cypress.io/guides/references/changelog#12-2-0

    v12.1.0

    Changelog: https://docs.cypress.io/guides/references/changelog#12-1-0

    v12.0.2

    Changelog: https://docs.cypress.io/guides/references/changelog#12-0-2

    v12.0.1

    Changelog: https://docs.cypress.io/guides/references/changelog#12-0-1

    v12.0.0

    Changelog: https://docs.cypress.io/guides/references/changelog#12-0-0

    Commits
    • d0ba032 chore: bump to 12.3.0 [skip ci] (#25355)
    • 5f536fe fix: make NODE_ENV "production" for prod builds of launchpad (#25320)
    • 05b9f10 fix: .contains() should only return one element at all times (#25250)
    • acc61d8 feat: add currentRetry to Cypress API (#25297)
    • 736c599 chore: release @​cypress/webpack-dev-server-v3.2.2
    • f20c6f5 chore: release create-cypress-tests-v2.0.1
    • c12a7e3 fix: change wording for spec creation (#25271)
    • 3925ae0 fix: truncate text to fix layout (#25270)
    • 7cbd2c5 chore: release @​cypress/webpack-preprocessor-v5.16.1
    • 6fc13e6 fix: added missing pending data which caused incorrect mochaawesome reports (...
    • Additional commits viewable in compare view

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies javascript 
    opened by dependabot[bot] 0
  • Bump jena.version from 4.6.1 to 4.7.0

    Bumps jena.version from 4.6.1 to 4.7.0. Updates jena-arq from 4.6.1 to 4.7.0

    Updates jena-core from 4.6.1 to 4.7.0

    dependencies java 
    opened by dependabot[bot] 0
Releases(3.7-beta2)
  • 3.7-beta2(Dec 12, 2022)

    This is the second beta release of the 3.7 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Most text exposed to users in OpenRefine's UI can now be translated. Some strings (generated server-side) were not translatable so far. To help translators catch up on this backlog, do not hesitate to join us on Weblate. (#5030)
    • New media files can be uploaded to Wikibase instances such as Wikimedia Commons. The wikitext of existing files can also be edited thanks to the new fields introduced. (#4682)
    • A button "Discover Wikibase instances…" was added on the dialog which lists the registered Wikibase instances (#5007), whose design was improved (#5009)
    • In the Wikibase schema editor, statements with non-standard datatypes (such as EDTF dates or musical notations) are now supported, assuming they use strings as underlying representation (#3263)
    • The Wikibase issues tab now makes it possible to locate which rows are responsible for certain issues, using facets (#5033)
    • The default throttle delay for the "Add column by fetching URLs" operation was reduced to 500ms and the error reporting for this field was improved (#5188)
    • Wikibase templates (incomplete Wikibase schemas) can be saved and shared, as a way of helping contributors use the same way of structuring data in a Wikibase instance (#5043, #5303)
    • The line-based importer now supports a custom delimiter, instead of only newlines (#4103)
    • The Excel importer can be configured to import all cells as text, disabling the use of other datatypes supported by OpenRefine (#4838)
    • The "some value" and "no value" Wikibase values can now be uploaded by OpenRefine (#5360)
    • The Excel importer will also avoid coercing cell values to OpenRefine datatypes which do not fully fit them, such as representing a date as a date with time (#5389, #5390).
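
The custom delimiter support in the line-based importer listed above can be pictured as follows. This is only a sketch of the idea: `split_records` is a hypothetical helper, and how the real importer interprets its delimiter setting (for example literal text versus a pattern) is not stated in these notes.

```python
from typing import List


def split_records(text: str, delimiter: str = "\n") -> List[str]:
    """Split raw text into records on a literal delimiter (newline by default)."""
    if delimiter == "":
        raise ValueError("delimiter must not be empty")
    records = text.split(delimiter)
    # A trailing delimiter would otherwise produce one spurious empty record.
    if records and records[-1] == "":
        records.pop()
    return records


if __name__ == "__main__":
    print(split_records("first\nsecond\n"))
    print(split_records("first|second|third", delimiter="|"))
```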

    GREL changes

    • Improved error handling in number formatting with the GREL toString function (#816)
    • The behaviour of the GREL function wholeText() has changed slightly in the way it handles newlines, following an upstream change in the jsoup library (jsoup issue #1636)
    • A new parent GREL function, to obtain the parent element of an XML element, was added (#5176)

    For developers

    And many bug fixes, see the full list of changes for 3.7.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.7-beta2.tar.gz(141.78 MB)
    openrefine-mac-3.7-beta2.dmg(172.66 MB)
    openrefine-win-3.7-beta2.zip(142.85 MB)
    openrefine-win-with-java-3.7-beta2.zip(142.85 MB)
  • 3.7-beta1(Dec 12, 2022)

    This is the first beta release of the 3.7 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Most text exposed to users in OpenRefine's UI can now be translated. Some strings (generated server-side) were not translatable so far. To help translators catch up on this backlog, do not hesitate to join us on Weblate. (#5030)
    • New media files can be uploaded to Wikibase instances such as Wikimedia Commons. The wikitext of existing files can also be edited thanks to the new fields introduced. (#4682)
    • A button "Discover Wikibase instances…" was added on the dialog which lists the registered Wikibase instances (#5007), whose design was improved (#5009)
    • In the Wikibase schema editor, statements with non-standard datatypes (such as EDTF dates or musical notations) are now supported, assuming they use strings as underlying representation (#3263)
    • The Wikibase issues tab now makes it possible to locate which rows are responsible for certain issues, using facets (#5033)
    • The default throttle delay for the "Add column by fetching URLs" operation was reduced to 500ms and the error reporting for this field was improved (#5188)
    • Wikibase templates (incomplete Wikibase schemas) can be saved and shared, as a way of helping contributors use the same way of structuring data in a Wikibase instance (#5043, #5303)
    • The line-based importer now supports a custom delimiter, instead of only newlines (#4103)
    • The Excel importer can be configured to import all cells as text, disabling the use of other datatypes supported by OpenRefine (#4838)
    • The "some value" and "no value" Wikibase values can now be uploaded by OpenRefine (#5360)
    • The Excel importer will also avoid coercing cell values to OpenRefine datatypes which do not fully fit them, such as representing a date as a date with time (#5389, #5390).

    GREL changes

    • Improved error handling in number formatting with the GREL toString function (#816)
    • The behaviour of the GREL function wholeText() has changed slightly in the way it handles newlines, following an upstream change in the jsoup library (jsoup issue #1636)
    • A new parent GREL function, to obtain the parent element of an XML element, was added (#5176)

    For developers

    And many bug fixes, see the full list of changes for 3.7.

    Download

    Source code(tar.gz)
    Source code(zip)
  • 3.6.2(Oct 3, 2022)

    This is the third stable release of the 3.6 series. Please back up your workspace directory before installing and report any problems that you encounter.

    Starting with version 3.6, OpenRefine requires Java 11 or later.

    New in 3.6.2

    • An overflow issue with the reconciliation dialog was fixed (#5286)

    New features

    • The user is now warned when applying the "Fill down" or "Blank down" operations with a pending sorting criterion (#3256)
    • The import preview refreshing can be disabled (#4009)
    • Menu items to reveal collapsed columns were added to the column menus (#4067)
    • The path to the refine.ini configuration file can now be changed on the command line (#4113)
    • It is now possible to download the JSON representation of the operation history, without resorting to copy and paste (#4498)
    • It is now possible to work with Wikibase instances with federation enabled (#4287)
    • The merge strategy for statements can be configured in the Wikibase schema editor. This also adds support for deleting statements. Beware that schemas created with earlier versions of OpenRefine will still use the original merge strategy. (#3383, #2116, #4130)
    • OpenRefine can edit MediaInfo entities and not just Items (#4270)
    • It is possible to disable the new version notification by setting the configuration variable refine.display.new.version.notice=false (#4410)
    • The dialog to reorder and delete columns was improved to easily delete most columns (#4557)
    • The maximum editing speed and the Wikibase tag to apply to all Wikibase edits are now configurable for each Wikibase instance via its manifest (#3359)
    • Extra URL fields in the starting page can be removed thanks to a new button (#4606)
    • The "Use values as identifiers" operation now warns that it does not validate the identifiers (#3172)

    GREL changes

    • A new GREL function, parent, was introduced to obtain the parent element of an XML element (#4181)
    • A new GREL function, scriptText, was introduced to obtain the text contained in a <script> or <style> element in HTML (#4189)
    • The random (previously randomNumber) GREL function was improved (#3143)
    • A new GREL function parseUri was introduced (#1857)
    • A new GREL function detectLanguage was introduced (#642)
    • New GREL functions encode and decode were introduced (#148)
    • The error handling of the pow and exp functions was improved (#3062)
    • The division operator returns NaN when computing 0/0 (#377)
    • A function timeSinceUnixEpochToDate was introduced, to convert a duration since Epoch to a date object (#608)
    • A function replaceEach was introduced, to replace multiple substrings in one go (#2606)
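
Three of the changes above (replaceEach, the 0/0 result, and timeSinceUnixEpochToDate) have straightforward analogues in other languages. The Python functions below are approximate illustrations only: their names, argument order, the use of seconds as the unit and the handling of overlapping search strings are assumptions, not the GREL definitions.

```python
import math
import re
from datetime import datetime, timezone
from typing import Sequence


def replace_each(text: str, search: Sequence[str], replacements: Sequence[str]) -> str:
    """Replace every search[i] with replacements[i] in a single pass over text."""
    if len(search) != len(replacements):
        raise ValueError("search and replacements must have the same length")
    if any(s == "" for s in search):
        raise ValueError("search strings must be non-empty")
    if not search:
        return text
    mapping = dict(zip(search, replacements))
    # Try longer search strings first so they win over their own prefixes.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in alternatives))
    # Replaced text is not scanned again, so "a" -> "b" and "b" -> "c" do not chain.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


def divide(a: float, b: float) -> float:
    """Divide a by b, returning NaN for 0/0 instead of raising an error."""
    if a == 0 and b == 0:
        return math.nan
    return a / b  # Python still raises ZeroDivisionError for x/0 with x != 0


def epoch_seconds_to_date(seconds: float) -> datetime:
    """Convert a duration in seconds since the Unix epoch to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(replace_each("abc", ["a", "b"], ["b", "c"]))
    print(divide(0, 0))
    print(epoch_seconds_to_date(0).isoformat())
```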

    For extension developers

    • (from 3.6-rc1 on) We migrated to jQuery 3.6.0. If you are using jQuery in your extension, some jQuery syntaxes that have been deprecated earlier might have been removed. If your extension runs with OpenRefine 3.5.2, you can check the web developer console for warning messages when the extension is used: fixing those should be enough for your extension to be compatible with OpenRefine 3.6 (#4891)

    And many bug fixes, see the complete list of changes for 3.6.

    Download

    Source code(tar.gz)
    Source code(zip)
  • 3.6.1(Aug 22, 2022)

    This is the second stable release of the 3.6 series. Please back up your workspace directory before installing and report any problems that you encounter.

    Starting with version 3.6, OpenRefine requires Java 11 or later.

    New in 3.6.1

    • The editing of redirected Wikibase entities was fixed (#5162)
    • A bug with selection of clusters in the clustering dialog was fixed (#5138)
    • Date handling in the Google data extension was fixed (#5107)
    • A packaging issue in MacOS was tentatively fixed (#5160)

    New features

    • The user is now warned when applying the "Fill down" or "Blank down" operations with a pending sorting criterion (#3256)
    • The import preview refreshing can be disabled (#4009)
    • Menu items to reveal collapsed columns were added to the column menus (#4067)
    • The path to the refine.ini configuration file can now be changed on the command line (#4113)
    • It is now possible to download the JSON representation of the operation history, without resorting to copy and paste (#4498)
    • It is now possible to work with Wikibase instances with federation enabled (#4287)
    • The merge strategy for statements can be configured in the Wikibase schema editor. This also adds support for deleting statements. Beware that schemas created with earlier versions of OpenRefine will still use the original merge strategy. (#3383, #2116, #4130)
    • OpenRefine can edit MediaInfo entities and not just Items (#4270)
    • It is possible to disable the new version notification by setting the configuration variable refine.display.new.version.notice=false (#4410)
    • The dialog to reorder and delete columns was improved to easily delete most columns (#4557)
    • The maximum editing speed and the Wikibase tag to apply to all Wikibase edits are now configurable for each Wikibase instance via its manifest (#3359)
    • Extra URL fields in the starting page can be removed thanks to a new button (#4606)
    • The "Use values as identifiers" operation now warns that it does not validate the identifiers (#3172)

    GREL changes

    • A new GREL function, parent, was introduced to obtain the parent element of an XML element (#4181)
    • A new GREL function, scriptText, was introduced to obtain the text contained in a <script> or <style> element in HTML (#4189)
    • The random (previously randomNumber) GREL function was improved (#3143)
    • A new GREL function parseUri was introduced (#1857)
    • A new GREL function detectLanguage was introduced (#642)
    • New GREL functions encode and decode were introduced (#148)
    • The error handling of the pow and exp functions was improved (#3062)
    • The division operator returns NaN when computing 0/0 (#377)
    • A function timeSinceUnixEpochToDate was introduced, to convert a duration since Epoch to a date object (#608)
    • A function replaceEach was introduced, to replace multiple substrings in one go (#2606)

    For extension developers

    • (from 3.6-rc1 on) We migrated to jQuery 3.6.0. If you are using jQuery in your extension, some jQuery syntaxes that have been deprecated earlier might have been removed. If your extension runs with OpenRefine 3.5.2, you can check the web developer console for warning messages when the extension is used: fixing those should be enough for your extension to be compatible with OpenRefine 3.6 (#4891)

    And many bug fixes, see the complete list of changes for 3.6.

    Download

    Source code(tar.gz)
    Source code(zip)
  • 3.6.0(Jul 22, 2022)

    This is the first stable release of the 3.6 series. Please back up your workspace directory before installing and report any problems that you encounter.

    Starting with version 3.6, OpenRefine requires Java 11 or later.

    New features

    • The user is now warned when applying the "Fill down" or "Blank down" operations with a pending sorting criterion (#3256)
    • The import preview refreshing can be disabled (#4009)
    • Menu items to reveal collapsed columns were added to the column menus (#4067)
    • The path to the refine.ini configuration file can now be changed on the command line (#4113)
    • It is now possible to download the JSON representation of the operation history, without resorting to copy and paste (#4498)
    • It is now possible to work with Wikibase instances with federation enabled (#4287)
    • The merge strategy for statements can be configured in the Wikibase schema editor. This also adds support for deleting statements. Beware that schemas created with earlier versions of OpenRefine will still use the original merge strategy. (#3383, #2116, #4130)
    • OpenRefine can edit MediaInfo entities and not just Items (#4270)
    • It is possible to disable the new version notification by setting the configuration variable refine.display.new.version.notice=false (#4410)
    • The dialog to reorder and delete columns was improved to easily delete most columns (#4557)
    • The maximum editing speed and the Wikibase tag to apply to all Wikibase edits are now configurable for each Wikibase instance via its manifest (#3359)
    • Extra URL fields in the starting page can be removed thanks to a new button (#4606)
    • The "Use values as identifiers" operation now warns that it does not validate the identifiers (#3172)

    GREL changes

    • A new GREL function, parent, was introduced to obtain the parent element of an XML element (#4181)
    • A new GREL function, scriptText, was introduced to obtain the text contained in a <script> or <style> element in HTML (#4189)
    • The random (previously randomNumber) GREL function was improved (#3143)
    • A new GREL function parseUri was introduced (#1857)
    • A new GREL function detectLanguage was introduced (#642)
    • New GREL functions encode and decode were introduced (#148)
    • The error handling of the pow and exp functions was improved (#3062)
    • The division operator returns NaN when computing 0/0 (#377)
    • A function timeSinceUnixEpochToDate was introduced, to convert a duration since Epoch to a date object (#608)
    • A function replaceEach was introduced, to replace multiple substrings in one go (#2606)

    For extension developers

    • (from 3.6-rc1 on) We migrated to jQuery 3.6.0. If you are using jQuery in your extension, some jQuery syntaxes that have been deprecated earlier might have been removed. If your extension runs with OpenRefine 3.5.2, you can check the web developer console for warning messages when the extension is used: fixing those should be enough for your extension to be compatible with OpenRefine 3.6 (#4891)

    And many bug fixes: see the complete list of changes for 3.6.

    Download

    Source code(tar.gz)
    Source code(zip)
  • 3.6-rc1(Jul 5, 2022)

    This is the first release candidate of the 3.6 series. Please back up your workspace directory before installing and report any problems that you encounter.

    Starting with version 3.6, OpenRefine requires Java 11 or later.

    New features

    • The user is now warned when applying the "Fill down" or "Blank down" operations with a pending sorting criterion (#3256)
    • The import preview refreshing can be disabled (#4009)
    • Menu items to reveal collapsed columns were added to the column menus (#4067)
    • The path to the refine.ini configuration file can now be changed on the command line (#4113)
    • It is now possible to download the JSON representation of the operation history, without resorting to copy and paste (#4498)
    • It is now possible to work with Wikibase instances with federation enabled (#4287)
    • The merge strategy for statements can be configured in the Wikibase schema editor. This also adds support for deleting statements. Beware that schemas created with earlier versions of OpenRefine will still use the original merge strategy. (#3383, #2116, #4130)
    • OpenRefine can edit MediaInfo entities and not just Items (#4270)
    • It is possible to disable the new version notification by setting the configuration variable refine.display.new.version.notice=false (#4410)
    • The dialog to reorder and delete columns was improved to easily delete most columns (#4557)
    • The maximum editing speed and the Wikibase tag to apply to all Wikibase edits are now configurable for each Wikibase instance via its manifest (#3359)
    • Extra URL fields in the starting page can be removed thanks to a new button (#4606)
    • The "Use values as identifiers" operation now warns that it does not validate the identifiers (#3172)

    GREL changes

    • A new GREL function, parent, was introduced to obtain the parent element of an XML element (#4181)
    • A new GREL function, scriptText, was introduced to obtain the text contained in a <script> or <style> element in HTML (#4189)
    • The random (previously randomNumber) GREL function was improved (#3143)
    • A new GREL function parseUri was introduced (#1857)
    • A new GREL function detectLanguage was introduced (#642)
    • New GREL functions encode and decode were introduced (#148)
    • The error handling of the pow and exp functions was improved (#3062)
    • The division operator returns NaN when computing 0/0 (#377)
    • A function timeSinceUnixEpochToDate was introduced, to convert a duration since Epoch to a date object (#608)
    • A function replaceEach was introduced, to replace multiple substrings in one go (#2606)

    For extension developers

    • (from 3.6-rc1 on) We migrated to jQuery 3.6.0. If your extension uses jQuery, some syntax that was previously deprecated may have been removed. If your extension runs with OpenRefine 3.5.2, check the web developer console for warning messages while the extension is in use: fixing those should be enough to make it compatible with OpenRefine 3.6 (#4891)

    And many bug fixes: see the complete list of changes for 3.6.

    Download

    Source code(tar.gz)
    Source code(zip)
  • 3.6-beta2(Jun 6, 2022)

    This is the second beta release of the 3.6 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • The user is now warned when applying the "Fill down" or "Blank down" operations with a pending sorting criterion (#3256)
    • The import preview refreshing can be disabled (#4009)
    • Menu items to reveal collapsed columns were added to the column menus (#4067)
    • The path to the refine.ini configuration file can now be changed on the command line (#4113)
    • It is now possible to download the JSON representation of the operation history, without resorting to copy and paste (#4498)
    • It is now possible to work with Wikibase instances with federation enabled (#4287)
    • The merge strategy for statements can be configured in the Wikibase schema editor. This also adds support for deleting statements. Beware that schemas created with earlier versions of OpenRefine will still use the original merge strategy. (#3383, #2116, #4130)
    • OpenRefine can edit MediaInfo entities and not just Items (#4270)
    • It is possible to disable the new version notification by setting the configuration variable refine.display.new.version.notice=false (#4410)
    • The dialog to reorder and delete columns was improved to easily delete most columns (#4557)
    • The maximum editing speed and the Wikibase tag to apply to all Wikibase edits are now configurable for each Wikibase instance via its manifest (#3359)
    • Extra URL fields in the starting page can be removed thanks to a new button (#4606)
    • The "Use values as identifiers" operation now warns that it does not validate the identifiers (#3172)

    GREL changes

    • A new GREL function, parent, was introduced to obtain the parent element of an XML element (#4181)
    • A new GREL function, scriptText, was introduced to obtain the text contained in a <script> or <style> element in HTML (#4189)
    • The random (previously randomNumber) GREL function was improved (#3143)
    • A new GREL function parseUri was introduced (#1857)
    • A new GREL function detectLanguage was introduced (#642)
    • New GREL functions encode and decode were introduced (#148)
    • The error handling of the pow and exp functions was improved (#3062)
    • The division operator returns NaN when computing 0/0 (#377)
    • A function timeSinceUnixEpochToDate was introduced, to convert a duration since Epoch to a date object (#608)
    • A function replaceEach was introduced, to replace multiple substrings in one go (#2606)

    And many bug fixes: see the complete list of changes for 3.6.

    Source code(tar.gz)
    Source code(zip)
    openrefine-3.6-beta2-linux.tar.gz(134.99 MB)
    openrefine-3.6-beta2-mac.dmg(179.28 MB)
    openrefine-3.6-beta2-win-with-java.zip(176.65 MB)
    openrefine-3.6-beta2-win.zip(135.99 MB)
  • 3.6-beta1(Jun 6, 2022)

  • 3.5.2(Jan 26, 2022)

    This is the third stable release of the 3.5 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New in 3.5.2

    • Log4j was upgraded to 2.17.1

    New in 3.5.1

    • Log4j was upgraded to version 2.16.0
    • OpenRefine is compatible with Java versions 8 to 17 (#4106)

    New features in 3.5

    • Wikidata support has been generalized to arbitrary Wikibase instances. (#1640)
    • The cross function now accepts implicit project and column names (#2504)
    • The left panel can be collapsed (#1038) and resized (#2771)
    • Support for more Wikidata constraints was added (multi-value, difference within range, conflicts with, and citation needed constraints) (#2354)
    • Multi-valued cells can now be split at transitions between uppercase and lowercase (#2238)
    • When importing multiple archive files, importers can store the filename of the archive file each row was extracted from (#1963)
    • It is now possible to go to a page of the project table directly (#2638)
    • The pagination sizes offered by the UI can now be configured by setting the ui.browsing.pageSize preference to values such as [100,500,1000,2000] (#2624)
    • Format detection at the import stage was improved (#2805, #2800)
    • The split/join multivalued cells dialogs now remember the last separator used (#2197)
    • The forEach GREL function works on JSON objects (#3149)
    • A new GREL function wholeText can be used to extract all the text inside an XML element (including in its children) (#3180)
    • A dialog to confirm the removal of starred expressions was added (#501)
    • HTTP host validation was added (#3288)
    • The Wikibase extension can now be used to add BCE dates (#3816)
    • The common cell transforms can be run on a selection of columns easily (#1843)
    • Greater numbers of rows per page can now be selected (#3249) (after 3.5-beta1)
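
    To make the case-transition split concrete, here is a small Python analogue. It is only a sketch of the idea, not OpenRefine's code: the function name is hypothetical, it handles ASCII letters only, and it assumes the split happens where a lowercase letter is followed by an uppercase one.

```python
import re


def split_on_case_transition(value):
    """Split where a lowercase ASCII letter is immediately followed by an uppercase one.

    Splitting on an empty (zero-width) match requires Python 3.7 or later.
    """
    return re.split(r"(?<=[a-z])(?=[A-Z])", value)


print(split_on_case_transition("BerlinParisRome"))  # ['Berlin', 'Paris', 'Rome']
```

    A value with no such transition comes back as a single-element list, so the cell would be left unsplit.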

    See the full list of changes for 3.5.

    Checksums

    • openrefine-linux-3.5.2.tar.gz: SHA f5b295d62179a9ba218607a74b7d53d0b16e44ce
    • openrefine-win-3.5.2.zip: SHA e3d457b8a6366ae7837f0dac8878d63a75236e04
    • openrefine-win-with-java-3.5.2.zip: SHA f47910898c92d61d3610a5b5a18076a6ce9669c2
    • openrefine-mac-3.5.2.dmg: SHA 7f0f1ed81ca41ea7a2f57852cc672d1781adc7b7
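
    A downloaded package can be compared against the checksums above with a few lines of Python. This is a minimal sketch that assumes the 40-character "SHA" values are SHA-1 digests (the notes do not name the algorithm); the function names are hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file's SHA-1 digest with a published value, ignoring case and spaces."""
    return sha1_of_file(path) == expected_hex.strip().lower()


# Example, assuming the archive sits in the current directory:
# matches_checksum("openrefine-linux-3.5.2.tar.gz",
#                  "f5b295d62179a9ba218607a74b7d53d0b16e44ce")
```

    A False result means the file differs from the one that was published, for example because the download was truncated.
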
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.5.2.tar.gz(126.74 MB)
    openrefine-mac-3.5.2.dmg(170.73 MB)
    openrefine-win-3.5.2.zip(127.73 MB)
    openrefine-win-with-java-3.5.2.zip(164.86 MB)
  • 4.0-alpha1(Dec 30, 2021)

    This is the first alpha release of the 4.0 series. Expect many bugs: your help is welcome to test this new architecture.

    Main changes

    • This new version uses a different workspace: your projects from OpenRefine 3.x will not appear in this version. They will not be deleted though: you can always open them again by running OpenRefine 3.x. Project archives exported from OpenRefine 3.x can be read in OpenRefine 4.x, but the operation history will be discarded.
    • Project data no longer needs to fit in the working memory (RAM) of your machine. This makes it easier to work on large datasets. (#242)
    • It is possible to execute OpenRefine operations in Apache Spark (#1433). The execution engine used by OpenRefine is currently selected at startup with the -r (Unix) or /r (Windows) parameter (it is foreseen that this will change before a stable release as Spark support will be moved to an extension, see #4396).
    • Facet statistics are computed on a sample of rows by default. The size of the sample can be configured.
    • The CSV/TSV importer supports a new option which controls whether rows are allowed to span multiple lines of the source file.

    Documentation about those new features will be published soon.

    For developers

    Most extensions will be incompatible with this new version, as many incompatible changes have been introduced.

    • OpenRefine now uses the org.openrefine namespace instead of com.google.refine.
    • The code base was split into more granular Maven modules. Those modules are published to Maven Central to ease the development of extensions (currently in the snapshot repository as their structure is not final yet). Feedback about the module structure is welcome.
    • The architecture of the data processing engine changed to make it extensible. Workflows can be executed fully in memory, off disk, in Apache Spark, or in other execution engines if the corresponding runners are implemented. Feedback about the data model API is welcome.

    Documentation of the new architecture will be published soon.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-4.0-alpha1.tar.gz(156.61 MB)
    openrefine-mac-4.0-alpha1.dmg(268.91 MB)
    openrefine-win-4.0-alpha1.zip(156.95 MB)
    openrefine-win-with-java-4.0-alpha1.zip(197.54 MB)
  • 3.5.1(Dec 19, 2021)

    This is the second stable release of the 3.5 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New in 3.5.1

    • Log4j was upgraded to version 2.16.0
    • OpenRefine is compatible with Java versions 8 to 17 (#4106)

    New features in 3.5

    • Wikidata support has been generalized to arbitrary Wikibase instances. (#1640)
    • The cross function now accepts implicit project and column names (#2504)
    • The left panel can be collapsed (#1038) and resized (#2771)
    • Support for more Wikidata constraints was added (multi-value, difference within range, conflicts with, and citation needed constraints) (#2354)
    • Multi-valued cells can now be split at transitions between uppercase and lowercase (#2238)
    • When importing multiple archive files, importers can store the filename of the archive file each row was extracted from (#1963)
    • It is now possible to go to a page of the project table directly (#2638)
    • The pagination sizes offered by the UI can now be configured by setting the ui.browsing.pageSize preference to values such as [100,500,1000,2000] (#2624)
    • Format detection at the import stage was improved (#2805, #2800)
    • The split/join multivalued cells dialogs now remember the last separator used (#2197)
    • The forEach GREL function works on JSON objects (#3149)
    • A new GREL function wholeText can be used to extract all the text inside an XML element (including in its children) (#3180)
    • A dialog to confirm the removal of starred expressions was added (#501)
    • HTTP host validation was added (#3288)
    • The Wikibase extension can now be used to add BCE dates (#3816)
    • The common cell transforms can be run on a selection of columns easily (#1843)
    • Greater numbers of rows per page can now be selected (#3249) (after 3.5-beta1)

    See the full list of changes for 3.5.

    Checksums

    • openrefine-linux-3.5.1.tar.gz: SHA d2b4298db9c771a85b955f20e923d3c0bf70c658
    • openrefine-win-3.5.1.zip: SHA 1d30a812060c7832580b56d6de31a85f4a4c4deb
    • openrefine-win-with-java-3.5.1.zip: SHA c0102e157db8470dd4d3da0f8cd9bf085f3f28c6
    • openrefine-mac-3.5.1.dmg: SHA 40f8a40c18b366a142777ef49c8c7e87dbffeb06
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.5.1.tar.gz(126.76 MB)
    openrefine-mac-3.5.1.dmg(169.42 MB)
    openrefine-win-3.5.1.zip(127.75 MB)
    openrefine-win-with-java-3.5.1.zip(168.34 MB)
  • 3.5.0(Nov 7, 2021)

    This is the first stable release of the 3.5 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Wikidata support has been generalized to arbitrary Wikibase instances. (#1640)
    • The cross function now accepts implicit project and column names (#2504)
    • The left panel can be collapsed (#1038)
    • Support for more Wikidata constraints was added (multi-value, difference within range, conflicts with, and citation needed constraints) (#2354)
    • Multi-valued cells can now be split at transitions between uppercase and lowercase (#2238)
    • When importing multiple archive files, importers can store the filename of the archive file each row was extracted from (#1963)
    • It is now possible to go to a page of the project table directly (#2638)
    • The pagination sizes offered by the UI can now be configured by setting the ui.browsing.pageSize preference to values such as [100,500,1000,2000] (#2624)
    • Format detection at the import stage was improved (#2805, #2800)
    • The split/join multivalued cells dialogs now remember the last separator used (#2197)
    • The forEach GREL function works on JSON objects (#3149)
    • A new GREL function wholeText can be used to extract all the text inside an XML element (including in its children) (#3180)
    • A dialog to confirm the removal of starred expressions was added (#501)
    • HTTP host validation was added (#3288)
    • The Wikibase extension can now be used to add BCE dates (#3816)
    • The common cell transforms can be run on a selection of columns easily (#1843)
    • Greater numbers of rows per page can now be selected (#3249) (after 3.5-beta1)

    See the full list of changes for 3.5.

    Checksums

    • openrefine-linux-3.5.0.tar.gz: SHA 49354715470f0a71b3f5ec9911071f87adf54275
    • openrefine-win-3.5.0.zip: SHA 42c8971afc996c8015d19b98af33dc4ae8602ee3
    • openrefine-win-with-java-3.5.0.zip: SHA 055f185796ce2849f945485ee171a3a857982a6b
    • openrefine-mac-3.5.0.dmg: SHA a7d34ac4c8528ecd53e84de1f90ad453a0ecc256
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.5.0.tar.gz(125.73 MB)
    openrefine-mac-3.5.0.dmg(169.67 MB)
    openrefine-win-3.5.0.zip(126.72 MB)
    openrefine-win-with-java-3.5.0.zip(163.85 MB)
  • 3.5-beta2(Oct 25, 2021)

    This is the second beta release for the 3.5 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Wikidata support has been generalized to arbitrary Wikibase instances. (#1640)
    • The cross function now accepts implicit project and column names (#2504)
    • The left panel can be collapsed (#1038)
    • Support for more Wikidata constraints was added (multi-value, difference within range, conflicts with, and citation needed constraints) (#2354)
    • Multi-valued cells can now be split at transitions between uppercase and lowercase (#2238)
    • When importing multiple archive files, importers can store the filename of the archive file each row was extracted from (#1963)
    • It is now possible to go to a page of the project table directly (#2638)
    • The pagination sizes offered by the UI can now be configured by setting the ui.browsing.pageSize preference to values such as [100,500,1000,2000] (#2624)
    • Format detection at the import stage was improved (#2805, #2800)
    • The split/join multivalued cells dialogs now remember the last separator used (#2197)
    • The forEach GREL function works on JSON objects (#3149)
    • A new GREL function wholeText can be used to extract all the text inside an XML element (including in its children) (#3180)
    • A dialog to confirm the removal of starred expressions was added (#501)
    • HTTP host validation was added (#3288)
    • The Wikibase extension can now be used to add BCE dates (#3816)
    • The common cell transforms can be run on a selection of columns easily (#1843)
    • Greater numbers of rows per page can now be selected (#3249) (after 3.5-beta1)

    See the full list of changes for 3.5.

    Checksums

    • openrefine-linux-3.5-beta2.tar.gz: SHA 20eee16459a3d246381aa6b64c8686e77063478d
    • openrefine-win-3.5-beta2.zip: SHA 9799aa8d5bfc30edee44e64796979118c9c9e983
    • openrefine-win-with-java-3.5-beta2.zip: SHA 9faf14441c588ae21dc6bd675acbb24a8c3640bb
    • openrefine-mac-3.5-beta2.dmg: SHA c12423caac7b9227614554ea48c35eccb94ceabe
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.5-beta2.tar.gz(125.74 MB)
    openrefine-mac-3.5-beta2.dmg(169.67 MB)
    openrefine-win-3.5-beta2.zip(126.74 MB)
    openrefine-win-with-java-3.5-beta2.zip(163.87 MB)
  • 3.5-beta1(May 29, 2021)

    This is the first beta release for the 3.5 series. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Wikidata support has been generalized to arbitrary Wikibase instances. (#1640)
    • The cross function now accepts implicit project and column names (#2504)
    • The left panel can be collapsed (#1038)
    • Support for more Wikidata constraints was added (multi-value, difference within range, conflicts with, and citation needed constraints) (#2354)
    • Multi-valued cells can now be split at transitions between uppercase and lowercase (#2238)
    • When importing multiple archive files, importers can store the filename of the archive file each row was extracted from (#1963)
    • It is now possible to go to a page of the project table directly (#2638)
    • The pagination sizes offered by the UI can now be configured by setting the ui.browsing.pageSize preference to values such as [100,500,1000,2000] (#2624)
    • Format detection at the import stage was improved (#2805, #2800)
    • The split/join multivalued cells dialogs now remember the last separator used (#2197)
    • The forEach GREL function works on JSON objects (#3149)
    • A new GREL function wholeText can be used to extract all the text inside an XML element (including in its children) (#3180)
    • A dialog to confirm the removal of starred expressions was added (#501)
    • HTTP host validation was added (#3288)
    • The Wikibase extension can now be used to add BCE dates (#3816)
    • The common cell transforms can be run on a selection of columns easily (#1843)

    See the full list of changes for 3.5.

    Checksums

    • openrefine-linux-3.5-beta1.tar.gz: SHA ada5f7b6d6efd670b4aef8632fc5571cc4dd2ea5
    • openrefine-mac-3.5-beta1.dmg: SHA c11d156f04eb968740e1c8e724a13cc31996baa0
    • openrefine-win-3.5-beta1.zip: SHA b594f3a9b2617cff6a18b13cba73c9595c14c858
    • openrefine-win-with-java-3.5-beta1.zip: SHA dcfc3135c3e1ebff28b805c8a290c71fdedbc03c
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.5-beta1.tar.gz(125.70 MB)
    openrefine-mac-3.5-beta1.dmg(165.83 MB)
    openrefine-win-3.5-beta1.zip(126.69 MB)
    openrefine-win-with-java-3.5-beta1.zip(163.77 MB)
  • 3.4.1(Sep 24, 2020)

    This is a bug fix release for the 3.4 series. Please back up your workspace directory before installing and report any problems that you encounter.

    Bug fixes

    • The macOS build displays the correct version number (#3196)
    • The Google Drive and Google Sheets integration was fixed.

    See the full list of changes for 3.4.

    Checksums

    • openrefine-linux-3.4.1.tar.gz: SHA bcff1764f9a6420024267e2a87b3e6b88641dd3d
    • openrefine-mac-3.4.1.dmg: SHA ca1567dff7e83bc1d0f748df7631e67c1f5d8e25
    • openrefine-win-3.4.1.zip: SHA e7f4183111fddf42ce2b51e9bb7d0715f00db50e
    • openrefine-win-with-java-3.4.1.zip: SHA f38320d957da3ebf15471cb0727045e47ff0b051
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.4.1.tar.gz(114.69 MB)
    openrefine-mac-3.4.1.dmg(182.14 MB)
    openrefine-win-3.4.1.zip(115.61 MB)
    openrefine-win-with-java-3.4.1.zip(151.70 MB)
  • 3.4(Sep 6, 2020)

    This is the final release for OpenRefine 3.4. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • We now offer a Windows package with embedded Java runtime engine (no need to install Java with this one) (#2272)
    • SQLite importer added (#1951)
    • More languages including Bengali, Chinese (Simplified), Czech, and Punjabi, as well as better coverage for existing languages including Cebuano, English (UK), French, German, Hungarian, Italian, Japanese, Korean, Norwegian Bokmål, and Portuguese (Brazil).
    • Clojure updated to 1.10 (#2608)
    • Modal dialogs can now be closed with the ESC key (#1018)
    • A cell.errorMessage field has been added, to fetch the error message stored in a cell (it was originally cell.error in 3.4 beta) (#525)
    • Google OAuth credentials for Google Sheets and Google Drive integration are configurable (#2383)
    • A new menu item was created to extract entity identifiers from a reconciled column (#1975)
    • It is now possible to quote all cell values in the custom tabular exporter (#1869)
    • An option was added in the CSV/TSV importer to strip whitespace in cell values (#791)
    • The Google Sheets and Google Drive export have been added to the main "Export" menu (#2453)
    • The cross function now supports any value for input (instead of just cells) and is no longer restricted to the column where it is invoked (#1950)
    • The cross function now works for any type of cell value (#2461)
    • It is now possible to configure the maxlag value used by the Wikidata extension by setting wikibase.upload.maxLag to some integer in the preferences (in 3.4 beta, it was wikibase:upload:maxLag and that was renamed later to match the naming convention of other preferences) (#2304)
    • Facets can be minimized (#2553)
    • Excel XLSX export column limit increased from 256 to 16K columns (#2600)
    • Character encoding detection added for import (#486)

    See the full list of changes.

    Checksums

    • openrefine-linux-3.4.tar.gz: SHA e75a4b1c7a4e5c0af2a1d2d278f7707980a59e18
    • openrefine-mac-3.4.dmg: SHA f0aa34025b3137c6ccd6b49ee243cbdf06195995
    • openrefine-win-3.4.zip: SHA d2bd2b36c9304dca8408d7b5719216286b0662c7
    • openrefine-win-with-java-3.4.zip: SHA 1a57b8811af1aabe8a7632697b991db6a10f7ef4
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.4.tar.gz(114.68 MB)
    openrefine-mac-3.4.dmg(182.10 MB)
    openrefine-win-3.4.zip(115.59 MB)
    openrefine-win-with-java-3.4.zip(151.68 MB)
  • 3.4-beta2(Jul 4, 2020)

    This is the second beta release of OpenRefine 3.4. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • We now offer a Windows package with embedded Java runtime engine (no need to install Java with this one) (#2272)
    • Importing data from SQLite is now possible (#1951)
    • Modal dialogs can now be closed with the ESC key (#1018)
    • A cell.errorMessage field has been added, to fetch the error message stored in a cell (it was originally cell.error in 3.4 beta) (#525)
    • Google OAuth credentials for Google Sheets and Google Drive integration are configurable (#2383)
    • A new menu item was created to extract entity identifiers from a reconciled column (#1975)
    • It is now possible to quote all cell values in the custom tabular exporter (#1869)
    • An option was added in the CSV/TSV importer to strip whitespace in cell values (#791)
    • The Google Sheets and Google Drive export have been added to the main "Export" menu (#2453)
    • The cross function now supports any value for input (instead of just cells) and is no longer restricted to the column where it is invoked (#1950)
    • The cross function now works for any type of cell value (#2461)
    • It is now possible to configure the maxlag value used by the Wikidata extension by setting wikibase.upload.maxLag to some integer in the preferences (in the first beta, it was wikibase:upload:maxLag and that was renamed later to match the naming convention of other preferences) (#2304)
    • Facets can be minimized (#2553)

    See the full list of changes.

    Checksums

    • openrefine-linux-3.4-beta2.tar.gz: SHA afdb901d67c7cb65ae1edd704a2050986f9f829e
    • openrefine-mac-3.4-beta2.dmg: SHA f74eb021a4949790d01f1a92ad96431d532ef8ff
    • openrefine-win-3.4-beta2.zip: SHA 751c25b7a0750e0e4a34ba356616248b904a0e58
    • openrefine-win-with-java-3.4-beta2.zip: SHA c99a5974f0bf240f2b2cc3aef80236b3b7650c03
    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.4-beta2.tar.gz(111.91 MB)
    openrefine-mac-3.4-beta2.dmg(179.29 MB)
    openrefine-win-3.4-beta2.zip(112.82 MB)
    openrefine-win-with-java-3.4-beta2.zip(148.91 MB)
  • 3.4-beta(May 14, 2020)

    This is the first beta release of OpenRefine 3.4. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • We now offer a Windows package with embedded Java runtime engine (no need to install Java with this one) (#2272)
    • Importing data from SQLite is now possible (#1951)
    • Modal dialogs can now be closed with the ESC key (#1018)
    • A cell.errorMessage field has been added, to fetch the error message stored in a cell (#525)
    • A new menu item was created to extract entity identifiers from a reconciled column (#1975)
    • It is now possible to quote all cell values in the custom tabular exporter (#1869)
    • An option was added in the CSV/TSV importer to strip whitespace in cell values (#791)
    • The Google Sheets and Google Drive export have been added to the main "Export" menu (#2453)
    • The cross function is no longer restricted to source values from the column where it is invoked (#1950)
    • The cross function now works for any type of cell value (#2461)
    • It is now possible to configure the maxlag value used by the Wikidata extension by setting wikibase:upload:maxLag to some integer in the preferences (#2304)
    • Facets can be minimized (#2553)

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.4-beta.tar.gz(111.73 MB)
    openrefine-mac-3.4-beta.dmg(179.11 MB)
    openrefine-win-3.4-beta.zip(112.64 MB)
    openrefine-win-with-java-3.4-beta.zip(148.73 MB)
  • 3.3(Jan 31, 2020)

    This is the final release for OpenRefine 3.3. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • A new menu for joining (concatenating) columns has been added (#2109)
    • Commonly used fields in dialogs now have autofocus (#2130)
    • The Wikidata extension now supports adding dates with custom calendars (#2136)
    • Calling reconciliation services via CORS is now supported (#2260)
    • All columns can be blanked down or filled down at once (#2280)
    • New "Blank values per column" and "Blank records per column" facets were added (#2220)

    Vulnerabilities fixed

    • A cross-site scripting vulnerability in the database extension was fixed (#2151)
    • Cross-Site Request Forgery (CSRF) protection was added to POST API endpoints. If you rely on OpenRefine's server API you will need to adapt your calls accordingly (#2164)
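
    For scripts that call the server API, adapting a POST call roughly means fetching a token first and sending it along. The Python sketch below assumes the token endpoint /command/core/get-csrf-token, a JSON response with a "token" field, and a csrf_token request parameter, as described in the OpenRefine API documentation; the address 127.0.0.1:3333 and the helper names are assumptions, so check them against the version you run.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:3333"  # assumed local address of the OpenRefine server


def fetch_csrf_token(base_url=BASE_URL, opener=urllib.request.urlopen):
    """Ask the server for a fresh CSRF token (endpoint name is an assumption)."""
    with opener(f"{base_url}/command/core/get-csrf-token") as response:
        return json.load(response)["token"]


def with_csrf_token(url, token):
    """Return `url` with a csrf_token query parameter appended."""
    parts = urllib.parse.urlsplit(url)
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append(("csrf_token", token))
    return urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
```

    A script would call fetch_csrf_token() shortly before each POST and pass the target URL through with_csrf_token(); the opener argument exists only so the network call can be replaced in tests.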

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.3.tar.gz(101.68 MB)
    openrefine-mac-3.3.dmg(169.05 MB)
    openrefine-win-3.3.zip(102.61 MB)
  • 3.3-rc1(Jan 6, 2020)

    This is the first release candidate for OpenRefine 3.3. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • A new menu for joining (concatenating) columns has been added (#2109)
    • Commonly used fields in dialogs now have autofocus (#2130)
    • The Wikidata extension now supports adding dates with custom calendars (#2136)
    • Calling reconciliation services via CORS is now supported (#2260)

    Vulnerabilities fixed

    • A cross-site scripting vulnerability in the database extension was fixed (#2151)
    • Cross-Site Request Forgery (CSRF) protection was added to POST API endpoints. If you rely on OpenRefine's server API you will need to adapt your calls accordingly (#2164)

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.3-rc1.tar.gz(101.69 MB)
    openrefine-mac-3.3-rc1.dmg(169.03 MB)
    openrefine-win-3.3-rc1.zip(102.64 MB)
  • 3.3-beta(Oct 21, 2019)

    This is the first beta release of OpenRefine 3.3. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • A new menu for joining (concatenating) columns has been added (#2109)
    • Commonly used fields in dialogs now have autofocus (#2130)
    • The Wikidata extension now supports adding dates with custom calendars (#2136)

    Vulnerabilities fixed

    • A cross-site scripting vulnerability in the database extension was fixed (#2151)
    • Cross-Site Request Forgery (CSRF) protection was added to POST API endpoints. If you rely on OpenRefine's server API you will need to adapt your calls accordingly (#2164)

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.3-beta.tar.gz(100.86 MB)
    openrefine-mac-3.3-beta.dmg(168.28 MB)
    openrefine-win-3.3-beta.zip(101.78 MB)
  • 3.2(Jul 26, 2019)

    This is the final release of OpenRefine 3.2. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • New action to replace smart quotes with their ASCII equivalents (#1676)
    • New phonetic clustering methods are available: Beider-Morse (#926) and Daitch-Mokotoff (#927)
    • The "Use values as identifiers" operation now accepts cells with RDF URIs instead of just identifiers, using the identifierSpace declared by the reconciliation service (#1953)
    • References in the Wikidata schema can be copied across statements and items (#1912)
    • Items suggested by auto-complete can now be clicked with the middle button, which opens their URL in a new tab (#1934)
    • Reconciliation previews are now shown when hovering over the candidate (no click is needed). Clicking the candidate opens its page in a new tab. It is possible to disable this feature for matched cells by adding cell-ui.previewMatchedCells=false in the preferences. (#1943)

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.2.tar.gz(101.13 MB)
    openrefine-mac-3.2.dmg(176.32 MB)
    openrefine-win-3.2.zip(102.08 MB)
  • 3.2-beta(Mar 1, 2019)

    This is the beta release of OpenRefine 3.2. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • New action to replace smart quotes with their ASCII equivalents (#1676)
    • New phonetic clustering methods are available: Beider-Morse (#926) and Daitch-Mokotoff (#927)
    • The "Use values as identifiers" operation now accepts cells with RDF URIs instead of just identifiers, using the identifierSpace declared by the reconciliation service (#1953)
    • References in the Wikidata schema can be copied across statements and items (#1912)
    • Items suggested by auto-complete can now be clicked with the middle button, which opens their URL in a new tab (#1934)
    • Reconciliation previews are now shown when hovering over a candidate (no click is needed). Clicking the candidate opens its page in a new tab. This feature can be disabled for matched cells by adding cell-ui.previewMatchedCells=false in the preferences. (#1943)

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.2-beta.tar.gz(94.85 MB)
    openrefine-mac-3.2-beta.dmg(170.06 MB)
    openrefine-win-3.2-beta.zip(95.89 MB)
  • 3.1(Nov 29, 2018)

    This is the final release of OpenRefine 3.1. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Importing n-triples, ttl, and JSON-LD files is now possible (#1758)
    • The smartSplit function now supports any string, not just a single character. (#1761)
    • A new menu to search and replace was added (#1742)
    • A field to specify custom column names was added in the CSV/TSV importer
    • It is now possible to import and export a Wikidata schema in JSON (#1776)
    • Strings are now automatically trimmed in Wikidata schemas. The corresponding issues have been removed. (#1781)
    • Browser-based autocomplete has been enabled for Wikidata edit summaries. (#1596)
    • It is now possible to mark a column of identifiers as reconciled without calling the reconciliation service (#1778)
    • The GREL function parseXml was added (#1818)
    • The way text facets handle non-text values was changed. If you rely on this, make sure you add .toString() to the expressions used for text facets in your workflows. (#1662)

    See the full list of changes.

    OpenRefine is funded by the Google News Initiative.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.1.tar.gz(92.15 MB)
    openrefine-mac-3.1.dmg(159.51 MB)
    openrefine-win-3.1.zip(93.05 MB)
  • 3.1-beta(Nov 7, 2018)

    This is the beta release of OpenRefine 3.1. Please back up your workspace directory before installing and report any problems that you encounter.

    New features

    • Importing n-triples, ttl, and JSON-LD files is now possible (#1758)
    • The smartSplit function now supports any string, not just a single character (#1761)
    • A new menu to search and replace was added (#1742)
    • A field to specify custom column names was added in the CSV/TSV importer
    • It is now possible to import and export a Wikidata schema in JSON (#1776)
    • Strings are now automatically trimmed in Wikidata schemas. The corresponding issues have been removed. (#1781)
    • Browser-based autocomplete has been enabled for Wikidata edit summaries. (#1596)
    • It is now possible to mark a column of identifiers as reconciled without calling the reconciliation service. (#1778)

    See the full list of changes.

    OpenRefine is funded by the Google News Initiative.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.1-beta.tar.gz(91.81 MB)
    openrefine-mac-3.1-beta.dmg(159.20 MB)
    openrefine-win-3.1-beta.zip(92.68 MB)
  • 3.0(Sep 16, 2018)

  • 3.0-rc.1(Jul 17, 2018)

    This is the release candidate (RC) of OpenRefine 3.0. Please back up your workspace directory before installing and report any problems that you encounter.

    New features:

    • Wikidata extension
    • Data package metadata
    • Tag system
    • Google Drive API
    • OpenRefine Database Import Extension
    • Add coalesce function
    • Implement "Facet by null" and "Facet by empty string" and add to customized facets menu
    • Export to SQL dump
    • Migrate from JRDF to JENA library
    • Added option to toggle show/hide null values in cells in data-table
    • Unify the internal date type
    • Update OpenRefine logo
    • Set HTTP request headers
    • Add find function
    • Some bug fixes
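For readers unfamiliar with the term, a coalesce function conventionally returns its first non-null argument. The Python sketch below shows that general idea only. It assumes that only a null value (`None` here) counts as missing, and it is not the OpenRefine implementation.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None
```

Under this assumption an empty string is a real value, so `coalesce(None, "", "x")` returns the empty string rather than `"x"`.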

    See the full list of changes.

    OpenRefine is funded by the Google News Initiative.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.0-rc.1.tar.gz(87.33 MB)
    openrefine-mac-3.0-rc.1.dmg(148.65 MB)
    openrefine-win-3.0-rc.1.zip(87.78 MB)
  • 3.0-beta(May 27, 2018)

    This is the beta release of OpenRefine 3.0. Please back up your workspace directory before installing and report any problems that you encounter.

    New features:

    • Wikidata extension
    • Data package metadata
    • Tag system
    • Google Drive API
    • OpenRefine Database Import Extension
    • Add coalesce function
    • Implement "Facet by null" and "Facet by empty string" and add to customized facets menu
    • Export to SQL dump
    • Migrate from JRDF to JENA library
    • Added option to toggle show/hide null values in cells in data-table
    • Unify the internal date type
    • Update OpenRefine logo
    • Set HTTP request headers
    • Add find function
    • Some bug fixes

    See the full list of changes.

    OpenRefine is funded by the Google News Initiative.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-3.0-beta.tar.gz(86.37 MB)
    openrefine-mac-3.0-beta.dmg(147.70 MB)
    openrefine-win-3.0-beta.zip(86.83 MB)
  • 2.8(Nov 19, 2017)

    This is the official release of OpenRefine 2.8. Please back up your workspace directory before installing and report any problems that you encounter.

    New features:

    • Project metadata support
    • Enhancement of the reconciliation API
    • Support for splitting multi-valued cells by regex or special characters
    • Text filter exclude
    • Free memory detection, with a notification to the user
    • Improved UI for better usability
    • New importer for Wikitables
    • Some bug fixes

    See the full list of changes.

    OpenRefine is funded by the Google News Initiative.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-2.8.tar.gz(66.44 MB)
    openrefine-mac-2.8.dmg(127.94 MB)
    openrefine-win-2.8.zip(66.79 MB)
  • 2.7(Jun 18, 2017)

    This is the official release of OpenRefine 2.7. Please back up your workspace directory before installing and report any problems that you encounter.

    New features:

    • Wikidata reconciliation service, hosted by the Wikimedia Foundation (replaces the old Freebase reconciliation service)
    • Export Clusters button on Clustering dialog.
    • Japanese translation
    • "Logical and" and "logical or" now accept more than two operands
    • "Transform All" support for applying operations to multiple columns
    • Some bug fixes

    See the full list of changes.

    Source code(tar.gz)
    Source code(zip)
    openrefine-linux-2.7.tar.gz(60.22 MB)
    openrefine-mac-2.7.dmg(121.70 MB)
    openrefine-win-2.7.zip(60.58 MB)
Owner
OpenRefine
A free, open source, power tool for working with messy data.