Embulk: Pluggable Bulk Data Loader.

Overview

What's Embulk?

Embulk is a parallel bulk data loader that helps transfer data between various storage systems, databases, NoSQL stores, and cloud services.

Embulk supports plugins that add functionality. You can share plugins to keep your custom scripts readable, maintainable, and reusable.

Slides: "Embulk, an open-source plugin-based parallel bulk data loader" (on SlideShare)

Documentation

Embulk documentation: https://www.embulk.org/

Using plugins

You can use plugins to load data from and to various systems and file formats. See the list of publicly released plugins, organized by category.

For example, the embulk-output-command plugin executes an external command to output records.

To install plugins, use the embulk gem install <name> command:

embulk gem install embulk-output-command
embulk gem list

Embulk bundles some built-in plugins, such as embulk-encoder-gzip and embulk-formatter-csv. You can use those plugins with a configuration file like the following:

in:
  type: file
  path_prefix: "./try1/csv/sample_"
  ...
out:
  type: command
  command: "cat - > task.$INDEX.$SEQID.csv.gz"
  encoders:
    - {type: gzip}
  formatter:
    type: csv

Resuming a failed transaction

Embulk supports resuming failed transactions. To enable resuming, start the transaction with the -r PATH option:

embulk run config.yml -r resume-state.yml

If the transaction fails, Embulk stores its state in the YAML file. You can retry the transaction using exactly the same command:

embulk run config.yml -r resume-state.yml

If you give up on resuming the transaction, use the embulk cleanup subcommand to delete intermediate data:

embulk cleanup config.yml -r resume-state.yml

Using plugin bundle

The embulk mkbundle subcommand creates an isolated bundle of plugins. You can install plugins (gems) into the bundle directory instead of the ~/.embulk directory, which makes it easy to manage plugin versions. To use the bundle, add the -b <bundle_dir> option to the guess, preview, or run subcommand. embulk mkbundle also generates some example plugins as <bundle_dir>/embulk/*.rb files.

See the generated <bundle_dir>/Gemfile to learn how plugin bundles work.

embulk mkbundle ./embulk_bundle  # please edit ./embulk_bundle/Gemfile to add plugins. Detailed usage is written in the Gemfile
embulk guess -b ./embulk_bundle ...
embulk run   -b ./embulk_bundle ...

Use cases

For further details, visit the Embulk documentation.

Upgrading to the latest version

The following command updates Embulk itself to the specified release version:

embulk selfupdate x.y.z

Embulk Development

Build

./gradlew cli  # creates pkg/embulk-VERSION.jar

You can see JaCoCo's test coverage report at ${project}/build/reports/tests/index.html, and FindBugs' report at ${project}/build/reports/findbug/main.html. (FIXME: coverage information is somehow not included.)

You can use the classpath task to use bundle exec ./bin/embulk during development:

./gradlew -t classpath  # -x test: skip test
./bin/embulk

To deploy artifacts to your local maven repository at ~/.m2/repository/:

./gradlew install

To compile the source code of embulk-core project only:

./gradlew :embulk-core:compileJava

The dependencies task shows the dependency tree of the embulk-core project:

./gradlew :embulk-core:dependencies

Update JRuby

Modify jrubyVersion in build.gradle to update the JRuby version used by Embulk.

Release

You need to add your Bintray account information to ~/.gradle/gradle.properties:

bintray_user=(bintray user name)
bintray_api_key=(bintray api key)

Modify version in build.gradle on a detached commit to bump the Embulk version:

git checkout --detach master
(Remove "-SNAPSHOT" in "version" in build.gradle.)
git add build.gradle
git commit -m "Release vX.Y.Z"
git tag -a vX.Y.Z
(Write the release note for vX.Y.Z in the tag annotation.)
./gradlew clean && ./gradlew release
git push -u origin vX.Y.Z


Comments
  • Add clean illegal characters mode to json parser.

    JSON that contains broken-encoded characters makes Jackson throw an exception, because Jackson supports only the pure JSON specification, for the reasons below:

    • https://github.com/FasterXML/jackson-core/issues/222#issuecomment-146697502

    Ideally, however, I want to force-load the broken-encoded JSON files. Therefore, I added a clean_illegal_char mode to the JSON parser, which cleanses illegal characters inside the JSON text.
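    For comparison, here is a minimal JDK-only sketch of the general idea. It is not the code in this PR, and the class and method names are hypothetical. It decodes the raw bytes with a CharsetDecoder that drops malformed UTF-8 sequences, so a JSON parser only ever sees well-formed text. Dropping bytes silently alters data, which is why such a mode would be opt-in.

    ```java
    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    public class IllegalCharCleaner {
        // Decodes bytes as UTF-8 and silently drops malformed or unmappable sequences
        // instead of throwing, so the resulting text can be handed to a JSON parser.
        public static String decodeDroppingIllegal(byte[] input) {
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.IGNORE)
                    .onUnmappableCharacter(CodingErrorAction.IGNORE);
            try {
                return decoder.decode(ByteBuffer.wrap(input)).toString();
            } catch (CharacterCodingException e) {
                // decode() declares this exception; with IGNORE actions it is not expected.
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) {
            // 0xFF can never appear in well-formed UTF-8.
            byte[] broken = {'{', '"', 'a', '"', ':', '"', (byte) 0xFF, 'x', '"', '}'};
            System.out.println(decodeDroppingIllegal(broken)); // prints {"a":"x"}
        }
    }
    ```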

    Please review this PR.

    • Stack Trace.
    Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 middle byte 0x5c
     at [Source: org.embulk.spi.util.FileInputInputStream@4f26425b; line: 17821, column: 69]
    	at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581)
    	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3473)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidOther(UTF8StreamJsonParser.java:3480)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeUtf8_2(UTF8StreamJsonParser.java:3254)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2462)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2414)
    	at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:285)
    	at org.embulk.spi.json.JsonParser$AbstractParseContext.jsonTokenToValue(JsonParser.java:188)
    	at org.embulk.spi.json.JsonParser$AbstractParseContext.jsonTokenToValue(JsonParser.java:220)
    	at org.embulk.spi.json.JsonParser$AbstractParseContext.next(JsonParser.java:152)
    
    enhancement 
    opened by smdmts 26
  • Java timestamp parser and RubyDateParser v2

    To improve the performance of parsing timestamp values, this PR introduces a Java timestamp parser. It implements RubyDateParser, which is compatible with Ruby's strptime parser. The parser parses timestamp values using strptime tokens produced in advance by a strptime lexer.

    This PR replaces #608, which will be closed.
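    To illustrate the "pre-tokenized" approach described above, here is a deliberately tiny sketch. It is not RubyDateParser and not Ruby-compatible. MiniStrptime and its methods are hypothetical names, only %Y, %m, %d, %H, %M, %S and %% are handled, and fields are read with fixed maximum widths. The format is lexed once into tokens, and each value is then parsed by walking those tokens, so the format string is never rescanned per value.

    ```java
    import java.time.LocalDateTime;
    import java.util.ArrayList;
    import java.util.List;

    public class MiniStrptime {
        // A token is either a directive letter (as in %Y) or a literal string.
        static final class Token {
            final char directive; // '\0' when this token is a literal
            final String literal; // null when this token is a directive

            Token(char directive, String literal) {
                this.directive = directive;
                this.literal = literal;
            }
        }

        // Lexer: splits the format once so each value can be parsed without rescanning it.
        static List<Token> tokenize(String format) {
            List<Token> tokens = new ArrayList<>();
            StringBuilder literal = new StringBuilder();
            for (int i = 0; i < format.length(); i++) {
                char c = format.charAt(i);
                if (c == '%' && i + 1 < format.length() && format.charAt(i + 1) != '%') {
                    if (literal.length() > 0) {
                        tokens.add(new Token('\0', literal.toString()));
                        literal.setLength(0);
                    }
                    tokens.add(new Token(format.charAt(++i), null));
                } else if (c == '%' && i + 1 < format.length()) {
                    literal.append('%'); // "%%" is a literal percent sign
                    i++;
                } else {
                    literal.append(c);
                }
            }
            if (literal.length() > 0) {
                tokens.add(new Token('\0', literal.toString()));
            }
            return tokens;
        }

        // Parser: walks the pre-built tokens; supports only %Y %m %d %H %M %S.
        static LocalDateTime parse(List<Token> tokens, String text) {
            int[] fields = {1970, 1, 1, 0, 0, 0}; // year, month, day, hour, minute, second
            int pos = 0;
            for (Token t : tokens) {
                if (t.literal != null) {
                    if (!text.startsWith(t.literal, pos)) {
                        throw new IllegalArgumentException("Expected \"" + t.literal + "\" at " + pos);
                    }
                    pos += t.literal.length();
                    continue;
                }
                int index = "YmdHMS".indexOf(t.directive);
                if (index < 0) {
                    throw new IllegalArgumentException("Unsupported directive %" + t.directive);
                }
                int maxWidth = (t.directive == 'Y') ? 4 : 2;
                int start = pos;
                while (pos < text.length() && pos - start < maxWidth && Character.isDigit(text.charAt(pos))) {
                    pos++;
                }
                if (pos == start) {
                    throw new IllegalArgumentException("Expected digits at " + start);
                }
                fields[index] = Integer.parseInt(text.substring(start, pos));
            }
            if (pos != text.length()) {
                throw new IllegalArgumentException("Trailing characters at " + pos);
            }
            return LocalDateTime.of(fields[0], fields[1], fields[2], fields[3], fields[4], fields[5]);
        }

        public static void main(String[] args) {
            List<Token> tokens = tokenize("%Y-%m-%d %H:%M:%S"); // lexed once, reused below
            for (String value : new String[] {"2017-06-16 17:34:41", "2018-04-08 08:00:00"}) {
                System.out.println(parse(tokens, value));
            }
        }
    }
    ```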

    enhancement topic:timestamp 
    opened by muga 22
  • Easy way to extract data from multiple data sources

    Hello everyone,

    I have an Excel file, and I have to join the data in this Excel file with data stored in a MySQL database in order to store the result in another Excel file. Is there an easy and fast way to do that with Embulk (that is, with one YML file)? Or do I have to first import both data sets into two database tables and then export the data joined by a query to an Excel file? (That would take three YML files plus one SQL statement to delete the two tables.)

    Thanks, Vincent

    opened by Olaktal 19
  • [Redshift] Merge mode + Parser none Identity column issue

    Hello,

    I'm using a Redshift database. This is my use case: I have multiple platforms to store ("France", "Spain", ...).

    create table platform (id int identity(1,1), original_id varchar(2), label varchar(50), active int)

    And I have the following input data (original_id, label, active):

    ('FR', 'France', 1)
    ('SP', 'Spain', 1)

    original_id is unique and is my merge key, but for performance reasons I want an integer primary key, which is why I added an "id" column to my "platform" table.

    My input config looks like this:

    parser:
      type: none
      columns:
      - {name: original_id, type: string}
      - {name: label, type: string}
      - {name: active, type: long}

    --> It should allow me to insert data into a table with an Identity column

    My output config looks like this:

    mode: merge
    merge_keys: ["original_id"]

    --> This should allow me to merge data if the original_id is already in the platform table, if not, insert it.

    But I encountered the following message:

    "Error: java.lang.RuntimeException: org.postgresql.util.PSQLException: ERROR: cannot set an identity column to a value"

    Do you have any idea how to solve this problem? I tried adding options such as "incremental: true" and "incremental_columns: [id]" to the output section, but it didn't work. If anything is unclear, do not hesitate to ask me.

    Best regards, Vincent

    opened by Olaktal 17
  • Fix path extraction and POSIX permission handling in Selfupdate.

    see #806.

    This fixes the selfupdate subcommand raising an exception on Windows.

    1. Changed how the Path is created: instead of passing EmbulkSelfUpdate.class.getProtectionDomain().getCodeSource().getLocation().toURI().getPath() to Paths#get(String), the URI from EmbulkSelfUpdate.class.getProtectionDomain().getCodeSource().getLocation().toURI() is passed to Paths#get(URI).
    2. Added a POSIX-support check for the source jar file, because POSIX support is needed on both the source and the destination.
    3. Added creation of a temporary jar file during selfupdate, because Windows cannot delete a running jar file.
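    A minimal JDK-only sketch of changes 1 and 2, not the PR's actual code (SelfUpdatePaths and its methods are hypothetical names). Paths.get(URI) converts the code-source URI directly; on Windows, URI#getPath() returns strings such as "/C:/embulk.jar", which Paths.get(String) rejects there. Before reading or setting POSIX permissions, the sketch checks whether the path's file system supports the "posix" attribute view. The temporary-jar step (change 3) is left out because its exact flow is not described here.

    ```java
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.CodeSource;

    public class SelfUpdatePaths {
        // Converts a file: URI to a Path without going through URI#getPath(),
        // so Windows drive letters are handled by the file system provider.
        static Path pathFromLocation(URI location) {
            return Paths.get(location);
        }

        // True when the file system holding the path supports the "posix" attribute view,
        // i.e. when reading or setting POSIX permissions is meaningful.
        static boolean supportsPosix(Path path) {
            return path.getFileSystem().supportedFileAttributeViews().contains("posix");
        }

        public static void main(String[] args) throws URISyntaxException {
            CodeSource source = SelfUpdatePaths.class.getProtectionDomain().getCodeSource();
            if (source == null) {
                System.out.println("No code source available");
                return;
            }
            Path self = pathFromLocation(source.getLocation().toURI());
            System.out.println(self + " supports POSIX: " + supportsPosix(self));
        }
    }
    ```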
    bug 
    opened by mikoto2000 16
  • PreviewExecutor: Make sampling buffer bytes configurable

    This PR lets users configure the size, in bytes, of the sampling buffer that the preview command reads from the input data source. Because users have various kinds of files and sometimes want to change the number of rows that preview shows, it is better to let users configure this size. In the latest version of Embulk, v0.8.18, the size is 32KB. For example, if a user has a TSV file containing a single record that is larger than 32KB, preview fails with NoSampleError.

    To do that, it introduces the preview_sample_buffer_bytes option. Users can configure the sampling buffer size in the exec: section of the Embulk config as follows:

    exec:
      preview_sample_buffer_bytes: 65536
    in:
      type: file
      ... ...
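    A rough model of the failure described above, assuming for illustration only that preview needs at least one complete newline-terminated record inside the sample buffer. This is not PreviewExecutor's code, and SampleBufferCheck is a hypothetical name. With the 32KB default (32,768 bytes), a single 40,000-byte record yields no complete record, while 65536 bytes is enough to sample it.

    ```java
    import java.util.Arrays;

    public class SampleBufferCheck {
        // Default stated in the PR for Embulk v0.8.18: 32KB.
        static final int DEFAULT_SAMPLE_BUFFER_BYTES = 32 * 1024; // 32,768 bytes

        // Counts newline-terminated records that fit completely in the first bufferBytes bytes.
        static int completeRecordsInSample(byte[] data, int bufferBytes) {
            int limit = Math.min(data.length, bufferBytes);
            int records = 0;
            for (int i = 0; i < limit; i++) {
                if (data[i] == '\n') {
                    records++;
                }
            }
            return records;
        }

        public static void main(String[] args) {
            // One TSV record of 40,000 bytes followed by its newline.
            byte[] line = new byte[40_001];
            Arrays.fill(line, (byte) 'a');
            line[line.length - 1] = '\n';
            System.out.println(completeRecordsInSample(line, DEFAULT_SAMPLE_BUFFER_BYTES)); // 0
            System.out.println(completeRecordsInSample(line, 65536)); // 1
        }
    }
    ```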
    
    opened by muga 16
  • Add ConfigInputPlugin for easier testing.

    When quick-testing during development of the core or of plugins that run after the input stage, it is often annoying to set up input: preparing a CSV file on a filesystem, confirming its path, and so on.

    An input that does not depend on external resources may help in this situation. This is it: an input from the Embulk config itself. :)

    in:
      type:
        name: config
      columns:
      - {name: id, type: long}
      - {name: name, type: string}
      values:
      - - [ 12, "foo" ]
        - [ 24, "bar" ]
      - - [ 98, "hoge" ]
        - [ 21, "fuga" ]
    out:
      type: stdout
    
    2017-06-16 17:34:41.041 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 2 * 2
    2017-06-16 17:34:41.053 +0900 [INFO] (0001:transaction): {done:  0 / 2, running: 0}
    98,hoge
    12,foo
    21,fuga
    24,bar
    2017-06-16 17:34:41.130 +0900 [INFO] (0001:transaction): {done:  2 / 2, running: 0}
    2017-06-16 17:34:41.130 +0900 [INFO] (0001:transaction): {done:  2 / 2, running: 0}
    2017-06-16 17:34:41.138 +0900 [INFO] (main): Committed.
    2017-06-16 17:34:41.138 +0900 [INFO] (main): Next config diff: {"in":{},"out":{}}
    
    kaizen 
    opened by dmikurube 15
  • Can not find InputPlugin 'postgresql' error

    Hello, members. I got the following error when running Embulk tasks:

    org.embulk.config.ConfigException: InputPlugin 'postgresql' is not found.

    embulk bundle install was successful. Some tasks ran successfully, but others failed with this error.

    Environment

    • Digdag 0.9.24
    • Embulk 0.9.5
    • embulk-input-postgresql 0.9.1

    Logs

    2018-04-08 08:00:00.368 +0000 [INFO] (main): Started Embulk v0.9.5
    2018-04-08 08:00:00.539 +0000 [INFO] (0018:task-0000): > 0.36 seconds
    2018-04-08 08:00:00.549 +0000 [INFO] (0018:task-0000): Fetched 500 rows.
    2018-04-08 08:00:00.555 +0000 [INFO] (0018:task-0000): Fetched 1,000 rows.
    2018-04-08 08:00:00.578 +0000 [INFO] (0018:task-0000): Fetched 2,000 rows.
    2018-04-08 08:00:00.645 +0000 [INFO] (0018:task-0000): Fetched 4,000 rows.
    2018-04-08 08:00:00.745 +0000 [INFO] (0018:task-0000): Fetched 8,000 rows.
    2018-04-08 08:00:00.777 +0000 [INFO] (0018:task-0000): SQL: FETCH FORWARD 10000 FROM cur
    2018-04-08 08:00:00.965 +0000 [INFO] (0001:transaction): BUNDLE_GEMFILE is being set: "/home/xxxx/./Gemfile"
    2018-04-08 08:00:00.966 +0000 [INFO] (0001:transaction): Gem's home and path are being cleared.
    org.embulk.config.ConfigException: InputPlugin 'postgresql' is not found.
    java.lang.NullPointerException: Inflater has been closed
    at org.embulk.plugin.PluginManager.buildPluginNotFoundException(PluginManager.java:78)
    at org.embulk.plugin.PluginManager.newPluginWithoutWrapper(PluginManager.java:64)
    at org.embulk.plugin.PluginManager.newPlugin(PluginManager.java:31)
    at org.embulk.spi.ExecSession.newPlugin(ExecSession.java:147)
    at org.embulk.spi.Exec.newPlugin(Exec.java:63)
    at org.embulk.exec.BulkLoader$ProcessPluginSet.<init>(BulkLoader.java:409)
    at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:484)
    at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
    at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
    at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
    at org.embulk.spi.Exec.doWith(Exec.java:22)
    at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
    at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:162)
    at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:292)
    at org.embulk.EmbulkRunner.run(EmbulkRunner.java:156)
    at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:436)
    at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:91)
    at org.embulk.cli.Main.main(Main.java:26)
    Suppressed: org.embulk.plugin.PluginSourceNotMatchException
    at org.embulk.plugin.InjectedPluginSource.newPlugin(InjectedPluginSource.java:53)
    at org.embulk.plugin.PluginManager.newPluginWithoutWrapper(PluginManager.java:52)
    ... 16 more
    Suppressed: org.embulk.plugin.PluginSourceNotMatchException
    at org.embulk.plugin.maven.MavenPluginSource.newPlugin(MavenPluginSource.java:65)
    at org.embulk.plugin.PluginManager.newPluginWithoutWrapper(PluginManager.java:52)
    ... 16 more
    Suppressed: org.embulk.plugin.PluginSourceNotMatchException: org.jruby.embed.EvalFailedException: java.lang.NullPointerException: Inflater has been closed
    at org.embulk.jruby.JRubyPluginSource.newPlugin(JRubyPluginSource.java:63)
    at org.embulk.plugin.PluginManager.newPluginWithoutWrapper(PluginManager.java:59)
    ... 16 more
    Caused by: org.jruby.embed.EvalFailedException: java.lang.NullPointerException: Inflater has been closed
    at org.jruby.embed.internal.EmbedEvalUnitImpl.run(EmbedEvalUnitImpl.java:137)
    at org.jruby.embed.ScriptingContainer.runUnit(ScriptingContainer.java:1307)
    at org.jruby.embed.ScriptingContainer.runScriptlet(ScriptingContainer.java:1300)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.embulk.jruby.ScriptingContainerDelegateImpl.runScriptlet(ScriptingContainerDelegateImpl.java:764)
    at org.embulk.jruby.JRubyInitializer.setBundlerPluginSourceDirectory(JRubyInitializer.java:222)
    at org.embulk.jruby.JRubyInitializer.initialize(JRubyInitializer.java:109)
    at org.embulk.jruby.LazyScriptingContainerDelegate.getInitialized(LazyScriptingContainerDelegate.java:206)
    at org.embulk.jruby.LazyScriptingContainerDelegate.runScriptlet(LazyScriptingContainerDelegate.java:186)
    at org.embulk.jruby.JRubyPluginSource.newPlugin(JRubyPluginSource.java:60)
    ... 17 more
    Caused by: java.lang.NullPointerException: Inflater has been closed
    at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
    at java.util.zip.Inflater.inflate(Inflater.java:257)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
    at org.jruby.util.io.PosixShim.read(PosixShim.java:157)
    at org.jruby.util.io.OpenFile$2.run(OpenFile.java:1314)
    at org.jruby.util.io.OpenFile$2.run(OpenFile.java:1304)
    at org.jruby.RubyThread.executeTask(RubyThread.java:1486)
    at org.jruby.util.io.OpenFile.readInternal(OpenFile.java:1377)
    at org.jruby.util.io.OpenFile.ioBufread(OpenFile.java:1714)
    at org.jruby.util.io.OpenFile.bufreadCall(OpenFile.java:1754)
    at org.jruby.util.io.OpenFile.fread(OpenFile.java:1771)
    at org.jruby.util.io.OpenFile.readAll(OpenFile.java:1677)
    at org.jruby.RubyIO.read(RubyIO.java:3082)
    at org.jruby.RubyIO.read(RubyIO.java:3066)
    at org.jruby.RubyIO.read19(RubyIO.java:3680)
    at org.jruby.RubyIO$INVOKER$s$0$3$read19.call(RubyIO$INVOKER$s$0$3$read19.gen)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:212)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:208)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:338)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:183)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:324)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:84)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:318)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:155)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:315)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:78)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:144)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:130)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:192)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:298)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:127)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:344)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:84)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:318)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:155)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:315)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:84)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:318)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:155)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:315)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.InterpreterEngine.interpret(InterpreterEngine.java:84)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.INTERPRET_METHOD(MixedModeIRMethod.java:179)
    at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:165)
    at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
    at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:318)
    at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:155)
    at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:315)
    at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:74)
    at org.jruby.ir.interpreter.Interpreter.INTERPRET_ROOT(Interpreter.java:112)
    at org.jruby.ir.interpreter.Interpreter.execute(Interpreter.java:99)
    at org.jruby.ir.interpreter.Interpreter.execute(Interpreter.java:35)
    at org.jruby.ir.IRTranslator.execute(IRTranslator.java:42)
    at org.jruby.Ruby.runInterpreter(Ruby.java:848)
    at org.jruby.Ruby.runInterpreter(Ruby.java:852)
    at org.jruby.embed.internal.EmbedEvalUnitImpl.run(EmbedEvalUnitImpl.java:118)
    ... 29 more
    Error: InputPlugin 'postgresql' is not found.
    java.lang.NullPointerException: Inflater has been closed
    2018-04-08 08:00:01.124 +0000 [INFO] (0018:task-0000): > 0.35 seconds
    2018-04-08 08:00:01.128 +0000 [INFO] (0018:task-0000): Fetched 500 rows.
    
    
    opened by gccj 14
  • Remove unnecessary error validations

    fix https://github.com/embulk/embulk/issues/630

    As I wrote in https://github.com/embulk/embulk/issues/630, ConfigError, DataError and PluginLoadError cannot have a cause. Because this behavior is confusing for Ruby plugin creators, I think these error classes should behave as if they inherited from Ruby's Exception class. I have now found that they already behave that way even though they inherit from Java's Error class. See https://gist.github.com/civitaspo/415142e2d567ec5ff79827b35d50f9fb (the same as in issue 630). So, I removed the special implementations.

    opened by civitaspo 14
  • Match with path_prefix in the proper way of the file system in LocalFileInputPlugin

    For example, on MS Windows, the path_prefix is matched case-insensitively.
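    A small sketch of what case-insensitive prefix matching can look like, using String#regionMatches with its ignoreCase flag. It is not the PR's implementation, and PathPrefixMatch is a hypothetical name. Deciding case sensitivity from os.name, as done here, is only a crude assumption: macOS volumes, for example, are often case-insensitive too, which is why matching "in the proper way of the file system" is the better goal.

    ```java
    public class PathPrefixMatch {
        // Crude assumption for illustration: treat Windows as case-insensitive, everything else as case-sensitive.
        static boolean isCaseInsensitiveOs() {
            return System.getProperty("os.name", "").startsWith("Windows");
        }

        // True when path starts with pathPrefix, optionally ignoring case.
        // regionMatches returns false if path is shorter than pathPrefix.
        static boolean matchesPrefix(String path, String pathPrefix, boolean ignoreCase) {
            return path.regionMatches(ignoreCase, 0, pathPrefix, 0, pathPrefix.length());
        }

        public static void main(String[] args) {
            boolean ignoreCase = isCaseInsensitiveOs();
            System.out.println(matchesPrefix("C:\\data\\Sample_01.csv", "C:\\data\\sample_", ignoreCase));
        }
    }
    ```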

    This PR was already discussed in issue #1021, but feel free to drop it if there is a reason to.

    Thanks.

    kaizen 
    opened by jca02266 12
  • Upgrade jruby-gradle-jar-plugin to 1.5.0.

    @muga Can you have a look? It may solve the "rubygems.lasagna.io" problem when building Embulk. (Thanks to @hiroyuki-sato for finding it. :) )

    development-core-internal 
    opened by dmikurube 12
  • [Draft] add new spi ParserPlugin method - runThenReturnTaskReport

    This PR is not meant to be merged; it just shows the idea, to get advice or start a discussion.

    The current ParserPlugin method void run(TaskSource taskSource, Schema schema, FileInput input, PageOutput output); returns nothing, so when executing a file input plugin, we have no way to know the result of the execution, for example:

    • number of success lines?
    • number of warning lines and which lines warning happens?

    The idea is to add a new SPI parser method that behaves the same as the run method but returns a TaskReport. By default, this new method will delegate to the run method, so it stays compatible with current parser plugins.
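    The backward-compatibility idea maps naturally onto a Java default method. The sketch below uses hypothetical, heavily simplified stand-ins (SimpleParser, Report, runThenReturnReport) instead of the real Embulk SPI types, whose run method takes TaskSource, Schema, FileInput and PageOutput. The default delegates to run() and returns an empty report, so an existing parser that implements only run() keeps working, while a new parser can override the method to report counts.

    ```java
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ParserReportSketch {
        // Hypothetical stand-in for Embulk's TaskReport: a plain key-value map.
        static final class Report {
            final Map<String, Object> values = new HashMap<>();
        }

        // Hypothetical, simplified parser interface (not the real ParserPlugin signature).
        interface SimpleParser {
            void run(List<String> lines);

            // Default: delegate to run() and return an empty report, so parsers that
            // only implement run() keep compiling and working unchanged.
            default Report runThenReturnReport(List<String> lines) {
                run(lines);
                return new Report();
            }
        }

        // An existing parser that knows nothing about reports.
        static final class LegacyParser implements SimpleParser {
            int seen;

            @Override
            public void run(List<String> lines) {
                seen += lines.size();
            }
        }

        // A newer parser that overrides the default to report success and warning counts.
        static final class CountingParser implements SimpleParser {
            private int success;
            private int warnings;

            @Override
            public void run(List<String> lines) {
                for (String line : lines) {
                    if (line.isEmpty()) {
                        warnings++; // treating empty lines as warnings is purely illustrative
                    } else {
                        success++;
                    }
                }
            }

            @Override
            public Report runThenReturnReport(List<String> lines) {
                success = 0;
                warnings = 0;
                run(lines);
                Report report = new Report();
                report.values.put("success_lines", success);
                report.values.put("warning_lines", warnings);
                return report;
            }
        }

        public static void main(String[] args) {
            List<String> lines = Arrays.asList("a", "", "b");
            System.out.println(new LegacyParser().runThenReturnReport(lines).values); // {}
            System.out.println(new CountingParser().runThenReturnReport(lines).values);
        }
    }
    ```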

    opened by vietnguyen-td 0
  • Write up retrospective EEPs about changes through v0.9 - v0.10

    We had many changes in Embulk through v0.9 - v0.10. We plan to write up retrospective EEPs about them to leave historical documents about the design decisions behind them.

    They are tentatively numbered as below.

    1. EEP Purpose and Guidelines (done)
    2. "Compact Core" Principle
      • embulk-util libraries
    3. Class Loading
      • parent_first
      • built-in plugins (?) => EEP-20
      • migration plan (v0.10)
    4. SPI Separation and Compatibility Contracts
      • includes merging API and SPI
      • refers to 3 (Class Loading) and 6 (Java 9+)
      • Exec, ExecInternal, ...
      • Type#getJavaType and Type#getFixedStorageSize deprecation
    5. JRuby Deprioritization
    6. Supporting Java 9+
      • refers to 4 (SPI Separation) about Java Modules
      • JRuby version selected by users
      • Bundler and Liquid installation by users
      • JEP 320
    7. Deprecating Executable Binary
      • refers to 6 (Java 9+)
    8. Plugins in Maven Format
    9. Logging, SLF4J and Reporter Plugins
      • SLF4J logging from the very beginning (related to 10 (Embulk System Properties))
    10. System-wide Configuration by Embulk System Properties and Embulk Home
    11. Timestamps and Time Zones without JRuby and Joda-Time
    12. JSON Values without MessagePack for Java
    13. Standard Plugins Independence
    14. Processing Configurations (embulk-util-config)
      • ModelManager
    15. Deprecation of Dependency Libraries
      • refers to 3 (Class Loading)
    16. Deprecation of Dependency Injection (Guice)
      • refers to 3 (Class Loading)
    17. Removal of embulk selfupdate
    18. Removal of embulk new and migrate
    19. Removal of EmbulkService
      • refers to 16 (Guice)
    20. Plugin embedding in one executable binary
    documentation 
    opened by dmikurube 0
  • embulk help message shows the default path even if I set `embulk_home` explicitly.

    Issue Type: Bug Report

    • Write the following environmental information.
      • OS version: macOS 12.3.1
      • Java version: 1.8.0_331
      • Embulk version: 0.10.35

    This command specifies embulk_home and the JRuby path, and just shows the Embulk help message.

    java -jar /path/to/embulk/embulk-0.10.35.jar \
      -X embulk_home=/tmp/embulk_with_space \
      -X jruby file:///path/to/jruby/jruby-complete-9.3.4.0.jar
    

    This command outputs the following messages. They contain "embulk_home is set by the location of embulk.properties found in: /Users/user/.embulk" even though I set the embulk_home path explicitly. Users may be confused about which embulk_home path is in effect.

    Usage: embulk [common options] <command> [command options]
    
    Commands:
       run          Run a bulk load transaction.
       cleanup      Cleanup resume state.
       preview      Dry-run a bulk load transaction, and preview it.
       guess        Guess missing parameters to complete configuration.
       example      Create example files for a quick trial of Embulk.
       selfupdate   Upgrade Embulk to the specified version.
       gem          Run "gem" to install a RubyGem plugin.
       mkbundle     Create a new plugin bundle environment.
       bundle       Update a plugin bundle environment.
       new          Generate new plugin template
    
    Common options:
       -h, --help                Print help
       -version, --version       Show Embulk version
       -l, --log-level LEVEL     Set log level (error, warn, info, debug, trace)
           --log-path PATH       Output log messages to a file (default: -)
       -X KEY=VALUE              Set Embulk system properties
       -R OPTION                 Command-line option for JRuby. (Only '--dev')
    
    2022-04-22 10:50:11.566 +0900 [INFO] (main): embulk_home is set by the location of embulk.properties found in: /Users/user/.embulk
    2022-04-22 10:50:11.569 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /Users/user/.embulk/lib/m2/repository
    2022-04-22 10:50:11.569 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /Users/user/.embulk/lib/gems
    2022-04-22 10:50:11.569 +0900 [INFO] (main): gem_path is set empty.
    

    When I use other commands (e.g., embulk gem install), the message shows the expected path.

    opened by hiroyuki-sato 0
  • -X option with space (like -X embulk_home path/to/home) behavior.

    The Embulk -X option uses the equals sign (=) to separate a key-value pair. It also works with a space instead of =, but that is not the expected behavior (see dmikurube's comment on Twitter).
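    For illustration, a strict parser for the expected KEY=VALUE form could look like the sketch below. It is hypothetical (SystemPropertyOption is not Embulk's CLI code) and simply shows one way to reject the space-separated form instead of accepting it. Splitting on the first = only keeps values such as file: URLs containing = intact.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SystemPropertyOption {
        // Hypothetical strict handling of "-X KEY=VALUE"; not Embulk's actual CLI parser.
        static Map<String, String> parseSystemProperties(String[] args) {
            Map<String, String> properties = new LinkedHashMap<>();
            for (int i = 0; i < args.length; i++) {
                if (!args[i].equals("-X")) {
                    continue;
                }
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-X requires KEY=VALUE");
                }
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    // Rejects the space-separated form "-X embulk_home /path" instead of guessing.
                    throw new IllegalArgumentException("-X expects KEY=VALUE, but got: " + pair);
                }
                // Split on the first '=' only, so values may themselves contain '='.
                properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            return properties;
        }

        public static void main(String[] args) {
            String[] ok = {"-X", "embulk_home=/tmp/embulk_with_equal", "gem", "install", "msgpack"};
            System.out.println(parseSystemProperties(ok)); // {embulk_home=/tmp/embulk_with_equal}
        }
    }
    ```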

    • Write the following environmental information.
      • OS version: macOS 12.3.1
      • Java version: 1.8.0_331
      • Embulk version: 0.10.35

    -X options with = (expected behavior):

    % java -jar /path/to/embulk/embulk-0.10.35.jar \
      -X embulk_home=/tmp/embulk_with_equal \
      -X jruby=file:///path/to/jruby/jruby-complete-9.3.4.0.jar gem install msgpack
    2022-04-22 09:04:27.578 +0900 [INFO] (main): embulk_home is set from command-line: /tmp/embulk_with_equal
    2022-04-22 09:04:27.582 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /tmp/embulk_with_equal/lib/m2/repository
    2022-04-22 09:04:27.583 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /tmp/embulk_with_equal/lib/gems
    2022-04-22 09:04:27.583 +0900 [INFO] (main): gem_path is set empty.
    2022-04-22 09:04:31.720 +0900 [INFO] (main): Environment variable "GEM_HOME" is not set. Setting "GEM_HOME" to "/tmp/embulk_with_equal/lib/gems" from Embulk system property "gem_home" for the "gem" command.
    Fetching msgpack-1.5.1-java.gem
    Successfully installed msgpack-1.5.1-java
    Parsing documentation for msgpack-1.5.1-java
    Installing ri documentation for msgpack-1.5.1-java
    Done installing documentation for msgpack after 3 seconds
    1 gem installed
    

    -X options with a space (unexpected behavior):

    % java -jar /path/to/embulk/embulk-0.10.35.jar \
      -X embulk_home /tmp/embulk_with_space \
      -X jruby file:///path/to/jruby/jruby-complete-9.3.4.0.jar gem install msgpack
    2022-04-22 09:05:17.730 +0900 [INFO] (main): embulk_home is set from command-line: /tmp/embulk_with_space
    2022-04-22 09:05:17.734 +0900 [INFO] (main): m2_repo is set as a sub directory of embulk_home: /tmp/embulk_with_space/lib/m2/repository
    2022-04-22 09:05:17.734 +0900 [INFO] (main): gem_home is set as a sub directory of embulk_home: /tmp/embulk_with_space/lib/gems
    2022-04-22 09:05:17.734 +0900 [INFO] (main): gem_path is set empty.
    2022-04-22 09:05:21.352 +0900 [INFO] (main): Environment variable "GEM_HOME" is not set. Setting "GEM_HOME" to "/tmp/embulk_with_space/lib/gems" from Embulk system property "gem_home" for the "gem" command.
    Fetching msgpack-1.5.1-java.gem
    Successfully installed msgpack-1.5.1-java
    Parsing documentation for msgpack-1.5.1-java
    Installing ri documentation for msgpack-1.5.1-java
    Done installing documentation for msgpack after 2 seconds
    1 gem installed
    

    Confirm the installed gem:

    % cd /tmp/embulk_with_space/
    % fd msgpack
    lib/gems/cache/msgpack-1.5.1-java.gem
    lib/gems/doc/msgpack-1.5.1-java
    lib/gems/doc/msgpack-1.5.1-java/ri/Array/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Bignum/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/FalseClass/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Fixnum/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Float/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Hash/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Integer/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/Bigint/from_msgpack_ext-c.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/Bigint/to_msgpack_ext-c.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/CoreExt/to_msgpack-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/ExtensionValue/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/Timestamp/from_msgpack_ext-c.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/Timestamp/to_msgpack_ext-c.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/MessagePack/Timestamp/to_msgpack_ext-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/NilClass/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/String/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Symbol/from_msgpack_ext-c.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/Symbol/to_msgpack_with_packer-i.ri
    lib/gems/doc/msgpack-1.5.1-java/ri/TrueClass/to_msgpack_with_packer-i.ri
    lib/gems/gems/msgpack-1.5.1-java
    lib/gems/gems/msgpack-1.5.1-java/lib/msgpack
    lib/gems/gems/msgpack-1.5.1-java/lib/msgpack/msgpack.jar
    lib/gems/gems/msgpack-1.5.1-java/lib/msgpack.rb
    lib/gems/specifications/msgpack-1.5.1-java.gemspec
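
    The INFO lines in these sessions report where Embulk places other directories relative to `embulk_home`: `m2_repo` under `lib/m2/repository` and `gem_home` under `lib/gems`. The minimal Java sketch below reproduces only what those log lines state. It also checks for the `<name>-<version>-<platform>.gemspec` file that RubyGems keeps under `specifications/`, which the `fd` listing above shows. The class and method names (`EmbulkHomeLayoutSketch`, `deriveDirs`, `isGemInstalled`) are hypothetical and are not Embulk's implementation. Explicit overrides of `m2_repo` or `gem_home` are not modeled.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT Embulk's real code: it only mirrors what the
// INFO log lines report about directories derived from embulk_home.
public class EmbulkHomeLayoutSketch {

    /** Derives sub-directories as the logs describe ("set as a sub directory of embulk_home"). */
    static Map<String, Path> deriveDirs(Path embulkHome) {
        Map<String, Path> dirs = new LinkedHashMap<>();
        dirs.put("m2_repo", embulkHome.resolve("lib/m2/repository"));
        dirs.put("gem_home", embulkHome.resolve("lib/gems"));
        return dirs;
    }

    /** RubyGems keeps one <full name>.gemspec per installed gem under GEM_HOME/specifications/. */
    static boolean isGemInstalled(Path gemHome, String gemFullName) {
        return Files.isRegularFile(gemHome.resolve("specifications").resolve(gemFullName + ".gemspec"));
    }

    public static void main(String[] args) {
        // Defaults to the embulk_home used in the session above.
        Path home = args.length > 0 ? Path.of(args[0]) : Path.of("/tmp/embulk_with_space");
        Map<String, Path> dirs = deriveDirs(home);
        dirs.forEach((key, dir) -> System.out.println(key + " = " + dir));
        System.out.println("msgpack-1.5.1-java installed: "
                + isGemInstalled(dirs.get("gem_home"), "msgpack-1.5.1-java"));
    }
}
```

    Pass a different `embulk_home` as the first argument, for example `/tmp/embulk_with_equal`, to check the other session's directory.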
    
    opened by hiroyuki-sato
  • Eliminate `PluginSource`-based plugin lookup, do things more integrated

    Embulk has used Guice to look up plugins through `PluginSource`s, so every lookup mechanism has had to inherit `PluginSource`. This design has made error messages cryptic.

    We can organize the plugin lookup mechanism as a whole and make its error messages more understandable.
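
    To illustrate the direction this issue proposes, here is a minimal Java sketch. It assumes a single lookup routine that asks each lookup mechanism in turn and, if none succeeds, reports every failure reason in one message. `PluginLookupSketch`, `Resolver`, `MapResolver`, `PluginNotFoundException`, and `example.CommandOutputPlugin` are all hypothetical. They are not Embulk's API, nor the design Embulk eventually adopted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of an "integrated" plugin lookup. None of these types
// exist in Embulk; they only illustrate trying every lookup mechanism in one
// place and reporting all failure reasons together.
public class PluginLookupSketch {

    /** Outcome of asking one mechanism for a plugin. */
    record Attempt(String source, Optional<String> pluginClass, String reason) {}

    /** One way to locate a plugin by category ("input", "output", ...) and type. */
    interface Resolver {
        String name();
        Attempt tryResolve(String category, String type);
    }

    /** Toy resolver backed by a map from "category:type" to a (hypothetical) class name. */
    record MapResolver(String name, Map<String, String> known) implements Resolver {
        @Override
        public Attempt tryResolve(String category, String type) {
            String key = category + ":" + type;
            String cls = known.get(key);
            return cls != null
                    ? new Attempt(name, Optional.of(cls), "found")
                    : new Attempt(name, Optional.empty(), "no entry for " + key);
        }
    }

    static final class PluginNotFoundException extends RuntimeException {
        PluginNotFoundException(String message) {
            super(message);
        }
    }

    private final List<Resolver> resolvers;

    PluginLookupSketch(List<Resolver> resolvers) {
        this.resolvers = List.copyOf(resolvers);
    }

    /** Returns the first match; otherwise throws one exception listing every attempt. */
    String lookup(String category, String type) {
        List<String> reasons = new ArrayList<>();
        for (Resolver resolver : resolvers) {
            Attempt attempt = resolver.tryResolve(category, type);
            if (attempt.pluginClass().isPresent()) {
                return attempt.pluginClass().get();
            }
            reasons.add("  - " + attempt.source() + ": " + attempt.reason());
        }
        throw new PluginNotFoundException("No " + category + " plugin of type '" + type
                + "' was found. Tried:\n" + String.join("\n", reasons));
    }

    public static void main(String[] args) {
        PluginLookupSketch lookup = new PluginLookupSketch(List.of(
                new MapResolver("jar", Map.of("output:command", "example.CommandOutputPlugin")),
                new MapResolver("gem", Map.of())));
        System.out.println(lookup.lookup("output", "command"));
        try {
            lookup.lookup("input", "s3");
        } catch (PluginNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    Each resolver returns a reason instead of throwing. That lets the final error name every mechanism that was tried and explain why each one failed, which is the kind of understandable message the issue asks for.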

    opened by dmikurube
Releases(v0.10.41)