SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

Overview


SeaTunnel was formerly named Waterdrop, and was renamed SeaTunnel on October 12, 2021.


SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of records stably and efficiently every day, and it is already used in production by nearly 100 companies.

Why do we need SeaTunnel

SeaTunnel aims to solve the problems commonly encountered when synchronizing massive data:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long lead time before a job can be applied in the production environment
  • Lack of monitoring of application running status

SeaTunnel use scenarios

  • Mass data synchronization
  • Mass data integration
  • ETL with massive data
  • Mass data aggregation
  • Multi-source data processing

Features of SeaTunnel  

  • Easy to use, flexible configuration, low code development
  • Real-time streaming
  • Offline multi-source data analysis
  • High-performance, massive data processing capabilities
  • Modular and plug-in mechanism, easy to extend
  • Support data processing and aggregation by SQL
  • Support Spark structured streaming
  • Support Spark 2.x

Workflow of SeaTunnel


Input[Data Source Input] -> Filter[Data Processing] -> Output[Result Output]

The data processing pipeline consists of multiple filters to meet a variety of data processing needs. If you are accustomed to SQL, you can also construct a data processing pipeline directly in SQL, which is simple and efficient. The list of filters supported by SeaTunnel is still being expanded. Furthermore, you can develop your own data processing plug-ins, because the whole system is easy to extend.
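
As a rough illustration, the sketch below shows a minimal v1-style configuration that wires a Fake input through a Sql filter to a Stdout output. The exact option names (for example result_table_name) are assumptions based on the variable-substitution example in the FAQ further down this page; consult the quick-start documentation for the authoritative reference.

# register a Fake input as a table named "my_source" (option name assumed)
input {
  fake {
    result_table_name = "my_source"
  }
}

# transform the data with a SQL filter
filter {
  sql {
    sql = "select * from my_source"
  }
}

# print the result to stdout
output {
  stdout {}
}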

Plugins supported by SeaTunnel  

  • Input plugins: Fake, File, Hdfs, Kafka, S3, Socket, self-developed Input plugins

  • Filter plugins: Add, Checksum, Convert, Date, Drop, Grok, Json, Kv, Lowercase, Remove, Rename, Repartition, Replace, Sample, Split, Sql, Table, Truncate, Uppercase, Uuid, self-developed Filter plugins

  • Output plugins: Elasticsearch, File, Hdfs, Jdbc, Kafka, Mysql, S3, Stdout, self-developed Output plugins

Environment dependencies

  1. Java runtime environment, Java >= 8

  2. If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:

  • Spark on Yarn
  • Spark Standalone

If the data volume is small, or if the goal is merely functional verification, you can also start in local mode without a cluster environment, because SeaTunnel supports standalone operation; see the sketch after this paragraph. Note: SeaTunnel 2.0 supports running on Spark and Flink.
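
For example, a v1-style local run might look like the following sketch; it follows the start-waterdrop.sh usage shown in the FAQ later on this page, and the configuration path is only an illustration.

# run the job locally with 2 cores, without any cluster
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2]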

Downloads

Download address for the ready-to-run software package: https://github.com/apache/incubator-seatunnel/releases

Quick start

Quick start: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/quick-start

Detailed documentation on SeaTunnel: https://interestinglab.github.io/seatunnel-docs/#/

Application practice cases

  • Weibo, Value-added Business Department Data Platform

Weibo's value-added business uses an internally customized version of SeaTunnel, together with its sub-project Guardian, for SeaTunnel-on-Yarn task monitoring of hundreds of real-time streaming computing tasks.

  • Sina, Big Data Operation Analysis Platform

The Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operation and maintenance data for Sina News, CDN and other services, and writes the results into ClickHouse.

  • Sogou, Sogou Qiqian System

The Sogou Qiqian System uses SeaTunnel as an ETL tool to help establish its real-time data warehouse system.

  • Qutoutiao, Qutoutiao Data Center

The Qutoutiao Data Center uses SeaTunnel to support MySQL-to-Hive offline ETL tasks and real-time Hive-to-ClickHouse backfill, covering most offline and real-time task needs well.

  • Yixia Technology, Yizhibo Data Platform

  • Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform

SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a new retail brand of Yonghui Yunchuang Technology.

  • Shuidichou, Data Platform

Shuidichou adopts SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing 3-4 TB of data per day on average, and finally writing the data into ClickHouse.

  • Tencent Cloud

Various logs from business services are collected into Apache Kafka; part of the data in Kafka is consumed and extracted through SeaTunnel and then stored into ClickHouse.

For more use cases, please refer to: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/case_study/

Code of conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please follow the REPORTING GUIDELINES to report unacceptable behavior.

Developer

Thanks to all developers https://github.com/apache/incubator-seatunnel/graphs/contributors

Contact Us

Comments
  • [Umbrella][Connector] New SeaTunnel API Connectors

    [Umbrella][Connector] New SeaTunnel API Connectors

    Please Move To https://github.com/apache/incubator-seatunnel/issues/3018

    | No | Connector | priority | difficulty | Source/Sink | Contributer | Status | Issue/PR | |-----|-----------------------------------------------------------------------------------------------------------------------------------------|----------|------------|-------------|------------------|--------|----------| | 1 | Console | high | low | Sink | @ruanwenjun | Done | #1864 | | 2 | Fake | high | low | Source | @ruanwenjun | Done | #1864 | | 3 | Doris | high | high | Source | @2013650523 | | #2536 | | 4 | Doris | high | high | Sink | @hk-lrzy | | #2586 | | 5 | Druid | high | middle | Source | @guanboo | | #2937 | | 6 | Druid | high | middle | Sink | @guanboo | | #2937 | | 7 | ElasticSearch | high | high | Source | @iture123 | | #2821 | | 8 | ElasticSearch | high | high | Sink | @iture123 | Done | #2330 | | 9 | ClickHouse | high | high | Source | @Hisoka-X | Done | #2051 | | 10 | ClickHouse | high | high | Sink | @Hisoka-X | Done | #2051 | | 11 | JDBC | high | high | Source | @ic4y | Done | #2048 | | 12 | JDBC | high | high | Sink | @ic4y | Done | #2048 | | 13 | FeiShu | low | low | Source | @TyrantLucifer | | | | 14 | FeiShu | low | low | Sink | @TyrantLucifer | Done | | | 15 | File(Local) | high | middle | Source | @TyrantLucifer | Done | | | 16 | File(Local) | high | middle | Sink | @EricJoy2048 | Done | #2117 | | 17 | File(S3) | high | middle | Source | @TyrantLucifer | | | | 18 | File(S3) | high | middle | Sink | @TyrantLucifer | | | | 19 | File(HDFS) | high | high | Source | @TyrantLucifer | Done | | | 20 | File(HDFS) | high | high | Sink | @EricJoy2048 | Done | | | 21 | File(OSS) | high | middle | Source | @TyrantLucifer | Done | | | 22 | File(OSS) | high | middle | Sink | @TyrantLucifer | Done | | | 23 | Hudi | high | high | Source | @Emor-nj | Done | #2147 | | 24 | Hudi | high | high | Sink | @Emor-nj | | | | 25 | Iceberg | high | middle | Source | @hailin0 | Done | #2615 | | 26 | Iceberg | high | middle | Sink | @s7monk | | | | 27 | Kafka | high | middle | Source | @Hisoka-X | Done | | | 28 | Kafka | high | middle | Sink | @ruanwenjun | Done | | | 29 | Kudu | high | high | Source | @2013650523 | Done | #2254 | | 30 | Kudu | high | high | Sink | @2013650523 | Done | #2254 | | 31 | MongoDB | high | middle | Source | @wuchunfu | Done | | | 32 | MongoDB | high | middle | Sink | @wuchunfu | Done | | | 33 | Neo4j | middle | middle | Source | @getChan | | #2777 | | 34 | Neo4j | middle | middle | Sink | @getChan | Done | #2434 | | 35 | Phoenix | middle | middle | Source | @531651225 | Done | #2499 | | 36 | Phoenix | middle | middle | Sink | @531651225 | Done | #2499 | | 37 | Redis | high | middle | Source | @TyrantLucifer | Done | | | 38 | Redis | high | middle | Sink | @TyrantLucifer | Done | | | 39 | Socket | high | low | Source | @zhuangchong | Done | #1999 | | 40 | Socket | high | low | Sink | @531651225 | Done | #2549 | | 41 | [JDBC]Tidb | high | low | Source | @xbkaishui | Abandonment | | | 42 | [JDBC]Tidb | high | low | Sink | @xbkaishui | Abandonment | | | 43 | Webhook | high | middle | Sink | @TyrantLucifer | Done | #2348 | | 44 | InfluxDB | middle | middle | Source | @531651225 | | #2697 | | 45 | InfluxDB | middle | middle | Sink | @Jellal-HT | | | | 46 | Pulsar | high | middle | Source | @ashulin | Done | #1984 | | 47 | Pulsar | high | middle | Sink | @FlechazoW | | | | 48 | Email | middle | low | Sink | @2013650523 | Done | #2304 | | 49 | Assert | low | low | Sink | @lhyundeadsoul | Done | | | 50 | Redshift | high | high | Source | | | | | 51 | Facebook Marketing | middle | 
middle | Source | | | | | 52 | HubSpot | middle | middle | Source | | | | | 53 | Instagram | middle | middle | Source | | | | | 54 | Bing ADs | middle | middle | Source | | | | | 55 | Google Analytics | middle | middle | Source | @MRYOG | | | | 56 | Intercom | middle | middle | Source | | | | | 57 | Zendesk | middle | middle | Source | | | | | 58 | TikTok Marketing | middle | middle | Source | | | | | 59 | Salesforce | middle | middle | Source | | | | | 60 | LinkedIn Ads | middle | middle | Source | | | | | 61 | Stripe | middle | middle | Source | | | | | 62 | Sentry | middle | middle | Sink | @Saintyang | done | #2244 | | 63 | DingTalk | low | low | Source | @MRYOG | | #2757 | | 63 | DingTalk | low | low | Sink | @MRYOG | Done | #2257 | | 64 | IoTDB | high | high | Source | @CalvinKirs | Done | | | 65 | IoTDB | high | high | Sink | @hailin0 | Done | | | 66 | TD-engine | middle | middle | Source | @lhyundeadsoul | | #2707 | | 67 | TD-engine | middle | middle | Sink | @lhyundeadsoul | | | | 68 | HBase | high | high | Source | @zhuzhengjun01 | | | | 69 | HBase | high | high | Sink | @zhuzhengjun01 | | | | 70 | Hive | high | high | Source | @CalvinKirs | Done | | | 71 | Hive | high | high | Sink | @EricJoy2048 | Done | | | 72 | HTTP | high | middle | Source | @zhuangchong | Done | #2012 | | 72 | StarRocks | high | high | Sink | @tonyDong-code | | | | 73 | StarRocks | high | high | Source | @wangw9420 | | | | 74 | ADB PostgreSQL | high | low | Sink | @etcZYP | | | | 75 | Greenplum | high | high | Source | @hailin0 | Done | #2429 | | 76 | Greenplum | high | high | Sink | @hailin0 | Done | #2429 | | 77 | OceanBase | middle | high | Sink | @silenceland | | | | 78 | OceanBase | middle | high | Source | @silenceland | | | | 79 | DB2 | high | high | Sink | @laglangyue | | #2410 | | 80 | DB2 | high | high | Source | @laglangyue | | #2410 | | 81 | BigSource | middle | middle | Source | | | | | 82 | Github | low | low | Source | @MonsterChenzhuo | | | | 83 | Enterprise WeChat | low | low | Sink | @531651225 | Done | #2412 | | 84 | Slack | middle | low | Sink | @Charlie17Li | | | | 85 | Databricks Lakehouse | high | high | Sink | | | | | 86 | Snowflake | middle | high | Sink | | | | | 87 | Snowflake | middle | high | Source | | | | | 88 | [Jdbc]Sql-Server | middle | low | Sink | @liugddx | Done | #2646 | | 89 | [Jdbc]Sql-Server | middle | low | Source | @liugddx | Done | #2646 | | 90 | [Jdbc]Oracle | high | low | Sink | @liugddx | | #2550 | | 91 | [Jdbc]Oracle | high | low | Source | @liugddx | | #2550 | | 92 | [JDBC]Rds | high | low | Sink | @s7monk | | #2829 | | 93 | [JDBC]Rds | high | low | Source | @s7monk | | #2829 | | 94 | [JDBC]SqlLite | middle | low | Sink | @maruko-code | | | | 95 | [JDBC]SqlLite | middle | low | Source | @Caribbeanz | | | | 96 | [JDBC]DM(达梦) | middle | low | Source | @laglangyue | done | #2377 | | 97 | [JDBC]DM(达梦) | middle | low | Sink | @laglangyue | done | #2377 | | 98 | Cassandra | middle | middle | Sink | @bigdataf | | | | 99 | Cassandra | middle | middle | Source | @bigdataf | | | | 100 | [File]excel | middle | low | Sink | @Bingz2 | | #2585 | | 101 | [File]excel | middle | low | Source | @MonsterChenzhuo | | | | 102 | [File]JSON | low | low | Sink | @hailin0 | Done | | | 103 | [File]JSON | low | low | Source | @TyrantLucifer | Done | | | 104 | MaxCompute | middle | middle | Source | @longer-jl | | | | 105 | MaxCompute | middle | middle | Sink | @longer-jl | | | | 106 | TDSql | middle | middle | Source | @dzzxjl | | | | 107 | OpenMLDB | middle | high | Source | 
@TyrantLucifer | | | | 107 | OpenMLDB | middle | high | Sink | @Dlimeng | | | | 108 | Ftp | middle | middle | Sink | @chessplay | done | #2774 | | 109 | Ftp | middle | middle | Source | @guanboo | done | #2774 | | 110 | GaussDB | middle | middle | Source | @Builder34 | | | | 111 | GaussDB | middle | middle | Sink | @Builder34 | | | | 110 | Teradata | middle | middle | Source | | | | | 111 | Teradata | middle | middle | Sink | | | | | 112 | SFTP | middle | middle | Sink | @TyrantLucifer | | | | 113 | SFTP | middle | middle | Source | @TyrantLucifer | | | | 114 | DataHub | middle | middle | Source | @selectbook | | | | 115 | DataHub | middle | middle | Sink | @chessplay | done | | | 116 | SAP HANA | middle | middle | Source | | | | | 117 | SAP HANA | middle | middle | Sink | | | | | 118 | Flink Table Store | high | high | Sink | @iture123 | | | | 119 | Flink Table Store | high | high | Source | @zhaomin1423 | | | | 120 | Vertica | middle | middle | Source | | | | | 121 | Vertica | middle | middle | Sink | | | | | 122 | Kylin | middle | middle | Source | @531651225 | | | | 123 | Kylin | middle | middle | Sink | @TaoZex | | | | 124 | Neocrm | middle | middle | Source | | | | | 125 | TiDB | middle | middle | Source | @Xuxiaotuan | | #2830 | | 126 | TiDB | middle | middle | Sink | @Xuxiaotuan | | #2830 | | 127 | Sentry | middle | middle | Source | | | | | 128 | PolarDB | middle | middle | Source | | | | | 129 | PolarDB | middle | middle | Sink | | | | | 130 | PolarDB-X | middle | middle | Source | | | | | 131 | PolarDB-X | middle | middle | Sink | | | | | 132 | AnalyticDB | middle | middle | Sink | | | | | 133 | TDSQL | middle | middle | Sink | | | | | 134 | SequoiaDB | middle | middle | Sink | | | | | 135 | TcaplusDB | middle | middle | Source | | | | | 136 | TcaplusDB | middle | middle | Sink | | | | | 137 | GoldenDB | middle | middle | Source | | | | | 138 | GoldenDB | middle | middle | Sink | | | | | 139 | AntDB | middle | middle | Source | | | | | 140 | AntDB | middle | middle | Sink | | | | | 141 | OushuDB | middle | middle | Sink | | | | | 142 | SUNDB | middle | middle | Source | | | | | 143 | SUNDB | middle | middle | Sink | | | | | 144 | UXDB | middle | middle | Source | | | | | 145 | UXDB | middle | middle | Sink | | | | | 146 | DolphinDB | middle | middle | Source | | | | | 147 | DolphinDB | middle | middle | Sink | | | | | 148 | RapidsDB | middle | middle | Source | | | | | 149 | RapidsDB | middle | middle | Sink | | | | | 150 | GreatDB | middle | middle | Source | | | | | 151 | GreatDB | middle | middle | Sink | | | | | 152 | CirroData | middle | middle | Sink | | | | | 153 | Nebula | middle | middle | Sink | | | | | 154 | Gbase 8a | middle | middle | Source | | | | | 155 | KunlunDB | middle | middle | Sink | | | | | 156 | Percona | middle | middle | Source | | | | | 157 | Percona | middle | middle | Sink | | | | | 158 | Splunk | middle | middle | Source | | | | | 159 | Splunk | middle | middle | Sink | | | | | 160 | Amazon DynamoDB | middle | middle | Source | | | | | 161 | Amazon DynamoDB | middle | middle | Sink | | | | | 162 | Microsoft Azure SQL Database | middle | middle | Source | | | | | 163 | Microsoft Azure SQL Database | middle | middle | Sink | | | | | 164 | Neo5j | middle | middle | Source | | | | | 165 | Neo5j | middle | middle | Sink | | | | | 166 | Solr | middle | middle | Sink | | | | | 167 | BigQuery | middle | middle | Source | | | | | 168 | BigQuery | middle | middle | Sink | | | | | 169 | SAP Adaptive Server | middle | middle | Source | | | | | 170 | SAP 
Adaptive Server | middle | middle | Sink | | | | | 171 | Microsoft Azure Cosmos DB | middle | middle | Source | | | | | 172 | Microsoft Azure Cosmos DB | middle | middle | Sink | | | | | 173 | PostGIS | middle | middle | Source | | | | | 174 | PostGIS | middle | middle | Sink | | | | | 175 | Couchbase | middle | middle | Sink | | | | | 176 | Vika | middle | middle | Sink | | | | | 177 | Gitlab | low | low | Source | | | |

    help wanted good first issue volunteer wanted API-refactor connectors-v2 
    opened by Hisoka-X 194
  • [Connector-V2][JDBC-connector] support Jdbc dm

    [Connector-V2][JDBC-connector] support Jdbc dm

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    connectors-v2 
    opened by laglangyue 25
  • [DISCUSS][metrics] Support metrics statistics

    [DISCUSS][metrics] Support metrics statistics

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    Support metrics statistics when transmitting data.

    Are you willing to submit a PR?

    • [ ] Yes I am willing to submit a PR!

    Code of Conduct

    discuss 
    opened by leo65535 24
  • FAQ

    FAQ

    FAQ 1. When a developer builds their own Waterdrop plugin, do they need to understand the Waterdrop code, and must the plugin code be written inside the Waterdrop project?

    A plugin developed by you can be completely independent of the waterdrop project, and there is no need to put your plugin code inside the waterdrop project. A plugin can be a fully standalone project in which you are free to use Java, Scala, Maven, SBT or Gradle. This is also the way we recommend developers build plugins.


    FAQ 2. When running waterdrop in cluster mode, it reports that plugins.tar.gz cannot be found.

    Before submitting in cluster mode, you need to run the following command first:

    # Note: from the next release, v1.2.3, the plugin directory is expected to be packaged automatically, so this command will no longer be necessary.
    tar zcvf plugins.tar.gz plugins
    

    After the plugin directory has been packaged, run the command below (if you do not add or remove plugins under the plugins directory afterwards, there is no need to package it again):

    ./bin/start-waterdrop.sh --master yarn --deploy-mode cluster --config ./config/first.conf
    

    If you have any other needs, add garyelephant on WeChat for help.


    FAQ 3. Waterdrop reports the following error after starting:

    ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3

    The problem is a jar dependency conflict. Download the latest version and try again; it should work:

    https://github.com/InterestingLab/waterdrop/releases/download/v1.2.3/waterdrop-1.2.3.zip


    FAQ 4. I want to study the Waterdrop source code. Where should I start?

    Waterdrop has a fully abstracted and well-structured code base, and many people have chosen to read the Waterdrop source code as a way of learning Spark. You can start studying the source code from the main program entry point: Waterdrop.scala


    FAQ 5. Does Waterdrop support dynamic variable substitution, for example replacing the WHERE condition of the SQL in a scheduled job?

    No problem, this is fully supported. For a concrete configuration example, see the configuration example that uses ${varname} for variable substitution.


    FAQ 6. Can Waterdrop run inside job scheduling frameworks such as Azkaban or Oozie?

    Of course it can; see the screenshot below:


    FAQ 7. I ran into a problem while using Waterdrop that I cannot solve by myself. What should I do?

    Go to the project homepage, find the project maintainer's WeChat ID, and add them on WeChat.


    FAQ 8. How do I declare a variable in the Waterdrop configuration and then set its value dynamically at runtime?

    Since v1.2.4, Waterdrop supports declaring variables in the configuration. This feature is often used to substitute variables such as time and date when doing scheduled or ad-hoc offline processing. Usage is as follows:

    Declare the variable name in the configuration, for example:

    ...
    
    filter {
      sql {
        table_name = "user_view"
        sql = "select * from user_view where city ='"${city}"' and dt = '"${date}"'"
      }
    }
    
    ...
    

    The sql filter is used here only as an example; in fact, the value of any key = value pair anywhere in the configuration file can use variable substitution.

    For a detailed configuration example, see variable substitution.

    The launch commands are as follows:

    # local mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2] -i city=shanghai -i date=20190319
    
    # yarn client mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m yarn -i city=shanghai -i date=20190319
    
    # yarn cluster mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e cluster -m yarn -i city=shanghai -i date=20190319
    
    # mesos and spark standalone are launched in the same way.
    

    You can specify a variable's value with the -i or --variable parameter followed by key=value, where key must be the same as the variable name in the configuration.


    FAQ 9. How do I fix an OOM when Waterdrop consumes data from Kafka?

    In most cases the OOM is caused by consuming without rate limiting. The solution is as follows:

    For details, see: https://www.processon.com/view/link/5c9862ece4b0c996d36fe7d7


    document 
    opened by garyelephant 24
  • [Improve] [Connector-V2] File Connector add lzo compression way.

    [Improve] [Connector-V2] File Connector add lzo compression way.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    improve First-time contributor connectors-v2 Waiting for code update approved reviewed 
    opened by lightzhao 21
  • [Feature][Connector] Split connector jar from release core jar

    [Feature][Connector] Split connector jar from release core jar

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    Now all the connector jars in the binary distribution package of SeaTunnel are packaged into one jar file: core. This makes it impossible for us to implement multi-version support for the same component.

    Now

    .
    apache-seatunnel
    | - - lib
          | - - seatunnel-core-spark.jar
          | - - seatunnel-core-flink.jar
    | - - plugins
    | - - config
    | - - bin
    

    After (example for multi-version Elasticsearch 6.x and 7.x)

    .
    apache-seatunnel
    | - - lib
          | - - seatunnel-core-spark.jar
          | - - seatunnel-core-flink.jar
    | - - connectors
          | - - flink
                | - - seatunnel-connector-flink-elasticsearch7.jar
                | - - seatunnel-connector-flink-kafka0.10.jar
                | - - other-all-lasted-version-connecotr-single.jar
          | - - spark
    | - - opt
          | - - flink
                | - - seatunnel-connector-flink-elasticsearch6.jar
                | - - seatunnel-connector-flink-kafka0.08.jar
                | - - other-all-older-version-connecotr-single.jar
          | - - spark            
    | - - plugins
    | - - config
    | - - bin
    

    After this is finished, users can use a different connector version with one release version; they just need to move the jar from the opt folder to the connectors folder.

    WorkFlow

    (workflow diagram in the original issue)

    Engine Implement Method

    Flink

    • Flink uses PipelineOptions.JARS and PipelineOptions.CLASSPATH to upload connector jars so that they can be loaded.

    Spark

    • Spark uses the spark.jars property so that connector jars can be executed on the cluster, as in the sketch below.
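
    For instance, a submission that ships one connector jar to the cluster through spark.jars might look like this sketch; the jar path, class name, and application jar are illustrative only, not the project's actual packaging.

    # make a single connector jar available to the driver and executors
    spark-submit --conf spark.jars=connectors/spark/seatunnel-connector-spark-kafka.jar \
      --class org.example.YourApp your-app.jar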

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    discuss 
    opened by Hisoka-X 21
  • [Feature][seatunnel-examples] flink  local environment run quickly and debug locally developed code easily

    [Feature][seatunnel-examples] flink local environment run quickly and debug locally developed code easily

    this closes #955

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    improve 
    opened by felix-thinkingdata 21
  • [Feature][API] SeaTunnel Transform API

    [Feature][API] SeaTunnel Transform API

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    We already have SeaTunnel Source API and SeaTunnel Sink API, but we don't have SeaTunnel Transform API now. We need SeaTunnel Transform API and it must have some key features:

    • Like source and sink, it is decoupled from the engine and can run on different engines.
    • In order to ensure seatunnel's positioning as a data integration platform and not introduce work beyond the plan, the SeaTunnel Transform API will only support UDF level data conversion.
    • In theory, UDF level transform does not require checkpoint and state storage.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [ ] Yes I am willing to submit a PR!

    Code of Conduct

    stale 
    opened by EricJoy2048 18
  • [Feature][Connector-V2] add sqlserver connector

    [Feature][Connector-V2] add sqlserver connector

    Purpose of this pull request

    support sqlserver connector

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    connectors-v2 
    opened by liugddx 18
  • [Feature] [config] Fix dependency conflict in seatunnel config when running SeatunnelFlink in idea with local mode#1186

    [Feature] [config] Fix dependency conflict in seatunnel config when running SeatunnelFlink in idea with local mode#1186

    Purpose of this pull request

    This pull request fixes the #1186 bug. It adds a seatunnel-config-shade module and renames the package of the code in seatunnel-config to avoid class conflicts when running SeatunnelFlink in IntelliJ IDEA in local mode.

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in you PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    improve First-time contributor 
    opened by Yves-yuan 18
  • [Discuss][HTTP Connector] Add specified field function for all HTTP connector

    [Discuss][HTTP Connector] Add specified field function for all HTTP connector

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    So far, some http requests return data that cannot be parsed, such as array nested type data.

    (screenshot of the nested JSON response in the original issue)

    We need to implement the function to specify a field, such as users in the figure above, so that we can configure the schema for users.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    opened by TaoZex 17
  • [Feature][Connector-V2] Support kerberos in hive and hdfs file connector

    [Feature][Connector-V2] Support kerberos in hive and hdfs file connector

    Purpose of this pull request

    Support kerberos in hive and hdfs file connector

    close #3327

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by TyrantLucifer 0
  • [Feature][Connector-V2][File] Support compress codec

    [Feature][Connector-V2][File] Support compress codec

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    As we know, the connector-v2 file connectors do not support setting a compression codec; this feature is important.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    opened by TyrantLucifer 0
  • [Improve][CDC Base] Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask

    [Improve][CDC Base] Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask

    Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask.

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [x] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by ic4y 0
  • [Bug][KafkaSource]KafkaConsumer is not close.

    [Bug][KafkaSource]KafkaConsumer is not close.

    Fix the problem that KafkaConsumer of KafkaSource is not closed and the resource is not released.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by lightzhao 1
  • [Improve] [Seatunnel-Engine] remove `seatunnel-api` from engine storage.

    [Improve] [Seatunnel-Engine] remove `seatunnel-api` from engine storage.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by liugddx 1
  • Timestamp fields are not supported

    Timestamp fields are not supported

    Search before asking

    • [X] I had searched in the issues and found no similar issues.

    What happened

    When sinking to LocalFile, storing a timestamp field in parquet format results in an error.

    SeaTunnel Version

    2.2.0-beta

    SeaTunnel Config

    sink {
      LocalFile {
        path="file:///opt/sink_file/test_3"
        file_name_expression="${transactionId}_${now}"
        file_format="parquet"
        filename_time_format="yyyy.MM.dd"
        is_enable_transaction=true
        sink_columns = ["file_id","file_name","file_size","mail_suject","mail_body","mail_create_time"]
      }
    }
    

    Running Command

    ./bin/start-seatunnel-flink-connector-v2.sh --config config/example_1.conf
    

    Error Exception

    org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Flink job executed failed
            at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
            at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
            at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
            at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
            at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
            at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
            at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
            at org.apache.flink.client.cli.CliFrontend$$Lambda$75/270095066.call(Unknown Source)
            at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
            at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
    Caused by: org.apache.seatunnel.core.starter.exception.CommandExecuteException: Flink job executed failed
            at org.apache.seatunnel.core.starter.flink.command.FlinkApiTaskExecuteCommand.execute(FlinkApiTaskExecuteCommand.java:57)
            at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:40)
            at org.apache.seatunnel.core.starter.flink.SeatunnelFlink.main(SeatunnelFlink.java:34)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
            ... 9 more
    Caused by: org.apache.seatunnel.core.starter.exception.TaskExecuteException: Execute Flink job error
            at org.apache.seatunnel.core.starter.flink.execution.FlinkExecution.execute(FlinkExecution.java:75)
            at org.apache.seatunnel.core.starter.flink.command.FlinkApiTaskExecuteCommand.execute(FlinkApiTaskExecuteCommand.java:55)
            ... 16 more
    Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: a712d589a49a6bf3743b96f29e38342f)
            at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
            at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1887)
            at org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:123)
            at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:80)
            at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1833)
            at org.apache.seatunnel.core.starter.flink.execution.FlinkExecution.execute(FlinkExecution.java:73)
            ... 17 more
    Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: a712d589a49a6bf3743b96f29e38342f)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:125)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter$$Lambda$363/1969969319.apply(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
            at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
            at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$340/474497082.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$24(RestClusterClient.java:670)
            at org.apache.flink.client.program.rest.RestClusterClient$$Lambda$361/186075763.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
            at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$340/474497082.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
            at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
            at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
            at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:123)
            ... 28 more
    Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
            at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
            at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:216)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:206)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:197)
            at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:682)
            at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
            at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:435)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
            at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$104/1338135026.apply(Unknown Source)
            at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
            at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
            at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
            at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
            at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
            at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
            at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
            at akka.actor.ActorCell.invoke(ActorCell.scala:561)
            at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
            at akka.dispatch.Mailbox.run(Mailbox.scala:225)
            at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
            at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.lang.RuntimeException: Write data error, please check
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:62)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:32)
            at org.apache.seatunnel.translation.flink.sink.FlinkSinkWriter.write(FlinkSinkWriter.java:51)
            at org.apache.flink.streaming.runtime.operators.sink.AbstractSinkWriterOperator.processElement(AbstractSinkWriterOperator.java:80)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
            at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
            at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
            at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317)
            at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411)
            at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:45)
            at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:30)
            at org.apache.seatunnel.connectors.seatunnel.jdbc.source.JdbcSourceReader.pollNext(JdbcSourceReader.java:66)
            at org.apache.seatunnel.translation.source.ParallelSource.run(ParallelSource.java:125)
            at org.apache.seatunnel.translation.flink.source.BaseSeaTunnelSourceFunction.run(BaseSeaTunnelSourceFunction.java:83)
            at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
            at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
            at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
    Caused by: java.lang.NullPointerException
            at org.apache.seatunnel.shade.connector.file.org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:114)
            at org.apache.seatunnel.shade.connector.file.org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:104)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy.lambda$write$0(ParquetWriteStrategy.java:57)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy$$Lambda$429/483743484.accept(Unknown Source)
            at java.util.ArrayList.forEach(ArrayList.java:1249)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy.write(ParquetWriteStrategy.java:57)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:60)
            ... 18 more
    

    Flink or Spark Version

    flink-1.13.6

    Java or Scala Version

    No response

    Screenshots

    No response

    Are you willing to submit PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    bug 
    opened by zhangyide9494 0
  • Releases(2.3.0)
    • 2.3.0(Dec 30, 2022)

      2.3.0-release

      Bug fix

      Core

      • [Core] [Starter] Fix the bug of ST log print failed in some jdk versions #3160
      • [Core] [Shell] Fix bug that shell script about downloading plugins does not work #3462

      Connector-V2

      • [Connector-V2] [Jdbc] Fix the bug that jdbc source can not be stopped in batch mode #3220
      • [Connector-V2] [Jdbc] Fix the bug that jdbc connector reset in jdbc connector #3670
      • [Connector-V2] [Jdbc] Fix the bug that jdbc connector exactly-once it will throw NullPointerException #3730
      • [Connector-V2] [Hive] Fix the following bugs of hive connector: 1. write parquet NullPointerException 2. when restore write from states getting error file path #3258
      • [Connector-V2] [File] Fix the bug that when getting file system throw NullPointerException #3506
      • [Connector-V2] [File] Fix the bug that when user does not config the fileNameExpression it will throw NullPointerException #3706
      • [Connector-V2] [Hudi] Fix the bug that the split owner of Hudi connector may be negative #3184

      ST-Engine

      • [ST-Engine] Fix bug data file name will duplicate when use SeaTunnel Engine #3717
      • [ST-Engine] Fix job restart of all nodes down #3722
      • [ST-Engine] Fix the bug that checkpoint stuck in ST-Engine #3213
      • [ST-Engine] Fix the bug that checkpoint failed in ST-Engine #3769

      E2E

      • [E2E] [Spark] Corrected spark version in e2e container #3225

      Improve

      Core

      • [Core] [Starter] [Flink] Upgrade the method of loading extra jars in flink starter #2982
      • [Core] [Pom] [Package] Optimize package process #3751

      Connector-V1

      • [Connector-V1] Remove connector v1 related codes from dev branch #3450

      Connector-V2

      • [Connector-V2] Add split templates for all connectors #3335
      • [Connector-V2] [Redis] Support redis cluster mode & user authentication #3188
      • [Connector-V2] [Clickhouse] Support nest type and array type in clickhouse connector #3047
      • [Connector-V2] [Clickhouse] Support geo type in clickhouse connector #3141
      • [Connector-V2] [Clickhouse] Improve double convert that in clickhouse connector #3441
      • [Connector-V2] [Clickhouse] Improve float long convert that in clickhouse connector #3471
      • [Connector-V2] [Kafka] Support setting read start offset or message time in kafka connector #3157
      • [Connector-V2] [Kafka] Support specify multiple partition keys in kafka connector #3230
      • [Connector-V2] [Kafka] Support dynamic discover topic & partition in kafka connector #3125
      • [Connector-V2] [Kafka] Support text format for kafka connector #3711
      • [Connector-V2] [IotDB] Add the parameter check logic for iotDB sink connector #3412
      • [Connector-V2] [Jdbc] Support setting fetch size in jdbc connector #3478
      • [Connector-V2] [Jdbc] Support upsert config in jdbc connector #3708
      • [Connector-V2] [Jdbc] Optimize the commit process of jdbc connector #3451
      • [Connector-V2] [Jdbc] Release jdbc resource when after using #3358
      • [Connector-V2] [Oracle] Improve data type mapping of Oracle connector #3486
      • [Connector-V2] [Http] Support extract complex json string in http connector #3510
      • [Connector-V2] [File] [S3] Support s3a protocol in S3 file connector #3632
      • [Connector-V2] [File] [HDFS] Support setting hdfs-site.xml #3778
      • [Connector-V2] [File] Support file split in file connectors #3625
      • [Connector-V2] [CDC] Support write cdc changelog event in elasticsearch sink connector #3673
      • [Connector-V2] [CDC] Support write cdc changelog event in clickhouse sink connector #3653
      • [Connector-V2] [CDC] Support write cdc changelog event in jdbc connector #3444

      ST-Engine

      • [ST-Engine] Improve statistic information print format that in ST-Engine #3492
      • [ST-Engine] Improve ST-Engine performance #3216
      • [ST-Engine] Support user-defined jvm parameters in ST-Engine #3307

      CI

      • [CI] Improve CI process #3179 #3194

      E2E

      • [E2E] [Flink] Support execute extra commands on task-manager container #3224
      • [E2E] [Jdbc] Increased Jdbc e2e stability #3234

      Feature

      Core

      • [Core] [Log] Integrate slf4j and log4j2 for unified management logs #3025
      • [Core] [Connector-V2] [Exception] Unified exception API & Unified connector error tip message #3045
      • [Core] [Shade] [Hadoop] Add hadoop shade package for SeaTunnel #3755

      Connector-V2

      • [Connector-V2] [Elasticsearch] Add elasticsearch source connector #2821
      • [Connector-V2] [AmazondynamoDB] Add AmazondynamoDB source & sink connector #3166
      • [Connector-V2] [StarRocks] Add StarRocks sink connector #3164
      • [Connector-V2] [DB2] Add DB2 source & sink connector #2410
      • [Connector-V2] [Transform] Add transform-v2 api #3145
      • [Connector-V2] [InfluxDB] Add influxDB sink connector #3174
      • [Connector-V2] [Cassandra] Add Cassandra Source & Sink connector #3229
      • [Connector-V2] [MyHours] Add MyHours source connector #3228
      • [Connector-V2] [Lemlist] Add Lemlist source connector #3346
      • [Connector-V2] [CDC] [MySql] Add mysql cdc source connector #3455
      • [Connector-V2] [CDC] [SqlServer] Add sqlserver cdc source connector #3686
      • [Connector-V2] [Klaviyo] Add Klaviyo source connector #3443
      • [Connector-V2] [OneSingal] Add OneSingal source connector #3454
      • [Connector-V2] [Slack] Add slack sink connector #3226
      • [Connector-V2] [Jira] Add Jira source connector #3473
      • [Connector-V2] [Sqlite] Add Sqlite source & sink connector #3089
      • [Connector-V2] [OpenMldb] Add openmldb source connector #3313
      • [Connector-V2] [Teradata] Add teradata source & sink connector #3362
      • [Connector-V2] [Doris] Add doris source & sink connector #3586
      • [Connector-V2] [MaxCompute] Add MaxCompute source & sink connector #3640
      • [Connector-V2] [Doris] [Streamload] Add doris streamload sink connector #3631
      • [Connector-V2] [Redshift] Add redshift source & sink connector #3615
      • [Connector-V2] [Notion] Add notion source connector #3470
      • [Connector-V2] [File] [Oss-Jindo] Add oss jindo source & sink connector #3456

      ST-Engine

      • [ST-Engine] Support print job metrics when job finished #3691
      • [ST-Engine] Add metrics statistic in ST-Engine #3621
      • [ST-Engine] Support IMap file storage in ST-Engine #3418
      • [ST-Engine] Support S3 file system for IMap file storage #3675
      • [ST-Engine] Support save job restart status information in ST-Engine #3637

      E2E

      • [E2E] [Http] Add http type connector e2e test cases #3340
      • [E2E] [File] [Local] Add local file connector e2e test cases #3221

      Docs

      • [Docs] [Connector-V2] [Factory] Add TableSourceFactory & TableSinkFactory docs #3343
      • [Docs] [Connector-V2] [Schema] Add connector-v2 schema docs #3296
      • [Docs] [Connector-V2] [Quick-Manual] Add error quick reference manual #3437
      • [Docs] [README] Improve README and refactored other docs #3619
    • 2.3.0-beta(Nov 9, 2022)

      2.3.0 Beta

      [Connector V2]

      [New Connector V2 Added]

      • [Source] [Kafka] Add Kafka Source Connector (2953)
      • [Source] [Pulsar] Add Pulsar Source Connector (1980)
      • [Source] [S3File] Add S3 File Source Connector (3119)
      • [Source] [JDBC] [Phoenix] Add Phoenix JDBC Source Connector (2499)
      • [Source] [JDBC] [SQL Server] Add SQL Server JDBC Source Connector (2646)
      • [Source] [JDBC] [Oracle] Add Oracle JDBC Source Connector (2550)
      • [Source] [JDBC] [GBase8a] Add GBase8a JDBC Source Connector (3026)
      • [Source] [JDBC] [StarRocks] Add StarRocks JDBC Source Connector (3060)
      • [Sink] [Kafka] Add Kafka Sink Connector (2953)
      • [Sink] [S3File] Add S3 File Sink Connector (3119)

      [Improve & Bug Fix]

      • [Source] [Fake]

        • [Improve] Supports direct definition of data values(row) (2839)
        • [Improve] Improve fake source connector: (2944)
          • Support user-defined map size
          • Support user-defined array size
          • Support user-defined string length
          • Support user-defined bytes length
        • [Improve] Support multiple splits for fake source connector (2974)
        • [Improve] Supports setting the number of splits per parallelism and the reading interval between two splits (3098)
      • [Source] [Clickhouse]

        • [Improve] Clickhouse Source random use host when config multi-host (3108)
      • [Source] [FtpFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [HDFSFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [LocalFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [OSSFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [IoTDB]

        • [Improve] Improve IoTDB Source Connector (2917)
          • Support extract timestamp, device, measurement from SeaTunnelRow
          • Support TINYINT, SMALLINT
          • Support flush cache to database before prepareCommit
      • [Source] [JDBC]

        • [Feature] Support Phoenix JDBC Source (2499)
        • [Feature] Support SQL Server JDBC Source (2646)
        • [Feature] Support Oracle JDBC Source (2550)
        • [Feature] Support StarRocks JDBC Source (3060)
        • [Feature] Support GBase8a JDBC Source (3026)
      • [Sink] [Assert]

        • [Improve] 1.Support check the number of rows (2844) (3031):
          • check rows not empty
          • check minimum number of rows
          • check maximum number of rows
        • [Improve] 2.Support direct define of data values(row) (2844) (3031)
        • [Improve] 3.Support setting parallelism as 1 (2844) (3031)
      • [Sink] [Clickhouse]

        • [Improve] Clickhouse Support Int128,Int256 Type (3067)
      • [Sink] [Console]

        • [Improve] Console sink support print subtask index (3000)
      • [Sink] [Enterprise-WeChat]

        • [BugFix] Fix Enterprise-WeChat Sink data serialization (2856)
      • [Sink] [FtpFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [HDFSFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [LocalFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [OSSFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [IoTDB]

        • [Improve] Improve IoTDB Sink Connector (2917)
          • Support the 'align by' SQL syntax
          • Support case-insensitive SQL splitting
          • Support restoring the split offset for at-least-once semantics
          • Support reading the timestamp from RowRecord
        • [BugFix] Fix IoTDB connector sink NPE (3080)
      • [Sink] [JDBC]

        • [BugFix] Fix JDBC split exception (2904)
        • [Feature] Support Phoenix JDBC Sink (2499)
        • [Feature] Support SQL Server JDBC Sink (2646)
        • [Feature] Support Oracle JDBC Sink (2550)
        • [Feature] Support StarRocks JDBC Sink (3060)
      • [Sink] [Kudu]

        • [Improve] Kudu Sink Connector supports upserting rows (2881)
      • [Sink] [Hive]

        • [Improve] Hive Sink supports automatic partition repair (3133)

      [Connector V1]

      [New Connector V1 Added]

      [Improve & Bug Fix]

      • [Sink] [Spark-Hbase]
        • [BugFix] Handling null values (3099)

      [Starter & Core & API]

      [Feature & Improve]

      • [Improve] [Sink] Support defining parallelism for sink connectors (2941)
      • [Improve] [all] Change Log to @Slf4j (3001); see the sketch after this list
      • [Improve] [format] [text] Support reading & writing the SeaTunnelRow type (2969)
      • [Improve] [api] [flink] Extract a unified method (2862)
      • [Feature] [deploy] Add Helm charts (2903)
      • [Feature] [seatunnel-text-format] (2884)
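
      For reference, the @Slf4j change in the list above presumably refers to Lombok's @Slf4j annotation, which generates the logger field instead of it being declared by hand. A minimal sketch, assuming Lombok is on the compile classpath; the class name and message are illustrative and not taken from the SeaTunnel code base.

        import lombok.extern.slf4j.Slf4j;

        // Lombok generates:
        //   private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(ExampleTask.class);
        @Slf4j
        public class ExampleTask {
            public void run(String jobId) {
                // The generated "log" field is used exactly like a hand-written SLF4J logger.
                log.info("Starting job {}", jobId);
            }
        }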

      [Bug Fix]

      • [BugFix] Fix assert connector name error in config/plugin_config file (3127)
      • [BugFix] [starter] Fix connector-v2 flink & spark dockerfile (3007)
      • [BugFix] [core] Fix the Spark engine parallelism parameter not working (2965)
      • [BugFix] [build] Fix the checkstyle suppression file not taking effect on Windows 10 (2986)
      • [BugFix] [format] [json] Fix jackson package conflict with spark (2934)
      • [BugFix] [seatunnel-translation-base] Fix Source restore state NPE (2878)

      [Docs]

      • Add coding guide (2995)

      [SeaTunnel Engine]

      [Feature & Improve]

      [Cluster Manager]

      • Support running SeaTunnel Engine in standalone mode.
      • Support running SeaTunnel Engine as a cluster.
      • Implement the master-worker architecture without relying on third-party services (ZooKeeper, etc.).
      • Autonomous cluster (non-centralized).
      • Automatic discovery of cluster members.

      [Core]

      • Support submitting jobs to SeaTunnel Engine in local mode.
      • Support submitting jobs to SeaTunnel Engine in cluster mode.
      • Support batch jobs.
      • Support streaming jobs.
      • Support unified batch-stream processing; the batch-stream unification of all SeaTunnel V2 connectors is guaranteed in SeaTunnel Engine.
      • Support the Chandy-Lamport distributed snapshot algorithm and two-phase commit; exactly-once semantics are built on these implementations (see the sketch after this list).
      • Support pipeline-granularity job scheduling, ensuring that jobs can start under limited resources.
      • Support pipeline-granularity job restore.
      • Share threads between tasks to achieve real-time synchronization of large numbers of small datasets.
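
      To make the exactly-once item above concrete, the sketch below illustrates the general two-phase-commit pattern referenced in this list. The interface and method names are hypothetical illustrations of the pattern, not SeaTunnel Engine's actual API: output is staged when a snapshot barrier arrives (phase one) and only made visible once the whole snapshot has succeeded (phase two).

        import java.util.List;

        // Hypothetical two-phase-commit writer; names are illustrative, not SeaTunnel's API.
        interface TwoPhaseWriter<T, CommitInfoT> {
            void write(T record) throws Exception;                   // buffer or stage the record
            CommitInfoT prepareCommit() throws Exception;             // phase 1: flush staged data, return a commit handle
            void commit(List<CommitInfoT> handles) throws Exception;  // phase 2: make the staged data visible
            void abort(List<CommitInfoT> handles) throws Exception;   // roll back staged data on failure
        }

        // Conceptual coordinator flow: a snapshot barrier (Chandy-Lamport marker) travels through the
        // pipeline; each writer returns its prepareCommit() handle, which is stored with the snapshot;
        // only after every task acknowledges the snapshot are the handles committed, and on recovery
        // un-committed handles are either committed or aborted, which yields exactly-once semantics.
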
      Source code(tar.gz)
      Source code(zip)
    • 2.2.0-beta(Oct 3, 2022)

      [Feature & Improve]

      • Connector V2 API: decouple connectors from compute engines (a conceptual sketch follows this list)
        • [Translation] Support Flink 1.13.x
        • [Translation] Support Spark 2.4
        • [Connector-V2] [Fake] Support FakeSource (#1864)
        • [Connector-V2] [Console] Support ConsoleSink (#1864)
        • [Connector-V2] [ElasticSearch] Support ElasticSearchSink (#2330)
        • [Connector-V2] [ClickHouse] Support ClickHouse Source & Sink (#2051)
        • [Connector-V2] [JDBC] Support JDBC Source & Sink (#2048)
        • [Connector-V2] [JDBC] [Greenplum] Support Greenplum Source & Sink(#2429)
        • [Connector-V2] [JDBC] [DM] Support DaMengDB Source & Sink(#2377)
        • [Connector-V2] [File] Support Source & Sink for Local, HDFS & OSS File
        • [Connector-V2] [File] [FTP] Support FTP File Source & Sink (#2774)
        • [Connector-V2] [Hudi] Support Hudi Source (#2147)
        • [Connector-V2] [Iceberg] Support Iceberg Source (#2615)
        • [Connector-V2] [Kafka] Support Kafka Source (#1940)
        • [Connector-V2] [Kafka] Support Kafka Sink (#1952)
        • [Connector-V2] [Kudu] Support Kudu Source & Sink (#2254)
        • [Connector-V2] [MongoDB] Support MongoDB Source (#2596)
        • [Connector-V2] [MongoDB] Support MongoDB Sink (#2649)
        • [Connector-V2] [Neo4j] Support Neo4j Sink (#2434)
        • [Connector-V2] [Phoenix] Support Phoenix Source & Sink (#2499)
        • [Connector-V2] [Redis] Support Redis Source (#2569)
        • [Connector-V2] [Redis] Support Redis Sink (#2647)
        • [Connector-V2] [Socket] Support Socket Source (#1999)
        • [Connector-V2] [Socket] Support Socket Sink (#2549)
        • [Connector-V2] [HTTP] Support HTTP Source (#2012)
        • [Connector-V2] [HTTP] Support HTTP Sink (#2348)
        • [Connector-V2] [HTTP] [Wechat] Support WeChat Sink (#2412)
        • [Connector-V2] [Pulsar] Support Pulsar Source (#1984)
        • [Connector-V2] [Email] Support Email Sink (#2304)
        • [Connector-V2] [Sentry] Support Sentry Sink (#2244)
        • [Connector-V2] [DingTalk] Support DingTalk Sink (#2257)
        • [Connector-V2] [IoTDB] Support IoTDB Source (#2431)
        • [Connector-V2] [IoTDB] Support IoTDB Sink (#2407)
        • [Connector-V2] [Hive] Support Hive Source & Sink(#2708)
        • [Connector-V2] [Datahub] Support Datahub Sink(#2558)
      • [Catalog] MySQL Catalog (#2042)
      • [Format] JSON Format (#2014)
      • [Spark] [ClickHouse] Support ClickHouse without authentication (#2393)
      • [Binary-Package] Add script to automatically download plugins (#2831)
      • [License] Update binary license (#2798)
      • [e2e] Improved e2e start sleep (#2677)
      • [e2e] Container only copy required connector jars (#2675)
      • [build] delete connectors*-dist modules (#2709)
      • [build] Dependency management split (#2606)
      • [build] The e2e module no longer depends on the connector*-dist module (#2702)
      • [build] Improved scope of maven-shade-plugin (#2665)
      • [build] make sure flatten-maven-plugin runs after maven-shade-plugin (#2603)
      • [Starter] Use the public CommandLine util class to parse the args (#2470)
      • [Spark] [Redis] Support self-built Redis proxies that do not support the Redis "info replication" command (#2389)
      • [Flink] [Transform] Support multiple splits and add a custom split function name (#2268)
      • [Test] Upgrade junit to 5.+ (#2305)
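
      As a rough illustration of the decoupling idea behind the Connector V2 API at the top of this list, the sketch below shows the general pattern of an engine-agnostic connector contract plus a per-engine translation wrapper. All names here are hypothetical and simplified; they are not the actual SeaTunnel interfaces.

        import java.util.Iterator;
        import java.util.function.Consumer;

        // Engine-agnostic reader contract, implemented once per connector.
        interface EngineAgnosticSource<T> {
            Iterator<T> open() throws Exception;
            void close() throws Exception;
        }

        // A per-engine translation layer adapts the contract to the engine's own API
        // (e.g. a Flink SourceFunction or a Spark data source), so the connector code
        // itself never references Flink or Spark classes.
        final class FlinkTranslationWrapper<T> {
            private final EngineAgnosticSource<T> delegate;

            FlinkTranslationWrapper(EngineAgnosticSource<T> delegate) {
                this.delegate = delegate;
            }

            // In a real translation layer this would be the engine callback (e.g. run());
            // here it simply drains the delegate to show the data flow.
            void runOnce(Consumer<T> emit) throws Exception {
                Iterator<T> records = delegate.open();
                while (records.hasNext()) {
                    emit.accept(records.next());
                }
                delegate.close();
            }
        }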

      [Bugfix]

      • [Starter] Ensure that output paths constructed from zip archive entries are validated to prevent writing files to unexpected locations (#2843)
      • [Starter] Stop SparkCommandArgs from splitting variable values on commas (#2523)
      • [Spark] Fix the problem of calling the getData() method twice (#2764)
      • [e2e] Fix the path split exception on Windows 10; do not check whether the file exists (#2715)

      [Docs]

      • [Kafka] Update Kafka.md (#2863)
      • [JDBC] Fix inconsistency in the documentation (#2776)
      • [Flink-SQL] [ElasticSearch] Updated prepare section (#2634)
      • [Contribution] add CheckStyle-IDEA Plugin introduction (#2535)
      • [Contribution] Update new-license.md (#2494)
      Source code(tar.gz)
      Source code(zip)
    • 2.1.3(Aug 10, 2022)

      [Feature & Improvement]

      • [Connector][Flink][Fake] Support BigInteger type (#2118)
      • [Connector][Spark][TiDB] Refactor config parameters (#1983)
      • [Connector][Flink] Add AssertSink connector (#2022)
      • [Connector][Spark][ClickHouse] Support Rsync to transfer ClickHouse data files (#2074)
      • [Connector & e2e][Flink] Add IT for Assert Sink in the e2e module (#2036)
      • [Transform][Spark] Data quality check for the null data rate (#1978)
      • [Transform][Spark] Add a module to set default values for null fields (#1958)
      • [Chore] Make the code more understandable and remove code warnings (#2005)
      • [Spark] Use a higher version of the libthrift dependency (#1994)
      • [Core][Starter] Change the jar connector load logic (#2193)
      • [Core] Add a plugin discovery module (#1881)

      [BUG]

      • [Connector][Hudi] Fix the source loading data twice
      • [Connector][Doris] Fix the Unrecognized field "TwoPhaseCommit" error after Doris 0.15 (#2054)
      • [Connector][Jdbc] Fix the data output exception when accessing Hive using Spark JDBC (#2085)
      • [Connector][Jdbc] Fix JDBC data loss when partition_column (partition mode) is set (#2033)
      • [Connector][Kafka] Fix KafkaTableStream schema JSON parsing (#2168)
      • [seatunnel-core] Fix failure to get the APP_DIR path (#2165)
      • [seatunnel-api-flink] Fix connector dependencies being added repeatedly (#2207)
      • [seatunnel-core-flink] Update the FlinkRunMode enum to show the proper help message for run modes (#2008)
      • [seatunnel-core-flink] Fix the registerPlugin library cache error when the same plugin is used as source and sink (#2015)
      • [Command] Fix commandArgs -t (--check) conflicting with the Flink deployment target (#2174)
      • [Core][Jackson] Fix the Jackson type conversion error (#2031)
      • [Core][Starter] In cluster mode, the starter app root dir should be the same as in client mode (#2141)

      [Docs]

      • Update the source Socket connector docs (#1995)
      • Add uuid, udf, and replace transforms to the docs (#2016)
      • Update the Flink engine version requirements (#2220)
      • Add the Flink SQL module to the website (#2021)
      • [kubernetes] Update the SeaTunnel on Kubernetes doc (#2035)

      [Dependency Upgrade]

      • Upgrade commons-collections4 to 4.4
      • Upgrade commons-codec to 1.13

      Source code(tar.gz)
      Source code(zip)
    • 2.1.2(Jun 18, 2022)

      [Feature]

      • Add Spark webhook source
      • Support Flink application mode
      • Split connector jar from core jar
      • Add Replace transforms for Spark
      • Add Uuid transform for Spark
      • Support Flink dynamic configurations
      • Flink JDBC source supports the Oracle database
      • Add Flink HTTP connector
      • Add Flink transform for registering user-defined functions
      • Add Flink SQL Kafka and ElasticSearch connectors

      [Bugfix]

      • Fixed the ClickHouse sink data type conversion error
      • Fixed the Spark start shell failing on first execution
      • Fixed the config file not being found when using Spark on YARN in cluster mode
      • Fixed Spark extraJavaOptions not being allowed to be empty
      • Fixed the "plugins.tar.gz" decompression failure in Spark standalone cluster mode
      • Fixed the ClickHouse sink not working correctly when multiple hosts are used
      • Fixed the Flink SQL conf parse exception
      • Fixed incomplete Flink JDBC MySQL data type mapping
      • Fixed variables not being settable in Flink mode
      • Fixed the SeaTunnel Flink engine not checking the source config

      [Improvement]

      • Update Jackson version to 2.12.6
      • Add a guide on how to set up SeaTunnel with Kubernetes
      • Optimize some generic type code
      • Add Flink SQL e2e module
      • Flink JDBC connector adds pre-SQL and post-SQL support
      • Use @AutoService to generate SPI files (see the sketch after this list)
      • Support Flink FakeSourceStream to mock data
      • Support reading Hive via the Flink JDBC source
      • ClickhouseFile supports ReplicatedMergeTree
      • Support using the Hive sink to save tables in ORC file format
      • Support a custom expire time in the Spark Redis sink
      • Add Spark JDBC isolationLevel config
      • Replace Fastjson with Jackson
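
      For reference, the @AutoService item in the list above refers to Google's AutoService annotation processor, which generates the META-INF/services SPI file at compile time instead of it being maintained by hand. Below is a minimal, generic sketch assuming the auto-service dependency is on the annotation-processor path; the interface and class names are illustrative and not taken from the SeaTunnel code base.

        import com.google.auto.service.AutoService;

        // Illustrative SPI interface; in a real project this would be the plugin contract.
        interface ExamplePlugin {
            String pluginName();
        }

        // At compile time AutoService writes META-INF/services/<fully.qualified.ExamplePlugin>
        // containing ExampleJdbcPlugin, so java.util.ServiceLoader can discover it at runtime.
        @AutoService(ExamplePlugin.class)
        public class ExampleJdbcPlugin implements ExamplePlugin {
            @Override
            public String pluginName() {
                return "example-jdbc";
            }
        }
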
      Source code(tar.gz)
      Source code(zip)
    • 2.1.1(Apr 27, 2022)

      [Feature]

      • Support JSON-format config files
      • JDBC connector supports partitioning
      • Add ClickhouseFile sink on the Spark engine
      • Support compiling with JDK 11
      • Add Elasticsearch 7.x plugin on the Flink engine
      • Add Feishu plugin on the Spark engine
      • Add Spark HTTP source plugin
      • Add ClickHouse sink plugin on the Flink engine

      [Bugfix]

      • Fix Flink ConsoleSink not printing results
      • Fix JDBC dialect type compatibility between JdbcSource and JdbcSink
      • Fix transforms not executing when the data source is empty
      • Fix datetime/date strings failing to convert to timestamp/date
      • Fix tableExists not containing TemporaryTable
      • Fix FileSink not working in Flink stream mode
      • Fix config param issues of the Spark Redis sink
      • Fix the SQL table name parsing error
      • Fix not being able to send data to Kafka
      • Fix a file resource leak
      • Fix the ClassCastException encountered when outputting data to Doris

      [Improvement]

      • Change the JDBC-related dependency scope to default
      • Use different commands to execute tasks
      • Automatically identify the Spark Hive plugin and add enableHiveSupport
      • Print config in its original order
      • Remove the useless job name from JobInfo
      • Add console limit and batch Flink fake source
      • Add Flink e2e module
      • Add Spark e2e module
      • Optimize plugin loading and rename the plugin package name
      • Rewrite the Spark and Flink start scripts in code
      • Quickly locate the wrong SQL statement in the Flink SQL transform
      • Upgrade log4j version to 2.17.1
      • Unified version management of third-party dependencies
      • Use revision to manage the project version
      • Add sonar check
      • Add ssl/tls parameters in the Spark email connector
      • Remove the return result of the sink plugin
      • Add flink-runtime-web to the Flink example

      Please go to the official channel to download: https://seatunnel.apache.org/download
      Source code(tar.gz)
      Source code(zip)
    • 2.1.0(Mar 20, 2022)

      • Use JCommander for command-line parameter parsing, letting developers focus on the logic itself (a generic sketch follows this list).
      • Flink is upgraded from 1.9 to 1.13.5, keeping compatibility with older versions and preparing for subsequent CDC support.
      • Support for Doris, Hudi, Phoenix, Druid, and other Connector plugins; you can find the complete plugin support list at plugins-supported-by-seatunnel.
      • Extremely fast local development startup: the example module can be used without modifying any code, which is convenient for local debugging.
      • Support for installing and trying out Apache SeaTunnel(Incubating) via Docker containers.
      • SQL component supports SET statements and configuration variables.
      • Config module refactored to make it easier for contributors to understand while ensuring the project's license compliance.
      • Project structure realigned to fit the new roadmap.
      • CI/CD support and automated code quality control (more plans will follow to support CI/CD development).

      Please go to the official channel to download: https://seatunnel.apache.org/download
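
      The JCommander item at the top of this list refers to the standard Java command-line binding library; the following minimal, generic sketch shows how it maps flags to annotated fields. The option names and class are illustrative only and are not SeaTunnel's actual CLI.

        import com.beust.jcommander.JCommander;
        import com.beust.jcommander.Parameter;

        public class CommandLineExample {
            // Each annotated field is bound to a flag; JCommander handles parsing and help text.
            @Parameter(names = {"-c", "--config"}, description = "Config file path", required = true)
            private String configFile;

            @Parameter(names = {"-e", "--deploy-mode"}, description = "Deploy mode")
            private String deployMode = "client";

            public static void main(String[] args) {
                CommandLineExample opts = new CommandLineExample();
                JCommander.newBuilder().addObject(opts).build().parse(args);
                System.out.println("config=" + opts.configFile + ", mode=" + opts.deployMode);
            }
        }
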
      Source code(tar.gz)
      Source code(zip)
    • v1.5.7(Dec 28, 2021)

    • v1.5.6(Dec 23, 2021)

      What's Changed

      • [project rename] Changed start-waterdrop.sh to start-seatunnel.sh and changed the ASCII logo from waterdrop to seatunnel by @garyelephant
      • [Feature] Added the BaseAction abstraction by @garyelephant in https://github.com/InterestingLab/seatunnel/pull/810
      • [Feature] Allow users to customize log4j.properties by @garyelephant in https://github.com/InterestingLab/seatunnel/issues/267#issuecomment-640986057
      • [Bugfix] Fixed a bug with the Kerberos config in the Spark config by @garyelephant in https://github.com/InterestingLab/seatunnel/issues/590
      • [Bugfix] Fix bug of #719 by @RickyHuo in https://github.com/InterestingLab/seatunnel/pull/743
      Source code(tar.gz)
      Source code(zip)
      seatunnel-1.5.6.zip(68.01 MB)
    • v1.5.3(Aug 11, 2021)

    • v1.5.2(Aug 9, 2021)

    • v2.0.4(Oct 13, 2020)

    • v1.5.1(Jul 20, 2020)

      • [Feature] Add redisStream input plugin.
      • [Feature] MongoDB input plugin adds a schema parameter, allowing you to specify the schema yourself.
      • [Enhancement] Support the Nullable(Decimal(P, S)) type in the ClickHouse output plugin.
      • [Enhancement] Output plugins use the format parameter rather than serializer.
      • [Bugfix] Fix #492 #517 #534


      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.5.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/19GUwZPC2YBG9Pt7iuF9TNw, password: upeb).


      Remark: for spark >= 2.3 download waterdrop-1.5.1.zip; for spark < 2.3 download waterdrop-1.5.1-with-spark.zip


      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.5.1-with-spark.zip(169.07 MB)
      waterdrop-1.5.1.zip(49.27 MB)
    • v1.5.0(Jun 9, 2020)

      1. [Enhancement] Support Chinese column names with ClickHouse output.
      2. [Enhancement] Remove useless code related to antlr4.
      3. [Enhancement] Support specifying a queue with -q or --queue.
      4. [Enhancement] Optimize batch processing and drop unnecessary coding.
      5. [Feature] Replace the third-party jar (config-1.3.3-SNAPSHOT.jar) with the waterdrop-config module.
      6. [Feature] Add urldecode and urlencode filter plugins.
      7. [Bugfix] Fix #392 #411 (config parse bug).
      8. [Bugfix] Support specifying --driver-memory in the spark section of the waterdrop config file (#507).

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.5.0.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/1vCpGUcpSdyetLMMB39J2fg, password: ullf).


      Remark: for spark >= 2.3 download waterdrop-1.5.0.zip; for spark < 2.3 download waterdrop-1.5.0-with-spark.zip


      Upgrade Guide

      • If you upgrade from a previous version, you have to update all plugin dependencies that you developed yourself.
      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.5.0-with-spark.zip(170.33 MB)
      waterdrop-1.5.0.zip(50.53 MB)
    • v1.4.3(Apr 22, 2020)

      1. [Feature] Support ClickHouse cluster mode using the cluster parameter, reading the ClickHouse system.clusters table
      2. [Fixbug] Fix a checkConfig bug when using result_table_name rather than table_name
      3. [Fixbug] Fix a MongoDB bug in Spark Structured Streaming output
      4. [Enhancement] Update the ElasticSearch dependency to 7.6.2
      5. [Enhancement] Update the clickhouse-jdbc dependency to 0.2.4

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.3.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/1Qik5I1IGsgx1u26plSOFDg, password: fqkr).


      Remark: for spark >= 2.3 download waterdrop-1.4.3.zip; for spark < 2.3 download waterdrop-1.4.3-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.3-with-spark.zip(170.34 MB)
      waterdrop-1.4.3.zip(50.81 MB)
    • v2.0.0-pre(Jan 19, 2020)

    • v1.4.2(Dec 6, 2019)

      • [Enhancement] MySQL input supports loading data in parallel with jdbc.partitionColumn.
      • [Fixbug] Fix a bug where result_table_name could not be used in the Hive and MongoDB input plugins.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.2.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1haSd0KFMAS-qQZqI4QHWjA).


      Remark: for spark >= 2.3 download waterdrop-1.4.2.zip; for spark < 2.3 download waterdrop-1.4.2-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.2-with-spark.zip(169.65 MB)
      waterdrop-1.4.2.zip(50.02 MB)
    • v1.4.1(Sep 11, 2019)

      • [Enhancement] Structured Streaming Kafka input supports more complex JSON data.
      • [Enhancement] Fix a bug in the Structured Streaming JDBC output. #364
      • [Enhancement] File input supports specifying a fully qualified format class name.
      • [Fixbug] Fix a bug with the Waterdrop exit status.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/13VrqUkyZLYT0I8R1ajBvdw).


      Remark: for spark >= 2.3 download waterdrop-1.4.1.zip; for spark < 2.3 download waterdrop-1.4.1-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.1-with-spark.zip(169.62 MB)
      waterdrop-1.4.1.zip(49.99 MB)
    • v1.4.0(Aug 12, 2019)

      • [Feature] Add the common options source_table_name and result_table_name
      • [Feature] Remove the table_name option
      • [Enhancement] The ClickHouse output plugin is compatible with null messages
      • [Enhancement] The fields option of the ClickHouse output is no longer required
      • [Feature] ClickHouse output supports decimal
      • [Feature] Support developing input plugins in Java
      • [Fixbug] Fix a config parser bug, #353

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.0.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1SBh207x07eCmUHN5GHywfw).


      Remark: for spark >= 2.3 download waterdrop-1.4.0.zip; for spark < 2.3 download waterdrop-1.4.0-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.0-with-spark.zip(169.62 MB)
      waterdrop-1.4.0.zip(49.99 MB)
    • v1.3.8(Jul 9, 2019)

      • [Fix Bug] Fixed a bug with config file variable substitution in the spark section.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.8.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1ZNehHx_Tpeiq530S_JetIA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.8.zip(49.93 MB)
    • v1.3.7(Jun 26, 2019)

      • [Feature] Support ClickHouse LowCardinality[T]
      • [Enhancement] Do not output if dataset is empty

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.7.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1o0eMK4kqZaIzUHJhYdIT3g).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.7.zip(49.93 MB)
    • v1.3.6(Jun 3, 2019)

      • [Enhancement] The Kafka output plugin supports the serializer parameter. #322

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.6.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1Gcs_GjDA7srMYiBQQCCXzg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.6.zip(49.93 MB)
    • v1.3.5(May 20, 2019)

      • [Enhancement] Removed all duplicate dependencies for spark and hadoop in assembly jar
      • [Enhancement] Do not take(n) in batch processing

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.5.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/19r42RWQxYswsPOt0bIq3tQ).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.5.zip(49.92 MB)
    • v1.3.3(May 9, 2019)

      • [Enhancement] Optimize the performance of ES input.
      • [Bugfix] Fix #305

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.3.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/144BOQqR8Uf08ecIZ4kXQwA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.3.zip(167.79 MB)
    • v1.3.2(Apr 28, 2019)

      • [Feature] Support elasticsearch input plugin.
      • [Feature] ClickHouse output will retry with specified error code.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.2.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1gn-wgbTKXvTj1WUKTa9x2Q).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.2.zip(167.84 MB)
    • v1.3.1(Apr 15, 2019)

      • [Feature] Supported structured streaming jdbc output.
      • [Enhancement] Optimized filter of watermark for structured streaming.
      • [Enhancement] Optimized structured streaming kafka input.
      • [Enhancement] ClickHouse output supported the Nullable(T) type

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/114yVMKj0u_XURcDr3ESeMg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.1.zip(167.70 MB)
    • v1.3.0(Mar 29, 2019)

      • [Feature] Support Spark Structured Streaming.

      ./bin/start-waterdrop-structured-streaming.sh --master 'local[4]' --deploy-mode client --config ./config/structuredstreaming.conf.template


      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/15YUTHl7IP8cpieX9003RSQ).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.0.zip(165.31 MB)
    • v1.2.4(Mar 20, 2019)

      • [Enhancement] Replace antlr4 with typesafe config implementation as config parser.
      • [Feature] Added a variable substitution feature in the waterdrop config file.
      • [Enhancement] Supported ClickHouse JDBC settings in the waterdrop config file.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1PtUPIcfBmL8l5Ib3KLrZvA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.2.4.zip(162.70 MB)
    • v1.2.2(Mar 5, 2019)

      • [Enhancement] Allow specifying a JSON schema in the Json filter; special thanks to @huangdeheng
      • [BUG] Fixed class not found exception for Jdbc Input
      • [Enhancement] Set SparkConf in start-waterdrop.sh
      • [Enhancement] Added checkSQLSyntax in Sql Filter

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1cH6nB07BiRaJR6AuEo5dNg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.2.2.zip(162.63 MB)
    • v1.2.1(Feb 24, 2019)
