SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

Overview


SeaTunnel was formerly named Waterdrop, and was renamed SeaTunnel on October 12, 2021.


SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can synchronize tens of billions of records stably and efficiently every day, and it is already used in production by nearly 100 companies.

Why do we need SeaTunnel

SeaTunnel aims to solve the problems commonly encountered when synchronizing massive data:

  • Data loss and duplication
  • Task accumulation and delay
  • Low throughput
  • Long lead time before a job can be applied in the production environment
  • Lack of monitoring of application running status

SeaTunnel use scenarios

  • Mass data synchronization
  • Mass data integration
  • ETL with massive data
  • Mass data aggregation
  • Multi-source data processing

Features of SeaTunnel  

  • Easy to use, flexible configuration, low code development
  • Real-time streaming
  • Offline multi-source data analysis
  • High-performance, massive data processing capabilities
  • Modular and plug-in mechanism, easy to extend
  • Support data processing and aggregation by SQL
  • Support Spark structured streaming
  • Support Spark 2.x

Workflow of SeaTunnel


Input[Data Source Input] -> Filter[Data Processing] -> Output[Result Output]

The data processing pipeline consists of multiple filters to meet a variety of data processing needs. If you are accustomed to SQL, you can also construct a data processing pipeline directly in SQL, which is simple and efficient. The list of filters supported by SeaTunnel is still being expanded. Furthermore, you can develop your own data processing plug-ins, because the whole system is easy to extend.
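
As a rough illustration, the sketch below shows a minimal v1-style configuration that wires a Fake input through a Sql filter to a Stdout output. The exact option names (for example result_table_name) are assumptions based on the variable-substitution example in the FAQ further down this page; consult the quick-start documentation for the authoritative reference.

# register a Fake input as a table named "my_source" (option name assumed)
input {
  fake {
    result_table_name = "my_source"
  }
}

# transform the data with a SQL filter
filter {
  sql {
    sql = "select * from my_source"
  }
}

# print the result to stdout
output {
  stdout {}
}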

Plugins supported by SeaTunnel  

  • Input plugins: Fake, File, Hdfs, Kafka, S3, Socket, self-developed Input plugins

  • Filter plugins: Add, Checksum, Convert, Date, Drop, Grok, Json, Kv, Lowercase, Remove, Rename, Repartition, Replace, Sample, Split, Sql, Table, Truncate, Uppercase, Uuid, self-developed Filter plugins

  • Output plugins: Elasticsearch, File, Hdfs, Jdbc, Kafka, Mysql, S3, Stdout, self-developed Output plugins

Environment dependencies

  1. Java runtime environment, Java >= 8

  2. If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments is usable:

  • Spark on Yarn
  • Spark Standalone

If the data volume is small, or if the goal is merely functional verification, you can also start in local mode without a cluster environment, because SeaTunnel supports standalone operation; see the sketch after this paragraph. Note: SeaTunnel 2.0 supports running on Spark and Flink.
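
For example, a v1-style local run might look like the following sketch; it follows the start-waterdrop.sh usage shown in the FAQ later on this page, and the configuration path is only an illustration.

# run the job locally with 2 cores, without any cluster
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2]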

Downloads

Download address for the ready-to-run software package: https://github.com/apache/incubator-seatunnel/releases

Quick start

Quick start: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/quick-start

Detailed documentation on SeaTunnel: https://interestinglab.github.io/seatunnel-docs/#/

Application practice cases

  • Weibo, Value-added Business Department Data Platform

Weibo's value-added business uses an internally customized version of SeaTunnel, together with its sub-project Guardian, for SeaTunnel-on-Yarn task monitoring of hundreds of real-time streaming computing tasks.

  • Sina, Big Data Operation Analysis Platform

The Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operation and maintenance data for Sina News, CDN and other services, and writes the results into ClickHouse.

  • Sogou, Sogou Qiqian System

The Sogou Qiqian System uses SeaTunnel as an ETL tool to help establish its real-time data warehouse system.

  • Qutoutiao, Qutoutiao Data Center

The Qutoutiao Data Center uses SeaTunnel to support MySQL-to-Hive offline ETL tasks and real-time Hive-to-ClickHouse backfill, covering most offline and real-time task needs well.

  • Yixia Technology, Yizhibo Data Platform

  • Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform

SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a new retail brand of Yonghui Yunchuang Technology.

  • Shuidichou, Data Platform

Shuidichou adopts SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing 3-4 TB of data per day on average, and finally writing the data into ClickHouse.

  • Tencent Cloud

Various logs from business services are collected into Apache Kafka; part of the data in Kafka is consumed and extracted through SeaTunnel and then stored into ClickHouse.

For more use cases, please refer to: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/case_study/

Code of conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please follow the REPORTING GUIDELINES to report unacceptable behavior.

Developer

Thanks to all developers https://github.com/apache/incubator-seatunnel/graphs/contributors

Contact Us

Comments
  • [Umbrella][Connector] New SeaTunnel API Connectors

    [Umbrella][Connector] New SeaTunnel API Connectors

    Please Move To https://github.com/apache/incubator-seatunnel/issues/3018

    | No | Connector | priority | difficulty | Source/Sink | Contributer | Status | Issue/PR | |-----|-----------------------------------------------------------------------------------------------------------------------------------------|----------|------------|-------------|------------------|--------|----------| | 1 | Console | high | low | Sink | @ruanwenjun | Done | #1864 | | 2 | Fake | high | low | Source | @ruanwenjun | Done | #1864 | | 3 | Doris | high | high | Source | @2013650523 | | #2536 | | 4 | Doris | high | high | Sink | @hk-lrzy | | #2586 | | 5 | Druid | high | middle | Source | @guanboo | | #2937 | | 6 | Druid | high | middle | Sink | @guanboo | | #2937 | | 7 | ElasticSearch | high | high | Source | @iture123 | | #2821 | | 8 | ElasticSearch | high | high | Sink | @iture123 | Done | #2330 | | 9 | ClickHouse | high | high | Source | @Hisoka-X | Done | #2051 | | 10 | ClickHouse | high | high | Sink | @Hisoka-X | Done | #2051 | | 11 | JDBC | high | high | Source | @ic4y | Done | #2048 | | 12 | JDBC | high | high | Sink | @ic4y | Done | #2048 | | 13 | FeiShu | low | low | Source | @TyrantLucifer | | | | 14 | FeiShu | low | low | Sink | @TyrantLucifer | Done | | | 15 | File(Local) | high | middle | Source | @TyrantLucifer | Done | | | 16 | File(Local) | high | middle | Sink | @EricJoy2048 | Done | #2117 | | 17 | File(S3) | high | middle | Source | @TyrantLucifer | | | | 18 | File(S3) | high | middle | Sink | @TyrantLucifer | | | | 19 | File(HDFS) | high | high | Source | @TyrantLucifer | Done | | | 20 | File(HDFS) | high | high | Sink | @EricJoy2048 | Done | | | 21 | File(OSS) | high | middle | Source | @TyrantLucifer | Done | | | 22 | File(OSS) | high | middle | Sink | @TyrantLucifer | Done | | | 23 | Hudi | high | high | Source | @Emor-nj | Done | #2147 | | 24 | Hudi | high | high | Sink | @Emor-nj | | | | 25 | Iceberg | high | middle | Source | @hailin0 | Done | #2615 | | 26 | Iceberg | high | middle | Sink | @s7monk | | | | 27 | Kafka | high | middle | Source | @Hisoka-X | Done | | | 28 | Kafka | high | middle | Sink | @ruanwenjun | Done | | | 29 | Kudu | high | high | Source | @2013650523 | Done | #2254 | | 30 | Kudu | high | high | Sink | @2013650523 | Done | #2254 | | 31 | MongoDB | high | middle | Source | @wuchunfu | Done | | | 32 | MongoDB | high | middle | Sink | @wuchunfu | Done | | | 33 | Neo4j | middle | middle | Source | @getChan | | #2777 | | 34 | Neo4j | middle | middle | Sink | @getChan | Done | #2434 | | 35 | Phoenix | middle | middle | Source | @531651225 | Done | #2499 | | 36 | Phoenix | middle | middle | Sink | @531651225 | Done | #2499 | | 37 | Redis | high | middle | Source | @TyrantLucifer | Done | | | 38 | Redis | high | middle | Sink | @TyrantLucifer | Done | | | 39 | Socket | high | low | Source | @zhuangchong | Done | #1999 | | 40 | Socket | high | low | Sink | @531651225 | Done | #2549 | | 41 | [JDBC]Tidb | high | low | Source | @xbkaishui | Abandonment | | | 42 | [JDBC]Tidb | high | low | Sink | @xbkaishui | Abandonment | | | 43 | Webhook | high | middle | Sink | @TyrantLucifer | Done | #2348 | | 44 | InfluxDB | middle | middle | Source | @531651225 | | #2697 | | 45 | InfluxDB | middle | middle | Sink | @Jellal-HT | | | | 46 | Pulsar | high | middle | Source | @ashulin | Done | #1984 | | 47 | Pulsar | high | middle | Sink | @FlechazoW | | | | 48 | Email | middle | low | Sink | @2013650523 | Done | #2304 | | 49 | Assert | low | low | Sink | @lhyundeadsoul | Done | | | 50 | Redshift | high | high | Source | | | | | 51 | Facebook Marketing | middle | 
middle | Source | | | | | 52 | HubSpot | middle | middle | Source | | | | | 53 | Instagram | middle | middle | Source | | | | | 54 | Bing ADs | middle | middle | Source | | | | | 55 | Google Analytics | middle | middle | Source | @MRYOG | | | | 56 | Intercom | middle | middle | Source | | | | | 57 | Zendesk | middle | middle | Source | | | | | 58 | TikTok Marketing | middle | middle | Source | | | | | 59 | Salesforce | middle | middle | Source | | | | | 60 | LinkedIn Ads | middle | middle | Source | | | | | 61 | Stripe | middle | middle | Source | | | | | 62 | Sentry | middle | middle | Sink | @Saintyang | done | #2244 | | 63 | DingTalk | low | low | Source | @MRYOG | | #2757 | | 63 | DingTalk | low | low | Sink | @MRYOG | Done | #2257 | | 64 | IoTDB | high | high | Source | @CalvinKirs | Done | | | 65 | IoTDB | high | high | Sink | @hailin0 | Done | | | 66 | TD-engine | middle | middle | Source | @lhyundeadsoul | | #2707 | | 67 | TD-engine | middle | middle | Sink | @lhyundeadsoul | | | | 68 | HBase | high | high | Source | @zhuzhengjun01 | | | | 69 | HBase | high | high | Sink | @zhuzhengjun01 | | | | 70 | Hive | high | high | Source | @CalvinKirs | Done | | | 71 | Hive | high | high | Sink | @EricJoy2048 | Done | | | 72 | HTTP | high | middle | Source | @zhuangchong | Done | #2012 | | 72 | StarRocks | high | high | Sink | @tonyDong-code | | | | 73 | StarRocks | high | high | Source | @wangw9420 | | | | 74 | ADB PostgreSQL | high | low | Sink | @etcZYP | | | | 75 | Greenplum | high | high | Source | @hailin0 | Done | #2429 | | 76 | Greenplum | high | high | Sink | @hailin0 | Done | #2429 | | 77 | OceanBase | middle | high | Sink | @silenceland | | | | 78 | OceanBase | middle | high | Source | @silenceland | | | | 79 | DB2 | high | high | Sink | @laglangyue | | #2410 | | 80 | DB2 | high | high | Source | @laglangyue | | #2410 | | 81 | BigSource | middle | middle | Source | | | | | 82 | Github | low | low | Source | @MonsterChenzhuo | | | | 83 | Enterprise WeChat | low | low | Sink | @531651225 | Done | #2412 | | 84 | Slack | middle | low | Sink | @Charlie17Li | | | | 85 | Databricks Lakehouse | high | high | Sink | | | | | 86 | Snowflake | middle | high | Sink | | | | | 87 | Snowflake | middle | high | Source | | | | | 88 | [Jdbc]Sql-Server | middle | low | Sink | @liugddx | Done | #2646 | | 89 | [Jdbc]Sql-Server | middle | low | Source | @liugddx | Done | #2646 | | 90 | [Jdbc]Oracle | high | low | Sink | @liugddx | | #2550 | | 91 | [Jdbc]Oracle | high | low | Source | @liugddx | | #2550 | | 92 | [JDBC]Rds | high | low | Sink | @s7monk | | #2829 | | 93 | [JDBC]Rds | high | low | Source | @s7monk | | #2829 | | 94 | [JDBC]SqlLite | middle | low | Sink | @maruko-code | | | | 95 | [JDBC]SqlLite | middle | low | Source | @Caribbeanz | | | | 96 | [JDBC]DM(达梦) | middle | low | Source | @laglangyue | done | #2377 | | 97 | [JDBC]DM(达梦) | middle | low | Sink | @laglangyue | done | #2377 | | 98 | Cassandra | middle | middle | Sink | @bigdataf | | | | 99 | Cassandra | middle | middle | Source | @bigdataf | | | | 100 | [File]excel | middle | low | Sink | @Bingz2 | | #2585 | | 101 | [File]excel | middle | low | Source | @MonsterChenzhuo | | | | 102 | [File]JSON | low | low | Sink | @hailin0 | Done | | | 103 | [File]JSON | low | low | Source | @TyrantLucifer | Done | | | 104 | MaxCompute | middle | middle | Source | @longer-jl | | | | 105 | MaxCompute | middle | middle | Sink | @longer-jl | | | | 106 | TDSql | middle | middle | Source | @dzzxjl | | | | 107 | OpenMLDB | middle | high | Source | 
@TyrantLucifer | | | | 107 | OpenMLDB | middle | high | Sink | @Dlimeng | | | | 108 | Ftp | middle | middle | Sink | @chessplay | done | #2774 | | 109 | Ftp | middle | middle | Source | @guanboo | done | #2774 | | 110 | GaussDB | middle | middle | Source | @Builder34 | | | | 111 | GaussDB | middle | middle | Sink | @Builder34 | | | | 110 | Teradata | middle | middle | Source | | | | | 111 | Teradata | middle | middle | Sink | | | | | 112 | SFTP | middle | middle | Sink | @TyrantLucifer | | | | 113 | SFTP | middle | middle | Source | @TyrantLucifer | | | | 114 | DataHub | middle | middle | Source | @selectbook | | | | 115 | DataHub | middle | middle | Sink | @chessplay | done | | | 116 | SAP HANA | middle | middle | Source | | | | | 117 | SAP HANA | middle | middle | Sink | | | | | 118 | Flink Table Store | high | high | Sink | @iture123 | | | | 119 | Flink Table Store | high | high | Source | @zhaomin1423 | | | | 120 | Vertica | middle | middle | Source | | | | | 121 | Vertica | middle | middle | Sink | | | | | 122 | Kylin | middle | middle | Source | @531651225 | | | | 123 | Kylin | middle | middle | Sink | @TaoZex | | | | 124 | Neocrm | middle | middle | Source | | | | | 125 | TiDB | middle | middle | Source | @Xuxiaotuan | | #2830 | | 126 | TiDB | middle | middle | Sink | @Xuxiaotuan | | #2830 | | 127 | Sentry | middle | middle | Source | | | | | 128 | PolarDB | middle | middle | Source | | | | | 129 | PolarDB | middle | middle | Sink | | | | | 130 | PolarDB-X | middle | middle | Source | | | | | 131 | PolarDB-X | middle | middle | Sink | | | | | 132 | AnalyticDB | middle | middle | Sink | | | | | 133 | TDSQL | middle | middle | Sink | | | | | 134 | SequoiaDB | middle | middle | Sink | | | | | 135 | TcaplusDB | middle | middle | Source | | | | | 136 | TcaplusDB | middle | middle | Sink | | | | | 137 | GoldenDB | middle | middle | Source | | | | | 138 | GoldenDB | middle | middle | Sink | | | | | 139 | AntDB | middle | middle | Source | | | | | 140 | AntDB | middle | middle | Sink | | | | | 141 | OushuDB | middle | middle | Sink | | | | | 142 | SUNDB | middle | middle | Source | | | | | 143 | SUNDB | middle | middle | Sink | | | | | 144 | UXDB | middle | middle | Source | | | | | 145 | UXDB | middle | middle | Sink | | | | | 146 | DolphinDB | middle | middle | Source | | | | | 147 | DolphinDB | middle | middle | Sink | | | | | 148 | RapidsDB | middle | middle | Source | | | | | 149 | RapidsDB | middle | middle | Sink | | | | | 150 | GreatDB | middle | middle | Source | | | | | 151 | GreatDB | middle | middle | Sink | | | | | 152 | CirroData | middle | middle | Sink | | | | | 153 | Nebula | middle | middle | Sink | | | | | 154 | Gbase 8a | middle | middle | Source | | | | | 155 | KunlunDB | middle | middle | Sink | | | | | 156 | Percona | middle | middle | Source | | | | | 157 | Percona | middle | middle | Sink | | | | | 158 | Splunk | middle | middle | Source | | | | | 159 | Splunk | middle | middle | Sink | | | | | 160 | Amazon DynamoDB | middle | middle | Source | | | | | 161 | Amazon DynamoDB | middle | middle | Sink | | | | | 162 | Microsoft Azure SQL Database | middle | middle | Source | | | | | 163 | Microsoft Azure SQL Database | middle | middle | Sink | | | | | 164 | Neo5j | middle | middle | Source | | | | | 165 | Neo5j | middle | middle | Sink | | | | | 166 | Solr | middle | middle | Sink | | | | | 167 | BigQuery | middle | middle | Source | | | | | 168 | BigQuery | middle | middle | Sink | | | | | 169 | SAP Adaptive Server | middle | middle | Source | | | | | 170 | SAP 
Adaptive Server | middle | middle | Sink | | | | | 171 | Microsoft Azure Cosmos DB | middle | middle | Source | | | | | 172 | Microsoft Azure Cosmos DB | middle | middle | Sink | | | | | 173 | PostGIS | middle | middle | Source | | | | | 174 | PostGIS | middle | middle | Sink | | | | | 175 | Couchbase | middle | middle | Sink | | | | | 176 | Vika | middle | middle | Sink | | | | | 177 | Gitlab | low | low | Source | | | |

    help wanted good first issue volunteer wanted API-refactor connectors-v2 
    opened by Hisoka-X 194
  • [Connector-V2][JDBC-connector] support Jdbc dm

    [Connector-V2][JDBC-connector] support Jdbc dm

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    connectors-v2 
    opened by laglangyue 25
  • [DISCUSS][metrics] Support metrics statistics

    [DISCUSS][metrics] Support metrics statistics

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    Support metrics statistics when transmitting data.

    Are you willing to submit a PR?

    • [ ] Yes I am willing to submit a PR!

    Code of Conduct

    discuss 
    opened by leo65535 24
  • FAQ

    FAQ

    FAQ 1. When a developer builds their own Waterdrop plugin, do they need to understand the Waterdrop code, and must the plugin code be written inside the Waterdrop project?

    A plugin developed by you can be completely independent of the waterdrop project, and there is no need to put your plugin code inside the waterdrop project. A plugin can be a fully standalone project in which you are free to use Java, Scala, Maven, SBT or Gradle. This is also the way we recommend developers build plugins.


    FAQ 2. When running waterdrop in cluster mode, it reports that plugins.tar.gz cannot be found.

    Before submitting in cluster mode, you need to run the following command first:

    # Note: from the next release, v1.2.3, the plugin directory is expected to be packaged automatically, so this command will no longer be necessary.
    tar zcvf plugins.tar.gz plugins
    

    After the plugin directory has been packaged, run the command below (if you do not add or remove plugins under the plugins directory afterwards, there is no need to package it again):

    ./bin/start-waterdrop.sh --master yarn --deploy-mode cluster --config ./config/first.conf
    

    If you have any other needs, add garyelephant on WeChat for help.


    FAQ 3. Waterdrop reports the following error after starting:

    ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3

    The problem is a jar dependency conflict. Download the latest version and try again; it should work:

    https://github.com/InterestingLab/waterdrop/releases/download/v1.2.3/waterdrop-1.2.3.zip


    FAQ 4. I want to study the Waterdrop source code. Where should I start?

    Waterdrop has a fully abstracted and well-structured code base, and many people have chosen to read the Waterdrop source code as a way of learning Spark. You can start studying the source code from the main program entry point: Waterdrop.scala


    FAQ 5. Does Waterdrop support dynamic variable substitution, for example replacing the WHERE condition of the SQL in a scheduled job?

    No problem, this is fully supported. For a concrete configuration example, see the configuration example that uses ${varname} for variable substitution.


    FAQ 6. Can Waterdrop run inside job scheduling frameworks such as Azkaban or Oozie?

    Of course it can; see the screenshot below:


    FAQ 7. I ran into a problem while using Waterdrop that I cannot solve by myself. What should I do?

    Go to the project homepage, find the project maintainer's WeChat ID, and add them on WeChat.


    FAQ 8. How do I declare a variable in the Waterdrop configuration and then set its value dynamically at runtime?

    Since v1.2.4, Waterdrop supports declaring variables in the configuration. This feature is often used to substitute variables such as time and date when doing scheduled or ad-hoc offline processing. Usage is as follows:

    Declare the variable name in the configuration, for example:

    ...
    
    filter {
      sql {
        table_name = "user_view"
        sql = "select * from user_view where city ='"${city}"' and dt = '"${date}"'"
      }
    }
    
    ...
    

    The sql filter is used here only as an example; in fact, the value of any key = value pair anywhere in the configuration file can use variable substitution.

    For a detailed configuration example, see variable substitution.

    The launch commands are as follows:

    # local mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2] -i city=shanghai -i date=20190319
    
    # yarn client mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m yarn -i city=shanghai -i date=20190319
    
    # yarn cluster mode
    ./bin/start-waterdrop.sh -c ./config/your_app.conf -e cluster -m yarn -i city=shanghai -i date=20190319
    
    # mesos and spark standalone are launched in the same way.
    

    You can specify a variable's value with the -i or --variable parameter followed by key=value, where key must be the same as the variable name in the configuration.


    FAQ 9. How do I fix an OOM when Waterdrop consumes data from Kafka?

    In most cases the OOM is caused by consuming without rate limiting. The solution is as follows:

    For details, see: https://www.processon.com/view/link/5c9862ece4b0c996d36fe7d7


    document 
    opened by garyelephant 24
  • [Improve] [Connector-V2] File Connector add lzo compression way.

    [Improve] [Connector-V2] File Connector add lzo compression way.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    improve First-time contributor connectors-v2 Waiting for code update approved reviewed 
    opened by lightzhao 21
  • [Feature][Connector] Split connector jar from release core jar

    [Feature][Connector] Split connector jar from release core jar

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    Now all the connector jars in the binary distribution package of SeaTunnel are packaged into one jar file: core. This makes it impossible for us to implement multi-version support for the same component.

    Now

    .
    apache-seatunnel
    | - - lib
          | - - seatunnel-core-spark.jar
          | - - seatunnel-core-flink.jar
    | - - plugins
    | - - config
    | - - bin
    

    After (example for multi-version Elasticsearch 6.x and 7.x)

    .
    apache-seatunnel
    | - - lib
          | - - seatunnel-core-spark.jar
          | - - seatunnel-core-flink.jar
    | - - connectors
          | - - flink
                | - - seatunnel-connector-flink-elasticsearch7.jar
                | - - seatunnel-connector-flink-kafka0.10.jar
                | - - other-all-lasted-version-connecotr-single.jar
          | - - spark
    | - - opt
          | - - flink
                | - - seatunnel-connector-flink-elasticsearch6.jar
                | - - seatunnel-connector-flink-kafka0.08.jar
                | - - other-all-older-version-connecotr-single.jar
          | - - spark            
    | - - plugins
    | - - config
    | - - bin
    

    After this is finished, users can use a different connector version with one release version; they just need to move the jar from the opt folder to the connectors folder.

    WorkFlow

    (workflow diagram in the original issue)

    Engine Implement Method

    Flink

    • Flink uses PipelineOptions.JARS and PipelineOptions.CLASSPATH to upload connector jars so that they can be loaded.

    Spark

    • Spark uses the spark.jars property so that connector jars can be executed on the cluster, as in the sketch below.
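
    For instance, a submission that ships one connector jar to the cluster through spark.jars might look like this sketch; the jar path, class name, and application jar are illustrative only, not the project's actual packaging.

    # make a single connector jar available to the driver and executors
    spark-submit --conf spark.jars=connectors/spark/seatunnel-connector-spark-kafka.jar \
      --class org.example.YourApp your-app.jar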

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    discuss 
    opened by Hisoka-X 21
  • [Feature][seatunnel-examples] flink  local environment run quickly and debug locally developed code easily

    [Feature][seatunnel-examples] flink local environment run quickly and debug locally developed code easily

    this closes #955

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    improve 
    opened by felix-thinkingdata 21
  • [Feature][API] SeaTunnel Transform API

    [Feature][API] SeaTunnel Transform API

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    We already have SeaTunnel Source API and SeaTunnel Sink API, but we don't have SeaTunnel Transform API now. We need SeaTunnel Transform API and it must have some key features:

    • Like source and sink, it is decoupled from the engine and can run on different engines.
    • In order to ensure seatunnel's positioning as a data integration platform and not introduce work beyond the plan, the SeaTunnel Transform API will only support UDF level data conversion.
    • In theory, UDF level transform does not require checkpoint and state storage.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [ ] Yes I am willing to submit a PR!

    Code of Conduct

    stale 
    opened by EricJoy2048 18
  • [Feature][Connector-V2] add sqlserver connector

    [Feature][Connector-V2] add sqlserver connector

    Purpose of this pull request

    support sqlserver connector

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    connectors-v2 
    opened by liugddx 18
  • [Feature] [config] Fix dependency conflict in seatunnel config when running SeatunnelFlink in idea with local mode#1186

    [Feature] [config] Fix dependency conflict in seatunnel config when running SeatunnelFlink in idea with local mode#1186

    Purpose of this pull request

    This pull request fixes the #1186 bug. It adds a seatunnel-config-shade module and renames the package of the code in seatunnel-config to avoid class conflicts when running SeatunnelFlink in IntelliJ IDEA in local mode.

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in you PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    improve First-time contributor 
    opened by Yves-yuan 18
  • [Discuss][HTTP Connector] Add specified field function for all HTTP connector

    [Discuss][HTTP Connector] Add specified field function for all HTTP connector

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    So far, some http requests return data that cannot be parsed, such as array nested type data.

    (screenshot of the nested JSON response in the original issue)

    We need to implement the function to specify a field, such as users in the figure above, so that we can configure the schema for users.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    opened by TaoZex 17
  • [Feature][Connector-V2] Support kerberos in hive and hdfs file connector

    [Feature][Connector-V2] Support kerberos in hive and hdfs file connector

    Purpose of this pull request

    Support kerberos in hive and hdfs file connector

    close #3327

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by TyrantLucifer 0
  • [Feature][Connector-V2][File] Support compress codec

    [Feature][Connector-V2][File] Support compress codec

    Search before asking

    • [X] I had searched in the feature and found no similar feature requirement.

    Description

    As we know, the connector-v2 file connectors do not support setting a compression codec; this feature is important.

    Usage Scenario

    No response

    Related issues

    No response

    Are you willing to submit a PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    opened by TyrantLucifer 0
  • [Improve][CDC Base] Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask

    [Improve][CDC Base] Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask

    Guaranteed to be exactly-once in the process of switching from SnapshotTask to IncrementalTask.

    Purpose of this pull request

    Check list

    • [x] Code changed are covered with tests, or it does not need tests for reason:
    • [x] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [x] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [x] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by ic4y 0
  • [Bug][KafkaSource]KafkaConsumer is not close.

    [Bug][KafkaSource]KafkaConsumer is not close.

    Fix the problem that KafkaConsumer of KafkaSource is not closed and the resource is not released.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by lightzhao 1
  • [Improve] [Seatunnel-Engine] remove `seatunnel-api` from engine storage.

    [Improve] [Seatunnel-Engine] remove `seatunnel-api` from engine storage.

    Purpose of this pull request

    Check list

    • [ ] Code changed are covered with tests, or it does not need tests for reason:
    • [ ] If any new Jar binary package adding in your PR, please add License Notice according New License Guide
    • [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
    • [ ] If you are contributing the connector code, please check that the following files are updated:
      1. Update change log that in connector document. For more details you can refer to connector-v2
      2. Update plugin-mapping.properties and add new connector information in it
      3. Update the pom file of seatunnel-dist
    opened by liugddx 1
  • Timestamp fields are not supported

    Timestamp fields are not supported

    Search before asking

    • [X] I had searched in the issues and found no similar issues.

    What happened

    When sinking to LocalFile, storing a timestamp field in parquet format results in an error.

    SeaTunnel Version

    2.2.0-beta

    SeaTunnel Config

    sink {
      LocalFile {
        path="file:///opt/sink_file/test_3"
        file_name_expression="${transactionId}_${now}"
        file_format="parquet"
        filename_time_format="yyyy.MM.dd"
        is_enable_transaction=true
        sink_columns = ["file_id","file_name","file_size","mail_suject","mail_body","mail_create_time"]
      }
    }
    

    Running Command

    ./bin/start-seatunnel-flink-connector-v2.sh --config config/example_1.conf
    

    Error Exception

    org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Flink job executed failed
            at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
            at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
            at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
            at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
            at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
            at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
            at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
            at org.apache.flink.client.cli.CliFrontend$$Lambda$75/270095066.call(Unknown Source)
            at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
            at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
    Caused by: org.apache.seatunnel.core.starter.exception.CommandExecuteException: Flink job executed failed
            at org.apache.seatunnel.core.starter.flink.command.FlinkApiTaskExecuteCommand.execute(FlinkApiTaskExecuteCommand.java:57)
            at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:40)
            at org.apache.seatunnel.core.starter.flink.SeatunnelFlink.main(SeatunnelFlink.java:34)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
            ... 9 more
    Caused by: org.apache.seatunnel.core.starter.exception.TaskExecuteException: Execute Flink job error
            at org.apache.seatunnel.core.starter.flink.execution.FlinkExecution.execute(FlinkExecution.java:75)
            at org.apache.seatunnel.core.starter.flink.command.FlinkApiTaskExecuteCommand.execute(FlinkApiTaskExecuteCommand.java:55)
            ... 16 more
    Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: a712d589a49a6bf3743b96f29e38342f)
            at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
            at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1887)
            at org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:123)
            at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:80)
            at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1833)
            at org.apache.seatunnel.core.starter.flink.execution.FlinkExecution.execute(FlinkExecution.java:73)
            ... 17 more
    Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: a712d589a49a6bf3743b96f29e38342f)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:125)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter$$Lambda$363/1969969319.apply(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
            at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
            at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$340/474497082.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$24(RestClusterClient.java:670)
            at org.apache.flink.client.program.rest.RestClusterClient$$Lambda$361/186075763.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1954)
            at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
            at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$340/474497082.accept(Unknown Source)
            at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
            at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
            at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
            at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
            at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
            at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
            at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:123)
            ... 28 more
    Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
            at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
            at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:216)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:206)
            at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:197)
            at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:682)
            at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
            at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:435)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:497)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
            at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
            at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$104/1338135026.apply(Unknown Source)
            at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
            at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
            at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
            at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
            at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
            at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
            at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
            at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
            at akka.actor.ActorCell.invoke(ActorCell.scala:561)
            at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
            at akka.dispatch.Mailbox.run(Mailbox.scala:225)
            at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
            at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.lang.RuntimeException: Write data error, please check
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:62)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:32)
            at org.apache.seatunnel.translation.flink.sink.FlinkSinkWriter.write(FlinkSinkWriter.java:51)
            at org.apache.flink.streaming.runtime.operators.sink.AbstractSinkWriterOperator.processElement(AbstractSinkWriterOperator.java:80)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
            at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
            at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
            at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
            at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317)
            at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411)
            at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:45)
            at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:30)
            at org.apache.seatunnel.connectors.seatunnel.jdbc.source.JdbcSourceReader.pollNext(JdbcSourceReader.java:66)
            at org.apache.seatunnel.translation.source.ParallelSource.run(ParallelSource.java:125)
            at org.apache.seatunnel.translation.flink.source.BaseSeaTunnelSourceFunction.run(BaseSeaTunnelSourceFunction.java:83)
            at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
            at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
            at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
    Caused by: java.lang.NullPointerException
            at org.apache.seatunnel.shade.connector.file.org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:114)
            at org.apache.seatunnel.shade.connector.file.org.apache.avro.generic.GenericRecordBuilder.set(GenericRecordBuilder.java:104)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy.lambda$write$0(ParquetWriteStrategy.java:57)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy$$Lambda$429/483743484.accept(Unknown Source)
            at java.util.ArrayList.forEach(ArrayList.java:1249)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.ParquetWriteStrategy.write(ParquetWriteStrategy.java:57)
            at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:60)
            ... 18 more
    

    Flink or Spark Version

    flink-1.13.6

    Java or Scala Version

    No response

    Screenshots

    No response

    Are you willing to submit PR?

    • [X] Yes I am willing to submit a PR!

    Code of Conduct

    bug 
    opened by zhangyide9494 0
  • Releases(2.3.0)
    • 2.3.0(Dec 30, 2022)

      2.3.0-release

      Bug fix

      Core

      • [Core] [Starter] Fix the bug of ST log print failed in some jdk versions #3160
      • [Core] [Shell] Fix bug that shell script about downloading plugins does not work #3462

      Connector-V2

      • [Connector-V2] [Jdbc] Fix the bug that jdbc source can not be stopped in batch mode #3220
      • [Connector-V2] [Jdbc] Fix the bug that jdbc connector reset in jdbc connector #3670
      • [Connector-V2] [Jdbc] Fix the bug that jdbc connector exactly-once it will throw NullPointerException #3730
      • [Connector-V2] [Hive] Fix the following bugs of hive connector: 1. write parquet NullPointerException 2. when restore write from states getting error file path #3258
      • [Connector-V2] [File] Fix the bug that when getting file system throw NullPointerException #3506
      • [Connector-V2] [File] Fix the bug that when user does not config the fileNameExpression it will throw NullPointerException #3706
      • [Connector-V2] [Hudi] Fix the bug that the split owner of Hudi connector may be negative #3184

      ST-Engine

      • [ST-Engine] Fix bug data file name will duplicate when use SeaTunnel Engine #3717
      • [ST-Engine] Fix job restart of all nodes down #3722
      • [ST-Engine] Fix the bug that checkpoint stuck in ST-Engine #3213
      • [ST-Engine] Fix the bug that checkpoint failed in ST-Engine #3769

      E2E

      • [E2E] [Spark] Corrected spark version in e2e container #3225

      Improve

      Core

      • [Core] [Starter] [Flink] Upgrade the method of loading extra jars in flink starter #2982
      • [Core] [Pom] [Package] Optimize package process #3751

      Connector-V1

      • [Connector-V1] Remove connector v1 related codes from dev branch #3450

      Connector-V2

      • [Connector-V2] Add split templates for all connectors #3335
      • [Connector-V2] [Redis] Support redis cluster mode & user authentication #3188
      • [Connector-V2] [Clickhouse] Support nest type and array type in clickhouse connector #3047
      • [Connector-V2] [Clickhouse] Support geo type in clickhouse connector #3141
      • [Connector-V2] [Clickhouse] Improve double convert that in clickhouse connector #3441
      • [Connector-V2] [Clickhouse] Improve float long convert that in clickhouse connector #3471
      • [Connector-V2] [Kafka] Support setting read start offset or message time in kafka connector #3157
      • [Connector-V2] [Kafka] Support specify multiple partition keys in kafka connector #3230
      • [Connector-V2] [Kafka] Support dynamic discover topic & partition in kafka connector #3125
      • [Connector-V2] [Kafka] Support text format for kafka connector #3711
      • [Connector-V2] [IotDB] Add the parameter check logic for iotDB sink connector #3412
      • [Connector-V2] [Jdbc] Support setting fetch size in jdbc connector #3478
      • [Connector-V2] [Jdbc] Support upsert config in jdbc connector #3708
      • [Connector-V2] [Jdbc] Optimize the commit process of jdbc connector #3451
      • [Connector-V2] [Jdbc] Release jdbc resource when after using #3358
      • [Connector-V2] [Oracle] Improve data type mapping of Oracle connector #3486
      • [Connector-V2] [Http] Support extract complex json string in http connector #3510
      • [Connector-V2] [File] [S3] Support s3a protocol in S3 file connector #3632
      • [Connector-V2] [File] [HDFS] Support setting hdfs-site.xml #3778
      • [Connector-V2] [File] Support file split in file connectors #3625
      • [Connector-V2] [CDC] Support write cdc changelog event in elasticsearch sink connector #3673
      • [Connector-V2] [CDC] Support write cdc changelog event in clickhouse sink connector #3653
      • [Connector-V2] [CDC] Support write cdc changelog event in jdbc connector #3444

      ST-Engine

      • [ST-Engine] Improve statistic information print format that in ST-Engine #3492
      • [ST-Engine] Improve ST-Engine performance #3216
      • [ST-Engine] Support user-defined jvm parameters in ST-Engine #3307

      CI

      • [CI] Improve CI process #3179 #3194

      E2E

      • [E2E] [Flink] Support execute extra commands on task-manager container #3224
      • [E2E] [Jdbc] Increased Jdbc e2e stability #3234

      Feature

      Core

      • [Core] [Log] Integrate slf4j and log4j2 for unified management logs #3025
      • [Core] [Connector-V2] [Exception] Unified exception API & Unified connector error tip message #3045
      • [Core] [Shade] [Hadoop] Add hadoop shade package for SeaTunnel #3755

      Connector-V2

      • [Connector-V2] [Elasticsearch] Add elasticsearch source connector #2821
      • [Connector-V2] [AmazondynamoDB] Add AmazondynamoDB source & sink connector #3166
      • [Connector-V2] [StarRocks] Add StarRocks sink connector #3164
      • [Connector-V2] [DB2] Add DB2 source & sink connector #2410
      • [Connector-V2] [Transform] Add transform-v2 api #3145
      • [Connector-V2] [InfluxDB] Add influxDB sink connector #3174
      • [Connector-V2] [Cassandra] Add Cassandra Source & Sink connector #3229
      • [Connector-V2] [MyHours] Add MyHours source connector #3228
      • [Connector-V2] [Lemlist] Add Lemlist source connector #3346
      • [Connector-V2] [CDC] [MySql] Add mysql cdc source connector #3455
      • [Connector-V2] [CDC] [SqlServer] Add sqlserver cdc source connector #3686
      • [Connector-V2] [Klaviyo] Add Klaviyo source connector #3443
      • [Connector-V2] [OneSingal] Add OneSingal source connector #3454
      • [Connector-V2] [Slack] Add slack sink connector #3226
      • [Connector-V2] [Jira] Add Jira source connector #3473
      • [Connector-V2] [Sqlite] Add Sqlite source & sink connector #3089
      • [Connector-V2] [OpenMldb] Add openmldb source connector #3313
      • [Connector-V2] [Teradata] Add teradata source & sink connector #3362
      • [Connector-V2] [Doris] Add doris source & sink connector #3586
      • [Connector-V2] [MaxCompute] Add MaxCompute source & sink connector #3640
      • [Connector-V2] [Doris] [Streamload] Add doris streamload sink connector #3631
      • [Connector-V2] [Redshift] Add redshift source & sink connector #3615
      • [Connector-V2] [Notion] Add notion source connector #3470
      • [Connector-V2] [File] [Oss-Jindo] Add oss jindo source & sink connector #3456

      ST-Engine

      • [ST-Engine] Support print job metrics when job finished #3691
      • [ST-Engine] Add metrics statistic in ST-Engine #3621
      • [ST-Engine] Support IMap file storage in ST-Engine #3418
      • [ST-Engine] Support S3 file system for IMap file storage #3675
      • [ST-Engine] Support save job restart status information in ST-Engine #3637

      E2E

      • [E2E] [Http] Add http type connector e2e test cases #3340
      • [E2E] [File] [Local] Add local file connector e2e test cases #3221

      Docs

      • [Docs] [Connector-V2] [Factory] Add TableSourceFactory & TableSinkFactory docs #3343
      • [Docs] [Connector-V2] [Schema] Add connector-v2 schema docs #3296
      • [Docs] [Connector-V2] [Quick-Manual] Add error quick reference manual #3437
      • [Docs] [README] Improve README and refactored other docs #3619
    • 2.3.0-beta(Nov 9, 2022)

      2.3.0 Beta

      [Connector V2]

      [New Connector V2 Added]

      • [Source] [Kafka] Add Kafka Source Connector (2953)
      • [Source] [Pulsar] Add Pulsar Source Connector (1980)
      • [Source] [S3File] Add S3 File Source Connector (3119)
      • [Source] [JDBC] [Phoenix] Add Phoenix JDBC Source Connector (2499)
      • [Source] [JDBC] [SQL Server] Add SQL Server JDBC Source Connector (2646)
      • [Source] [JDBC] [Oracle] Add Oracle JDBC Source Connector (2550)
      • [Source] [JDBC] [GBase8a] Add GBase8a JDBC Source Connector (3026)
      • [Source] [JDBC] [StarRocks] Add StarRocks JDBC Source Connector (3060)
      • [Sink] [Kafka] Add Kafka Sink Connector (2953)
      • [Sink] [S3File] Add S3 File Sink Connector (3119)

      [Improve & Bug Fix]

      • [Source] [Fake]

        • [Improve] Supports direct definition of data values(row) (2839)
        • [Improve] Improve fake source connector: (2944)
          • Support user-defined map size
          • Support user-defined array size
          • Support user-defined string length
          • Support user-defined bytes length
        • [Improve] Support multiple splits for fake source connector (2974)
        • [Improve] Supports setting the number of splits per parallelism and the reading interval between two splits (3098)
      • [Source] [Clickhouse]

        • [Improve] Clickhouse Source random use host when config multi-host (3108)
      • [Source] [FtpFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [HDFSFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [LocalFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [OSSFile]

        • [BugFix] Fix the bug of incorrect path in windows environment (2980)
        • [Improve] Support extract partition from SeaTunnelRow fields (3085)
        • [Improve] Support parse field from file path (2985)
      • [Source] [IoTDB]

        • [Improve] Improve IoTDB Source Connector (2917)
          • Support extract timestamp, device, measurement from SeaTunnelRow
          • Support TINYINT, SMALLINT
          • Support flush cache to database before prepareCommit
      • [Source] [JDBC]

        • [Feature] Support Phoenix JDBC Source (2499)
        • [Feature] Support SQL Server JDBC Source (2646)
        • [Feature] Support Oracle JDBC Source (2550)
        • [Feature] Support StarRocks JDBC Source (3060)
        • [Feature] Support GBase8a JDBC Source (3026)
      • [Sink] [Assert]

        • [Improve] 1.Support check the number of rows (2844) (3031):
          • check rows not empty
          • check minimum number of rows
          • check maximum number of rows
        • [Improve] 2.Support direct define of data values(row) (2844) (3031)
        • [Improve] 3.Support setting parallelism as 1 (2844) (3031)
      • [Sink] [Clickhouse]

        • [Improve] Clickhouse Support Int128,Int256 Type (3067)
      • [Sink] [Console]

        • [Improve] Console sink support print subtask index (3000)
      • [Sink] [Enterprise-WeChat]

        • [BugFix] Fix Enterprise-WeChat Sink data serialization (2856)
      • [Sink] [FtpFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [HDFSFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [LocalFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [OSSFile]

        • [BugFix] Fix incorrect path handling in Windows environments (2980)
        • [BugFix] Fix filesystem get error (3117)
        • [BugFix] Fix '\t' not being parsed as a delimiter from the config file (3083)
      • [Sink] [IoTDB]

        • [Improve] Improve IoTDB Sink Connector (2917)
          • Support the 'align by' SQL syntax
          • Support case-insensitive SQL splitting
          • Support restoring the split offset for at-least-once semantics
          • Support reading the timestamp from RowRecord
        • [BugFix] Fix IoTDB connector sink NPE (3080)
      • [Sink] [JDBC]

        • [BugFix] Fix JDBC split exception (2904)
        • [Feature] Support Phoenix JDBC Sink (2499)
        • [Feature] Support SQL Server JDBC Sink (2646)
        • [Feature] Support Oracle JDBC Sink (2550)
        • [Feature] Support StarRocks JDBC Sink (3060)
      • [Sink] [Kudu]

        • [Improve] Kudu Sink Connector supports upserting rows (2881)
      • [Sink] [Hive]

        • [Improve] Hive Sink supports automatic partition repair (3133)

      [Connector V1]

      [New Connector V1 Added]

      [Improve & Bug Fix]

      • [Sink] [Spark-Hbase]
        • [BugFix] Handling null values (3099)

      [Starter & Core & API]

      [Feature & Improve]

      • [Improve] [Sink] Support defining parallelism for sink connectors (2941)
      • [Improve] [all] Change Log to @Slf4j (3001); see the sketch after this list
      • [Improve] [format] [text] Support reading & writing the SeaTunnelRow type (2969)
      • [Improve] [api] [flink] Extract a unified method (2862)
      • [Feature] [deploy] Add Helm charts (2903)
      • [Feature] [seatunnel-text-format] (2884)
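
      For reference, the @Slf4j change in the list above presumably refers to Lombok's @Slf4j annotation, which generates the logger field instead of it being declared by hand. A minimal sketch, assuming Lombok is on the compile classpath; the class name and message are illustrative and not taken from the SeaTunnel code base.

        import lombok.extern.slf4j.Slf4j;

        // Lombok generates:
        //   private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(ExampleTask.class);
        @Slf4j
        public class ExampleTask {
            public void run(String jobId) {
                // The generated "log" field is used exactly like a hand-written SLF4J logger.
                log.info("Starting job {}", jobId);
            }
        }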

      [Bug Fix]

      • [BugFix] Fix assert connector name error in config/plugin_config file (3127)
      • [BugFix] [starter] Fix connector-v2 flink & spark dockerfile (3007)
      • [BugFix] [core] Fix the Spark engine parallelism parameter not working (2965)
      • [BugFix] [build] Fix the checkstyle suppression file not taking effect on Windows 10 (2986)
      • [BugFix] [format] [json] Fix jackson package conflict with spark (2934)
      • [BugFix] [seatunnel-translation-base] Fix Source restore state NPE (2878)

      [Docs]

      • Add coding guide (2995)

      [SeaTunnel Engine]

      [Feature & Improve]

      [Cluster Manager]

      • Support running SeaTunnel Engine in standalone mode.
      • Support running SeaTunnel Engine as a cluster.
      • Implement the master-worker architecture without relying on third-party services (ZooKeeper, etc.).
      • Autonomous cluster (non-centralized).
      • Automatic discovery of cluster members.

      [Core]

      • Support submitting jobs to SeaTunnel Engine in local mode.
      • Support submitting jobs to SeaTunnel Engine in cluster mode.
      • Support batch jobs.
      • Support streaming jobs.
      • Support unified batch-stream processing; the batch-stream unification of all SeaTunnel V2 connectors is guaranteed in SeaTunnel Engine.
      • Support the Chandy-Lamport distributed snapshot algorithm and two-phase commit; exactly-once semantics are built on these implementations (see the sketch after this list).
      • Support pipeline-granularity job scheduling, ensuring that jobs can start under limited resources.
      • Support pipeline-granularity job restore.
      • Share threads between tasks to achieve real-time synchronization of large numbers of small datasets.
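
      To make the exactly-once item above concrete, the sketch below illustrates the general two-phase-commit pattern referenced in this list. The interface and method names are hypothetical illustrations of the pattern, not SeaTunnel Engine's actual API: output is staged when a snapshot barrier arrives (phase one) and only made visible once the whole snapshot has succeeded (phase two).

        import java.util.List;

        // Hypothetical two-phase-commit writer; names are illustrative, not SeaTunnel's API.
        interface TwoPhaseWriter<T, CommitInfoT> {
            void write(T record) throws Exception;                   // buffer or stage the record
            CommitInfoT prepareCommit() throws Exception;             // phase 1: flush staged data, return a commit handle
            void commit(List<CommitInfoT> handles) throws Exception;  // phase 2: make the staged data visible
            void abort(List<CommitInfoT> handles) throws Exception;   // roll back staged data on failure
        }

        // Conceptual coordinator flow: a snapshot barrier (Chandy-Lamport marker) travels through the
        // pipeline; each writer returns its prepareCommit() handle, which is stored with the snapshot;
        // only after every task acknowledges the snapshot are the handles committed, and on recovery
        // un-committed handles are either committed or aborted, which yields exactly-once semantics.
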
      Source code(tar.gz)
      Source code(zip)
    • 2.2.0-beta(Oct 3, 2022)

      [Feature & Improve]

      • Connector V2 API: decouple connectors from compute engines (a conceptual sketch follows this list)
        • [Translation] Support Flink 1.13.x
        • [Translation] Support Spark 2.4
        • [Connector-V2] [Fake] Support FakeSource (#1864)
        • [Connector-V2] [Console] Support ConsoleSink (#1864)
        • [Connector-V2] [ElasticSearch] Support ElasticSearchSink (#2330)
        • [Connector-V2] [ClickHouse] Support ClickHouse Source & Sink (#2051)
        • [Connector-V2] [JDBC] Support JDBC Source & Sink (#2048)
        • [Connector-V2] [JDBC] [Greenplum] Support Greenplum Source & Sink(#2429)
        • [Connector-V2] [JDBC] [DM] Support DaMengDB Source & Sink(#2377)
        • [Connector-V2] [File] Support Source & Sink for Local, HDFS & OSS File
        • [Connector-V2] [File] [FTP] Support FTP File Source & Sink (#2774)
        • [Connector-V2] [Hudi] Support Hudi Source (#2147)
        • [Connector-V2] [Iceberg] Support Iceberg Source (#2615)
        • [Connector-V2] [Kafka] Support Kafka Source (#1940)
        • [Connector-V2] [Kafka] Support Kafka Sink (#1952)
        • [Connector-V2] [Kudu] Support Kudu Source & Sink (#2254)
        • [Connector-V2] [MongoDB] Support MongoDB Source (#2596)
        • [Connector-V2] [MongoDB] Support MongoDB Sink (#2649)
        • [Connector-V2] [Neo4j] Support Neo4j Sink (#2434)
        • [Connector-V2] [Phoenix] Support Phoenix Source & Sink (#2499)
        • [Connector-V2] [Redis] Support Redis Source (#2569)
        • [Connector-V2] [Redis] Support Redis Sink (#2647)
        • [Connector-V2] [Socket] Support Socket Source (#1999)
        • [Connector-V2] [Socket] Support Socket Sink (#2549)
        • [Connector-V2] [HTTP] Support HTTP Source (#2012)
        • [Connector-V2] [HTTP] Support HTTP Sink (#2348)
        • [Connector-V2] [HTTP] [Wechat] Support WeChat Sink (#2412)
        • [Connector-V2] [Pulsar] Support Pulsar Source (#1984)
        • [Connector-V2] [Email] Support Email Sink (#2304)
        • [Connector-V2] [Sentry] Support Sentry Sink (#2244)
        • [Connector-V2] [DingTalk] Support DingTalk Sink (#2257)
        • [Connector-V2] [IoTDB] Support IoTDB Source (#2431)
        • [Connector-V2] [IoTDB] Support IoTDB Sink (#2407)
        • [Connector-V2] [Hive] Support Hive Source & Sink(#2708)
        • [Connector-V2] [Datahub] Support Datahub Sink(#2558)
      • [Catalog] MySQL Catalog (#2042)
      • [Format] JSON Format (#2014)
      • [Spark] [ClickHouse] Support ClickHouse without authentication (#2393)
      • [Binary-Package] Add script to automatically download plugins (#2831)
      • [License] Update binary license (#2798)
      • [e2e] Improved e2e start sleep (#2677)
      • [e2e] Container only copy required connector jars (#2675)
      • [build] delete connectors*-dist modules (#2709)
      • [build] Dependency management split (#2606)
      • [build] The e2e module no longer depends on the connector*-dist module (#2702)
      • [build] Improved scope of maven-shade-plugin (#2665)
      • [build] make sure flatten-maven-plugin runs after maven-shade-plugin (#2603)
      • [Starter] Use the public CommandLine util class to parse the args (#2470)
      • [Spark] [Redis] Support self-built Redis proxies that do not support the Redis "info replication" command (#2389)
      • [Flink] [Transform] Support multiple splits and add a custom split function name (#2268)
      • [Test] Upgrade junit to 5.+ (#2305)
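
      As a rough illustration of the decoupling idea behind the Connector V2 API at the top of this list, the sketch below shows the general pattern of an engine-agnostic connector contract plus a per-engine translation wrapper. All names here are hypothetical and simplified; they are not the actual SeaTunnel interfaces.

        import java.util.Iterator;
        import java.util.function.Consumer;

        // Engine-agnostic reader contract, implemented once per connector.
        interface EngineAgnosticSource<T> {
            Iterator<T> open() throws Exception;
            void close() throws Exception;
        }

        // A per-engine translation layer adapts the contract to the engine's own API
        // (e.g. a Flink SourceFunction or a Spark data source), so the connector code
        // itself never references Flink or Spark classes.
        final class FlinkTranslationWrapper<T> {
            private final EngineAgnosticSource<T> delegate;

            FlinkTranslationWrapper(EngineAgnosticSource<T> delegate) {
                this.delegate = delegate;
            }

            // In a real translation layer this would be the engine callback (e.g. run());
            // here it simply drains the delegate to show the data flow.
            void runOnce(Consumer<T> emit) throws Exception {
                Iterator<T> records = delegate.open();
                while (records.hasNext()) {
                    emit.accept(records.next());
                }
                delegate.close();
            }
        }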

      [Bugfix]

      • [Starter] Ensure that output paths constructed from zip archive entries are validated to prevent writing files to unexpected locations (#2843)
      • [Starter] Stop SparkCommandArgs from splitting variable values on commas (#2523)
      • [Spark] Fix the problem of calling the getData() method twice (#2764)
      • [e2e] Fix the path split exception on Windows 10; do not check whether the file exists (#2715)

      [Docs]

      • [Kafka] Update Kafka.md (#2863)
      • [JDBC] Fix inconsistency in the documentation (#2776)
      • [Flink-SQL] [ElasticSearch] Updated prepare section (#2634)
      • [Contribution] add CheckStyle-IDEA Plugin introduction (#2535)
      • [Contribution] Update new-license.md (#2494)
      Source code(tar.gz)
      Source code(zip)
    • 2.1.3(Aug 10, 2022)

      [Feature & Improvement]

      • [Connector][Flink][Fake] Support BigInteger type (#2118)
      • [Connector][Spark][TiDB] Refactor config parameters (#1983)
      • [Connector][Flink] Add AssertSink connector (#2022)
      • [Connector][Spark][ClickHouse] Support Rsync to transfer ClickHouse data files (#2074)
      • [Connector & e2e][Flink] Add IT for Assert Sink in the e2e module (#2036)
      • [Transform][Spark] Data quality check for the null data rate (#1978)
      • [Transform][Spark] Add a module to set default values for null fields (#1958)
      • [Chore] Make the code more understandable and remove code warnings (#2005)
      • [Spark] Use a higher version of the libthrift dependency (#1994)
      • [Core][Starter] Change the jar connector load logic (#2193)
      • [Core] Add a plugin discovery module (#1881)

      [BUG]

      • [Connector][Hudi] Fix the source loading data twice
      • [Connector][Doris] Fix the Unrecognized field "TwoPhaseCommit" error after Doris 0.15 (#2054)
      • [Connector][Jdbc] Fix the data output exception when accessing Hive using Spark JDBC (#2085)
      • [Connector][Jdbc] Fix JDBC data loss when partition_column (partition mode) is set (#2033)
      • [Connector][Kafka] Fix KafkaTableStream schema JSON parsing (#2168)
      • [seatunnel-core] Fix failure to get the APP_DIR path (#2165)
      • [seatunnel-api-flink] Fix connector dependencies being added repeatedly (#2207)
      • [seatunnel-core-flink] Update the FlinkRunMode enum to show the proper help message for run modes (#2008)
      • [seatunnel-core-flink] Fix the registerPlugin library cache error when the same plugin is used as source and sink (#2015)
      • [Command] Fix commandArgs -t (--check) conflicting with the Flink deployment target (#2174)
      • [Core][Jackson] Fix the Jackson type conversion error (#2031)
      • [Core][Starter] In cluster mode, the starter app root dir should be the same as in client mode (#2141)

      [Docs]

      • Update the source Socket connector docs (#1995)
      • Add uuid, udf, and replace transforms to the docs (#2016)
      • Update the Flink engine version requirements (#2220)
      • Add the Flink SQL module to the website (#2021)
      • [kubernetes] Update the SeaTunnel on Kubernetes doc (#2035)

      [Dependency Upgrade]

      • Upgrade commons-collections4 to 4.4
      • Upgrade commons-codec to 1.13

      Source code(tar.gz)
      Source code(zip)
    • 2.1.2(Jun 18, 2022)

      [Feature]

      • Add Spark webhook source
      • Support Flink application mode
      • Split connector jar from core jar
      • Add Replace transforms for Spark
      • Add Uuid transform for Spark
      • Support Flink dynamic configurations
      • Flink JDBC source supports the Oracle database
      • Add Flink HTTP connector
      • Add Flink transform for registering user-defined functions
      • Add Flink SQL Kafka and ElasticSearch connectors

      [Bugfix]

      • Fixed the ClickHouse sink data type conversion error
      • Fixed the Spark start shell failing on first execution
      • Fixed the config file not being found when using Spark on YARN in cluster mode
      • Fixed Spark extraJavaOptions not being allowed to be empty
      • Fixed the "plugins.tar.gz" decompression failure in Spark standalone cluster mode
      • Fixed the ClickHouse sink not working correctly when multiple hosts are used
      • Fixed the Flink SQL conf parse exception
      • Fixed incomplete Flink JDBC MySQL data type mapping
      • Fixed variables not being settable in Flink mode
      • Fixed the SeaTunnel Flink engine not checking the source config

      [Improvement]

      • Update Jackson version to 2.12.6
      • Add a guide on how to set up SeaTunnel with Kubernetes
      • Optimize some generic type code
      • Add Flink SQL e2e module
      • Flink JDBC connector adds pre-SQL and post-SQL support
      • Use @AutoService to generate SPI files (see the sketch after this list)
      • Support Flink FakeSourceStream to mock data
      • Support reading Hive via the Flink JDBC source
      • ClickhouseFile supports ReplicatedMergeTree
      • Support using the Hive sink to save tables in ORC file format
      • Support a custom expire time in the Spark Redis sink
      • Add Spark JDBC isolationLevel config
      • Replace Fastjson with Jackson
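
      For reference, the @AutoService item in the list above refers to Google's AutoService annotation processor, which generates the META-INF/services SPI file at compile time instead of it being maintained by hand. Below is a minimal, generic sketch assuming the auto-service dependency is on the annotation-processor path; the interface and class names are illustrative and not taken from the SeaTunnel code base.

        import com.google.auto.service.AutoService;

        // Illustrative SPI interface; in a real project this would be the plugin contract.
        interface ExamplePlugin {
            String pluginName();
        }

        // At compile time AutoService writes META-INF/services/<fully.qualified.ExamplePlugin>
        // containing ExampleJdbcPlugin, so java.util.ServiceLoader can discover it at runtime.
        @AutoService(ExamplePlugin.class)
        public class ExampleJdbcPlugin implements ExamplePlugin {
            @Override
            public String pluginName() {
                return "example-jdbc";
            }
        }
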
      Source code(tar.gz)
      Source code(zip)
    • 2.1.1(Apr 27, 2022)

      [Feature]

      • Support JSON-format config files
      • JDBC connector supports partitioning
      • Add ClickhouseFile sink on the Spark engine
      • Support compiling with JDK 11
      • Add Elasticsearch 7.x plugin on the Flink engine
      • Add Feishu plugin on the Spark engine
      • Add Spark HTTP source plugin
      • Add ClickHouse sink plugin on the Flink engine

      [Bugfix]

      • Fix Flink ConsoleSink not printing results
      • Fix JDBC dialect type compatibility between JdbcSource and JdbcSink
      • Fix transforms not executing when the data source is empty
      • Fix datetime/date strings failing to convert to timestamp/date
      • Fix tableExists not containing TemporaryTable
      • Fix FileSink not working in Flink stream mode
      • Fix config param issues of the Spark Redis sink
      • Fix the SQL table name parsing error
      • Fix not being able to send data to Kafka
      • Fix a file resource leak
      • Fix the ClassCastException encountered when outputting data to Doris

      [Improvement]

      • Change the JDBC-related dependency scope to default
      • Use different commands to execute tasks
      • Automatically identify the Spark Hive plugin and add enableHiveSupport
      • Print config in its original order
      • Remove the useless job name from JobInfo
      • Add console limit and batch Flink fake source
      • Add Flink e2e module
      • Add Spark e2e module
      • Optimize plugin loading and rename the plugin package name
      • Rewrite the Spark and Flink start scripts in code
      • Quickly locate the wrong SQL statement in the Flink SQL transform
      • Upgrade log4j version to 2.17.1
      • Unified version management of third-party dependencies
      • Use revision to manage the project version
      • Add sonar check
      • Add ssl/tls parameters in the Spark email connector
      • Remove the return result of the sink plugin
      • Add flink-runtime-web to the Flink example

      Please go to the official channel to download: https://seatunnel.apache.org/download
      Source code(tar.gz)
      Source code(zip)
    • 2.1.0(Mar 20, 2022)

      • Use JCommander for command-line parameter parsing, letting developers focus on the logic itself (a generic sketch follows this list).
      • Flink is upgraded from 1.9 to 1.13.5, keeping compatibility with older versions and preparing for subsequent CDC support.
      • Support for Doris, Hudi, Phoenix, Druid, and other Connector plugins; you can find the complete plugin support list at plugins-supported-by-seatunnel.
      • Extremely fast local development startup: the example module can be used without modifying any code, which is convenient for local debugging.
      • Support for installing and trying out Apache SeaTunnel(Incubating) via Docker containers.
      • SQL component supports SET statements and configuration variables.
      • Config module refactored to make it easier for contributors to understand while ensuring the project's license compliance.
      • Project structure realigned to fit the new roadmap.
      • CI/CD support and automated code quality control (more plans will follow to support CI/CD development).

      Please go to the official channel to download: https://seatunnel.apache.org/download
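
      The JCommander item at the top of this list refers to the standard Java command-line binding library; the following minimal, generic sketch shows how it maps flags to annotated fields. The option names and class are illustrative only and are not SeaTunnel's actual CLI.

        import com.beust.jcommander.JCommander;
        import com.beust.jcommander.Parameter;

        public class CommandLineExample {
            // Each annotated field is bound to a flag; JCommander handles parsing and help text.
            @Parameter(names = {"-c", "--config"}, description = "Config file path", required = true)
            private String configFile;

            @Parameter(names = {"-e", "--deploy-mode"}, description = "Deploy mode")
            private String deployMode = "client";

            public static void main(String[] args) {
                CommandLineExample opts = new CommandLineExample();
                JCommander.newBuilder().addObject(opts).build().parse(args);
                System.out.println("config=" + opts.configFile + ", mode=" + opts.deployMode);
            }
        }
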
      Source code(tar.gz)
      Source code(zip)
    • v1.5.7(Dec 28, 2021)

    • v1.5.6(Dec 23, 2021)

      What's Changed

      • [project rename] Changed start-waterdrop.sh to start-seatunnel.sh and changed the ASCII logo from waterdrop to seatunnel by @garyelephant
      • [Feature] Added the BaseAction abstraction by @garyelephant in https://github.com/InterestingLab/seatunnel/pull/810
      • [Feature] Allow users to customize log4j.properties by @garyelephant in https://github.com/InterestingLab/seatunnel/issues/267#issuecomment-640986057
      • [Bugfix] Fixed a bug with the Kerberos config in the Spark config by @garyelephant in https://github.com/InterestingLab/seatunnel/issues/590
      • [Bugfix] Fix bug of #719 by @RickyHuo in https://github.com/InterestingLab/seatunnel/pull/743
      Source code(tar.gz)
      Source code(zip)
      seatunnel-1.5.6.zip(68.01 MB)
    • v1.5.3(Aug 11, 2021)

    • v1.5.2(Aug 9, 2021)

    • v2.0.4(Oct 13, 2020)

    • v1.5.1(Jul 20, 2020)

      • [Feature] Add redisStream input plugin.
      • [Feature] MongoDB input plugin adds a schema parameter, allowing you to specify the schema yourself.
      • [Enhancement] Support the Nullable(Decimal(P, S)) type in the ClickHouse output plugin.
      • [Enhancement] Output plugins use the format parameter rather than serializer.
      • [Bugfix] Fix #492 #517 #534


      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.5.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/19GUwZPC2YBG9Pt7iuF9TNw, password: upeb).


      Remark: for spark >= 2.3 download waterdrop-1.5.1.zip; for spark < 2.3 download waterdrop-1.5.1-with-spark.zip


      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.5.1-with-spark.zip(169.07 MB)
      waterdrop-1.5.1.zip(49.27 MB)
    • v1.5.0(Jun 9, 2020)

      1. [Enhancement] Support Chinese column names with ClickHouse output.
      2. [Enhancement] Remove useless code related to antlr4.
      3. [Enhancement] Support specifying a queue with -q or --queue.
      4. [Enhancement] Optimize batch processing and drop unnecessary coding.
      5. [Feature] Replace the third-party jar (config-1.3.3-SNAPSHOT.jar) with the waterdrop-config module.
      6. [Feature] Add urldecode and urlencode filter plugins.
      7. [Bugfix] Fix #392 #411 (config parse bug).
      8. [Bugfix] Support specifying --driver-memory in the spark section of the waterdrop config file (#507).

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.5.0.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/1vCpGUcpSdyetLMMB39J2fg, password: ullf).


      Remark: for spark >= 2.3 download waterdrop-1.5.0.zip; for spark < 2.3 download waterdrop-1.5.0-with-spark.zip


      Upgrade Guide

      • If you upgrade from a previous version, you have to update all plugin dependencies that you developed yourself.
      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.5.0-with-spark.zip(170.33 MB)
      waterdrop-1.5.0.zip(50.53 MB)
    • v1.4.3(Apr 22, 2020)

      1. [Feature] Support ClickHouse cluster mode using the cluster parameter, reading the ClickHouse system.clusters table
      2. [Fixbug] Fix a checkConfig bug when using result_table_name rather than table_name
      3. [Fixbug] Fix a MongoDB bug in Spark Structured Streaming output
      4. [Enhancement] Update the ElasticSearch dependency to 7.6.2
      5. [Enhancement] Update the clickhouse-jdbc dependency to 0.2.4

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.3.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (link: https://pan.baidu.com/s/1Qik5I1IGsgx1u26plSOFDg, password: fqkr).


      Remark: for spark >= 2.3 download waterdrop-1.4.3.zip; for spark < 2.3 download waterdrop-1.4.3-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.3-with-spark.zip(170.34 MB)
      waterdrop-1.4.3.zip(50.81 MB)
    • v2.0.0-pre(Jan 19, 2020)

    • v1.4.2(Dec 6, 2019)

      • [Enhancement] MySQL input supports loading data in parallel with jdbc.partitionColumn.
      • [Fixbug] Fix a bug where result_table_name could not be used in the Hive and MongoDB input plugins.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.2.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1haSd0KFMAS-qQZqI4QHWjA).


      Remark: for spark >= 2.3 download waterdrop-1.4.2.zip; for spark < 2.3 download waterdrop-1.4.2-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.2-with-spark.zip(169.65 MB)
      waterdrop-1.4.2.zip(50.02 MB)
    • v1.4.1(Sep 11, 2019)

      • [Enhancement] Structured Streaming Kafka input supports more complex JSON data.
      • [Enhancement] Fix a bug in the Structured Streaming JDBC output. #364
      • [Enhancement] File input supports specifying a fully qualified format class name.
      • [Fixbug] Fix a bug with the Waterdrop exit status.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/13VrqUkyZLYT0I8R1ajBvdw).


      Remark: for spark >= 2.3 download waterdrop-1.4.1.zip; for spark < 2.3 download waterdrop-1.4.1-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.1-with-spark.zip(169.62 MB)
      waterdrop-1.4.1.zip(49.99 MB)
    • v1.4.0(Aug 12, 2019)

      • [Feature] Add the common options source_table_name and result_table_name
      • [Feature] Remove the table_name option
      • [Enhancement] The ClickHouse output plugin is compatible with null messages
      • [Enhancement] The fields option of the ClickHouse output is no longer required
      • [Feature] ClickHouse output supports decimal
      • [Feature] Support developing input plugins in Java
      • [Fixbug] Fix a config parser bug, #353

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.4.0.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1SBh207x07eCmUHN5GHywfw).


      Remark: for spark >= 2.3 download waterdrop-1.4.0.zip; for spark < 2.3 download waterdrop-1.4.0-with-spark.zip

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.4.0-with-spark.zip(169.62 MB)
      waterdrop-1.4.0.zip(49.99 MB)
    • v1.3.8(Jul 9, 2019)

      • [Fix Bug] Fixed a bug with config file variable substitution in the spark section.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.8.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1ZNehHx_Tpeiq530S_JetIA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.8.zip(49.93 MB)
    • v1.3.7(Jun 26, 2019)

      • [Feature] Support ClickHouse LowCardinality[T]
      • [Enhancement] Do not output if dataset is empty

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.7.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1o0eMK4kqZaIzUHJhYdIT3g).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.7.zip(49.93 MB)
    • v1.3.6(Jun 3, 2019)

      • [Enhancement] The Kafka output plugin supports the serializer parameter. #322

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.6.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1Gcs_GjDA7srMYiBQQCCXzg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.6.zip(49.93 MB)
    • v1.3.5(May 20, 2019)

      • [Enhancement] Removed all duplicate dependencies for spark and hadoop in assembly jar
      • [Enhancement] Do not take(n) in batch processing

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.5.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/19r42RWQxYswsPOt0bIq3tQ).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.5.zip(49.92 MB)
    • v1.3.3(May 9, 2019)

      • [Enhancement] Optimize the performance of ES input.
      • [Bugfix] Fix #305

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.3.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/144BOQqR8Uf08ecIZ4kXQwA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.3.zip(167.79 MB)
    • v1.3.2(Apr 28, 2019)

      • [Feature] Support elasticsearch input plugin.
      • [Feature] ClickHouse output will retry with specified error code.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.2.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1gn-wgbTKXvTj1WUKTa9x2Q).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.2.zip(167.84 MB)
    • v1.3.1(Apr 15, 2019)

      • [Feature] Supported structured streaming jdbc output.
      • [Enhancement] Optimized filter of watermark for structured streaming.
      • [Enhancement] Optimized structured streaming kafka input.
      • [Enhancement] ClickHouse output supported the Nullable(T) type

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-1.3.1.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/114yVMKj0u_XURcDr3ESeMg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.1.zip(167.70 MB)
    • v1.3.0(Mar 29, 2019)

      • [Feature] Support Spark Structured Streaming.

      ./bin/start-waterdrop-structured-streaming.sh --master 'local[4]' --deploy-mode client --config ./config/structuredstreaming.conf.template


      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/15YUTHl7IP8cpieX9003RSQ).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.3.0.zip(165.31 MB)
    • v1.2.4(Mar 20, 2019)

      • [Enhancement] Replace antlr4 with typesafe config implementation as config parser.
      • [Feature] Added a variable substitution feature in the waterdrop config file.
      • [Enhancement] Supported ClickHouse JDBC settings in the waterdrop config file.

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1PtUPIcfBmL8l5Ib3KLrZvA).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.2.4.zip(162.70 MB)
    • v1.2.2(Mar 5, 2019)

      • [Enhancement] Allow specifying a JSON schema in the Json filter; special thanks to @huangdeheng
      • [BUG] Fixed class not found exception for Jdbc Input
      • [Enhancement] Set SparkConf in start-waterdrop.sh
      • [Enhancement] Added checkSQLSyntax in Sql Filter

      Note: Waterdrop provides a ready-to-run package, so there is no need to build the source code yourself; click waterdrop-<version>.zip below to download.

      If the GitHub download is slow, you can download directly from Baidu Cloud (https://pan.baidu.com/s/1cH6nB07BiRaJR6AuEo5dNg).

      Source code(tar.gz)
      Source code(zip)
      waterdrop-1.2.2.zip(162.63 MB)
    • v1.2.1(Feb 24, 2019)
