Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/08/17 15:02:41 UTC

[GitHub] [incubator-seatunnel] Bingz2 opened a new issue, #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE

Bingz2 opened a new issue, #2449:
URL: https://github.com/apache/incubator-seatunnel/issues/2449

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   Error running Spark Connector V2 example using IDE
   
   ### SeaTunnel Version
   
   dev
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set spark configuration here
     # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
     #job.mode = BATCH
     spark.app.name = "SeaTunnel"
     spark.executor.instances = 2
     spark.executor.cores = 1
     spark.executor.memory = "1g"
     spark.master = local
   }
   
   source {
  # This is an example input plugin **only for testing and demonstrating the input plugin feature**
     FakeSource {
       result_table_name = "fake"
       field_name = "name,age,timestamp"
     }
   
     # You can also use other input plugins, such as hdfs
     # hdfs {
     #   result_table_name = "accesslog"
     #   path = "hdfs://hadoop-cluster-01/nginx/accesslog"
     #   format = "json"
     # }
   
     # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
     # please go to https://seatunnel.apache.org/docs/spark/configuration/source-plugins/Fake
   }
   
   transform {
     # split data by specific delimiter
   
     # you can also use other transform plugins, such as sql
     sql {
       sql = "select name,age from fake"
       result_table_name = "sql"
     }
   
     # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
     # please go to https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Split
   }
   
   sink {
  # use the LocalFile sink plugin to write data to local files
     LocalFile {
        format = "orc"
        path = "D:/workspace/test/st"
        file_name_expression = "orc"
     }
   
  # you can also use other output plugins, such as hdfs
     # hdfs {
     #   path = "hdfs://hadoop-cluster-01/nginx/accesslog_processed"
     #   save_mode = "append"
     # }
   
     # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
     # please go to https://seatunnel.apache.org/docs/spark/configuration/sink-plugins/Console
   }
   ```
   
   
   ### Running Command
   
   ```shell
   Run the Spark Connector v2 Example using a local IDE
   ```
   
   
   ### Error Exception
   
   ```log
   22/08/17 22:49:03 INFO Executor: Starting executor ID driver on host localhost
   22/08/17 22:49:03 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53914.
   22/08/17 22:49:03 INFO NettyBlockTransferService: Server created on GITV:53914
   22/08/17 22:49:03 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
   22/08/17 22:49:03 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, GITV, 53914, None)
   22/08/17 22:49:04 INFO BlockManagerMasterEndpoint: Registering block manager GITV:53914 with 1965.3 MB RAM, BlockManagerId(driver, GITV, 53914, None)
   22/08/17 22:49:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, GITV, 53914, None)
   22/08/17 22:49:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, GITV, 53914, None)
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6f95cd51{/metrics/json,null,AVAILABLE,@Spark}
   22/08/17 22:49:04 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load SeaTunnelSource Plugin from D:\workspace\idea\seatunnel\incubator-seatunnel\seatunnel-common\connectors\seatunnel
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load plugin: PluginIdentifier{engineType='seatunnel', pluginType='source', pluginName='FakeSource'} from classpath
   22/08/17 22:49:04 INFO SparkEnvironment: register plugins :[]
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load BaseSparkTransform Plugin from D:\workspace\idea\seatunnel\incubator-seatunnel\seatunnel-common\connectors\seatunnel
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load plugin: PluginIdentifier{engineType='seatunnel', pluginType='transform', pluginName='sql'} from classpath
   22/08/17 22:49:04 INFO SparkEnvironment: register plugins :[]
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load SeaTunnelSink Plugin from D:\workspace\idea\seatunnel\incubator-seatunnel\seatunnel-common\connectors\seatunnel
   22/08/17 22:49:04 INFO AbstractPluginDiscovery: Load plugin: PluginIdentifier{engineType='seatunnel', pluginType='sink', pluginName='LocalFile'} from classpath
   22/08/17 22:49:04 INFO SparkEnvironment: register plugins :[]
   22/08/17 22:49:04 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/D:/workspace/idea/seatunnel/incubator-seatunnel/spark-warehouse/').
   22/08/17 22:49:04 INFO SharedState: Warehouse path is 'file:/D:/workspace/idea/seatunnel/incubator-seatunnel/spark-warehouse/'.
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7c8326a4{/SQL,null,AVAILABLE,@Spark}
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@77128dab{/SQL/json,null,AVAILABLE,@Spark}
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6f012914{/SQL/execution,null,AVAILABLE,@Spark}
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@18fdb6cf{/SQL/execution/json,null,AVAILABLE,@Spark}
   22/08/17 22:49:04 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@720653c2{/static/sql,null,AVAILABLE,@Spark}
   22/08/17 22:49:05 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
   22/08/17 22:49:07 ERROR SparkApiTaskExecuteCommand: Run SeaTunnel on spark failed.
   java.lang.RuntimeException: file_name_expression must contains transactionId when is_enable_transaction is true
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.config.TextFileSinkConfig.<init>(TextFileSinkConfig.java:112)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.AbstractFileSink.getSinkConfig(AbstractFileSink.java:143)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.AbstractFileSink.createAggregatedCommitter(AbstractFileSink.java:114)
   	at org.apache.seatunnel.translation.spark.sink.SparkDataSourceWriter.<init>(SparkDataSourceWriter.java:48)
   	at org.apache.seatunnel.translation.spark.sink.SparkSink.createWriter(SparkSink.java:67)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:254)
   	at org.apache.seatunnel.core.starter.spark.execution.SinkExecuteProcessor.execute(SinkExecuteProcessor.java:75)
   	at org.apache.seatunnel.core.starter.spark.execution.SparkExecution.execute(SparkExecution.java:60)
   	at org.apache.seatunnel.core.starter.spark.command.SparkApiTaskExecuteCommand.execute(SparkApiTaskExecuteCommand.java:54)
   	at org.apache.seatunnel.core.starter.Seatunnel.run(Seatunnel.java:40)
   	at org.apache.seatunnel.example.spark.v2.ExampleUtils.builder(ExampleUtils.java:43)
   	at org.apache.seatunnel.example.spark.v2.SeaTunnelApiExample.main(SeaTunnelApiExample.java:28)
   22/08/17 22:49:07 INFO SparkContext: Invoking stop() from shutdown hook
   22/08/17 22:49:07 INFO AbstractConnector: Stopped Spark@9bd0fa6{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
   22/08/17 22:49:07 INFO SparkUI: Stopped Spark web UI at http://GITV:4040
   22/08/17 22:49:07 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   22/08/17 22:49:07 INFO MemoryStore: MemoryStore cleared
   22/08/17 22:49:07 INFO BlockManager: BlockManager stopped
   22/08/17 22:49:07 INFO BlockManagerMaster: BlockManagerMaster stopped
   22/08/17 22:49:07 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   22/08/17 22:49:07 INFO SparkContext: Successfully stopped SparkContext
   22/08/17 22:49:07 INFO ShutdownHookManager: Shutdown hook called
   ```
   
   
   ### Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] EricJoy2048 commented on issue #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on issue #2449:
URL: https://github.com/apache/incubator-seatunnel/issues/2449#issuecomment-1219059425

   `java.lang.RuntimeException: file_name_expression must contains transactionId when is_enable_transaction is true`
   
   The `file_name_expression` parameter is not required; its default value is `${transactionId}`. But if you want to set a `file_name_expression` value yourself, that value must contain the `${transactionId}` placeholder.
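
   In other words, a minimal LocalFile sink that satisfies this rule could look like the sketch below (the path is illustrative; all other options are left at their defaults):

   ```conf
   sink {
     LocalFile {
       path = "D:/workspace/test/st"
       file_format = "orc"
       # file_name_expression may be omitted entirely (it defaults to
       # "${transactionId}"); if you do set it, the value must contain
       # the ${transactionId} placeholder, as the error message says
       file_name_expression = "${transactionId}_${now}"
     }
   }
   ```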




[GitHub] [incubator-seatunnel] TyrantLucifer commented on issue #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE

Posted by GitBox <gi...@apache.org>.
TyrantLucifer commented on issue #2449:
URL: https://github.com/apache/incubator-seatunnel/issues/2449#issuecomment-1218309875

   Please refer to these demo config files and adjust your local file sink config accordingly.
   
   https://github.com/apache/incubator-seatunnel/tree/dev/seatunnel-e2e%2Fseatunnel-spark-connector-v2-e2e%2Fsrc%2Ftest%2Fresources%2Ffile




[GitHub] [incubator-seatunnel] Bingz2 commented on issue #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE

Posted by GitBox <gi...@apache.org>.
Bingz2 commented on issue #2449:
URL: https://github.com/apache/incubator-seatunnel/issues/2449#issuecomment-1219117745

   After modifying it according to the example configuration in the e2e tests, it runs normally. Thank you so much!
   ```conf
   sink {
     # use the LocalFile sink plugin to write data to local files
     LocalFile {
       path = "D:/workspace/test/st"
       partition_by = ["age"]
       partition_dir_expression = "${k0}=${v0}"
       is_partition_field_write_in_file = true
       file_name_expression = "${transactionId}_${now}"
       file_format = "orc"
       filename_time_format = "yyyy.MM.dd"
       is_enable_transaction = true
       save_mode = "error"
     }
   }
   ```




[GitHub] [incubator-seatunnel] Bingz2 closed issue #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE

Posted by GitBox <gi...@apache.org>.
Bingz2 closed issue #2449: [Bug] [Connector-V2][File Local Sink] Error running Spark Connector V2 example using IDE
URL: https://github.com/apache/incubator-seatunnel/issues/2449

