Posted to commits@seatunnel.apache.org by "wang-zhiang (via GitHub)" <gi...@apache.org> on 2023/03/30 10:28:05 UTC

[GitHub] [incubator-seatunnel] wang-zhiang opened a new issue, #4455: seatunnel An error occurs when importing hbase

wang-zhiang opened a new issue, #4455:
URL: https://github.com/apache/incubator-seatunnel/issues/4455

   ### Search before asking
   
   - [X] I have searched the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   I am trying to import test data from MongoDB into HBase. The target table already exists in HBase, but the job fails during execution, and the error message does not make the cause clear. I suspect this is a bug and would appreciate any guidance.
   
   ### SeaTunnel Version
   
   2.1.2
   
   ### SeaTunnel Config
   
   ```conf
   #!/bin/bash
   
   env {
       execution.parallelism = 20
       spark.executor.cores = 1
       spark.executor.memory = "6g"
   }
   
   
   source {
     mongodb {
         readconfig.uri = "mongodb://smartpath:smartpthdata@192.168.5.101:27017,192.168.5.102:27017,192.168.5.103:27017/admin"
         readconfig.database = "test2"
         readconfig.collection = ${sqlserver_table}
         readconfig.spark.mongodb.input.partitioner = "MongoPaginateBySizePartitioner"
         schema="{\"_id\": \"string\",\"name\": \"string\"}"
         result_table_name = "mongodb_result_table"
     }
   }
   
   
   transform {
   }
   
   sink {
    hbase {
       source_table_name = "mongodb_result_table"
       hbase.zookeeper.quorum = "hadoop104:2181,hadoop105:2181,hadoop106:2181,hadoop107:2181,hadoop108:2181,hadoop109:2181,hadoop110:2181"
       catalog ="{\"table\":{ \"namespace\":\"test1\", \"name\":\"test66\"},\"rowkey\":\"_id\",\"columns\":{\"_id\":{\"cf\":\"rowkey\", \"col\":\"_id\", \"type\":\"string\"},\"name\":{\"cf\":\"info\", \"col\":\"name\", \"type\":\"string\"}}}"
       staging_dir = "/hbase/test1/test66/"
       save_mode = "overwrite"
       hbase.bulkload.retries.number = "0"
    }
   }
   ```
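For readability, the escaped `catalog` string from the sink block above can be decoded into its JSON structure. The following is only an illustrative Python sketch, not part of SeaTunnel; the helper name `describe_catalog` is hypothetical. It lists the qualified table name, the row key field, and the column family each field maps to.

```python
import json

# The catalog value copied from the sink block above, with the
# config-level escaping (\") removed.
CATALOG = (
    '{"table":{"namespace":"test1","name":"test66"},"rowkey":"_id",'
    '"columns":{"_id":{"cf":"rowkey","col":"_id","type":"string"},'
    '"name":{"cf":"info","col":"name","type":"string"}}}'
)


def describe_catalog(raw):
    """Return (qualified table name, rowkey field, {field: column family})."""
    catalog = json.loads(raw)
    table = catalog["table"]
    qualified = f'{table["namespace"]}:{table["name"]}'
    # Map every declared field to the column family it is written to.
    families = {field: spec["cf"] for field, spec in catalog["columns"].items()}
    return qualified, catalog["rowkey"], families


if __name__ == "__main__":
    name, rowkey, families = describe_catalog(CATALOG)
    print(name, rowkey, families)
```

Decoded this way, the catalog targets `test1:test66`, while the log lines in the error output refer to `test1:test88`, which may be worth double-checking when reproducing the problem.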
   
   
   ### Running Command
   
   ```shell
   /opt/module/seatunnel-2.1.2/bin/start-seatunnel-spark.sh \
           --master spark://192.168.5.104:7077 \
           --deploy-mode client \
           --config /opt/module/seatunnel-2.1.2/script_spark/test/mongo-hbase-test.conf \
           --variable sqlserver_table="copy1"
   ```
   
   
   ### Error Exception
   
   ```log
   2023-03-30 05:48:44,701 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on hadoop104:41444 in memory (size: 2.8 KB, free: 366.3 MB)
   2023-03-30 05:48:44,724 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on 192.168.5.107:44381 in memory (size: 2.8 KB, free: 3.0 GB)
   2023-03-30 05:48:44,770 WARN tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into table test1:test88 with files [family:info path:hdfs://mycluster/hbase/test1/test88/1680169709262/info/d73b5e5892e94c59ab162a55d233f8e2] failed.  This is recoverable and they will be retried.
   2023-03-30 05:48:44,777 INFO tool.LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 1 with 1 files remaining to group or split
   2023-03-30 05:48:44,778 INFO hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
   2023-03-30 05:48:44,786 INFO tool.LoadIncrementalHFiles: Trying to load hfile=hdfs://mycluster/hbase/test1/test88/1680169709262/info/d73b5e5892e94c59ab162a55d233f8e2 first=Optional[62e8df0cb7020000830054b2] last=Optional[ewfefw]
   2023-03-30 05:48:44,801 WARN tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into table test1:test88 with files [family:info path:hdfs://mycluster/hbase/test1/test88/1680169709262/info/d73b5e5892e94c59ab162a55d233f8e2] failed.  This is recoverable and they will be retried.
   2023-03-30 05:48:44,806 INFO tool.LoadIncrementalHFiles: Split occurred while grouping HFiles, retry attempt 2 with 1 files remaining to group or split
   2023-03-30 05:48:44,835 ERROR tool.LoadIncrementalHFiles: -------------------------------------------------
   Bulk load aborted with some files not yet loaded:
   -------------------------------------------------
     hdfs://mycluster/hbase/test1/test88/1680169709262/info/d73b5e5892e94c59ab162a55d233f8e2
   
   2023-03-30 05:48:44,836 INFO client.ConnectionImplementation: Closing master protocol: MasterService
   2023-03-30 05:48:44,838 INFO zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x44f23927 to hadoop104:2181,hadoop105:2181,hadoop106:2181,hadoop107:2181,hadoop108:2181,hadoop109:2181,hadoop110:2181
   2023-03-30 05:48:44,842 INFO zookeeper.ZooKeeper: Session: 0x70052749b6f004c closed
   2023-03-30 05:48:44,842 INFO zookeeper.ClientCnxn: EventThread shut down
   2023-03-30 05:48:44,955 ERROR base.Seatunnel: 
   
   ===============================================================================
   
   
   2023-03-30 05:48:44,956 ERROR base.Seatunnel: Fatal Error, 
   
   2023-03-30 05:48:44,956 ERROR base.Seatunnel: Please submit bug report in https://github.com/apache/incubator-seatunnel/issues
   
   2023-03-30 05:48:44,956 ERROR base.Seatunnel: Reason:Execute Spark task error 
   
   2023-03-30 05:48:44,960 ERROR base.Seatunnel: Exception StackTrace:java.lang.RuntimeException: Execute Spark task error
   	at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:79)
   	at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:39)
   	at org.apache.seatunnel.core.spark.SeatunnelSpark.main(SeatunnelSpark.java:32)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.io.IOException: Retry attempted 2 times without completing, bailing out
   	at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.performBulkLoad(LoadIncrementalHFiles.java:420)
   	at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:343)
   	at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:256)
   	at org.apache.seatunnel.spark.hbase.sink.Hbase.output(Hbase.scala:132)
   	at org.apache.seatunnel.spark.hbase.sink.Hbase.output(Hbase.scala:41)
   	at org.apache.seatunnel.spark.SparkEnvironment.sinkProcess(SparkEnvironment.java:179)
   	at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:54)
   	at org.apache.seatunnel.core.spark.command.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:76)
   	... 14 more
    
   2023-03-30 05:48:44,960 ERROR base.Seatunnel:
   ```
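In a long Spark log like the one above, the innermost `Caused by:` line is usually the most useful starting point. The following is a minimal Python sketch for pulling it out of a saved log; the function name `root_cause` and the shortened `SAMPLE` text are illustrative assumptions, and the sample only repeats lines already shown in the log.

```python
import re


def root_cause(log_text):
    """Return the last 'Caused by:' exception line of a Java stack trace, or None."""
    causes = re.findall(r"^[ \t]*Caused by: (.+)$", log_text, flags=re.MULTILINE)
    # The last 'Caused by' in a chained trace is the innermost exception.
    return causes[-1].strip() if causes else None


# Shortened excerpt of the log above (illustrative only).
SAMPLE = """\
2023-03-30 05:48:44,960 ERROR base.Seatunnel: Exception StackTrace:java.lang.RuntimeException: Execute Spark task error
    at org.apache.seatunnel.core.base.Seatunnel.run(Seatunnel.java:39)
Caused by: java.io.IOException: Retry attempted 2 times without completing, bailing out
    at org.apache.hadoop.hbase.tool.LoadIncrementalHFiles.performBulkLoad(LoadIncrementalHFiles.java:420)
"""

if __name__ == "__main__":
    print(root_cause(SAMPLE))
```

For this report, the extracted line is the `java.io.IOException` raised from `LoadIncrementalHFiles.performBulkLoad`, so the failure occurs in the HBase bulk load step rather than in the MongoDB read.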
   
   
   ### Flink or Spark Version
   
   Spark 2.4
   
   ### Java or Scala Version
   
   1.8
   
   ### Screenshots
   
   The import fails.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #4455: seatunnel An error occurs when importing hbase

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4455:
URL: https://github.com/apache/incubator-seatunnel/issues/4455#issuecomment-1528904751

   This issue has been automatically marked as stale because it has had no activity for 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [incubator-seatunnel] github-actions[bot] commented on issue #4455: seatunnel An error occurs when importing hbase

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4455:
URL: https://github.com/apache/incubator-seatunnel/issues/4455#issuecomment-1541052533

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter a similar problem in the future.




[GitHub] [incubator-seatunnel] github-actions[bot] closed issue #4455: seatunnel An error occurs when importing hbase

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4455: seatunnel An error occurs when importing hbase
URL: https://github.com/apache/incubator-seatunnel/issues/4455

