Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/09 09:23:06 UTC

[GitHub] [hudi] kingkongpoon opened a new issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

kingkongpoon opened a new issue #2557:
URL: https://github.com/apache/hudi/issues/2557


   I run my process with Spark on YARN, and I have tried both the COW and MOR table types.
   When I first write the COW table (SaveMode.Overwrite), it is very fast (about 700 MB of data in HDFS),
   but when I run an incremental write (SaveMode.Append), it is very slow and throws errors like:
   ```
   Stack trace: ExitCodeException exitCode=1: 
   	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
   	at org.apache.hadoop.util.Shell.run(Shell.java:869)
   	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
   	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   ```
   21/02/09 16:52:21 ERROR [dispatcher-event-loop-11] YarnScheduler: Lost executor 10 on node1: Container from a bad node: container_e10_1610102487810_33748_01_000012 on host: node1. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/09 16:52:23 ERROR [dispatcher-event-loop-11] YarnScheduler: Lost executor 4 on node1: Container from a bad node: container_e10_1610102487810_33748_01_000005 on host: node1. Exit status: 1. Diagnostics: Exception from container-launch.
   Container id: container_e10_1610102487810_33748_01_000005
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1: 
   	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
   	at org.apache.hadoop.util.Shell.run(Shell.java:869)
   	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
   	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   
   Container exited with a non-zero exit code 1
   ```
   My cluster's total memory is about 40 GB.

   I run with this configuration:
   spark-submit --master yarn --driver-memory 4G --executor-memory 8G --executor-cores 4 --num-executors 10
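
   For reference, exit code 137 is 128 + SIGKILL: the container was killed from outside the JVM, which on YARN usually points at memory pressure on the node rather than a Java heap OOM. A rough back-of-envelope for the submit command above, assuming Spark's default executor memory overhead of max(384 MB, 10% of executor memory):
   ```
   per executor container  : 8 GB heap + ~0.8 GB overhead ≈ 8.8 GB
   10 executors + driver   : ~88 GB + 4 GB ≈ 92 GB requested
   cluster total           : ~40 GB, so only about 4 such executors can actually run at once
   ```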


[GitHub] [hudi] n3nash commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-854309671


   @kingkongpoon Gentle ping: does this issue still persist, and is there a way to reproduce it?


[GitHub] [hudi] vinothchandar commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-776032523


   > When I first write the COW table (SaveMode.Overwrite), it's very fast (about 700 MB of data in HDFS), but when I run an incremental write (SaveMode.Append), it's very slow and throws errors.

   Spark OOMs are hard to guess at, and I'm not sure this is directly related to Hudi. Have you looked at the tuning guide (GC settings, memory overhead)? https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide
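
   For instance, the knobs that usually matter here are the off-heap overhead and GC visibility; something along these lines is the general shape of it (all standard Spark configs, but the values and the jar name are placeholders, not a tuned recommendation for this workload):
   ```
   spark-submit --master yarn \
     --driver-memory 4G --executor-memory 8G --executor-cores 4 --num-executors 4 \
     --conf spark.executor.memoryOverhead=2g \
     --conf spark.memory.fraction=0.2 \
     --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
     your-application.jar   # placeholder jar name
   ```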


[GitHub] [hudi] kingkongpoon edited a comment on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
kingkongpoon edited a comment on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-783209847


   > To help investigate better:
   > 
   > * Can you post the configs you used to write to Hudi?
   > * Can you post a screenshot of the Spark stages, so that we know where it's failing and can relate it to the configs used?
   > * Can you give a rough idea of your dataset's record keys: are they completely random, or do they have some ordering, and what are they made of?
   > * I assume you are using regular BLOOM as the index type.
   
   My Spark write configuration:
   ```
   input
         .write.format("org.apache.hudi")
         .option("hoodie.cleaner.commits.retained", 1)
         .option("hoodie.keep.min.commits", 2)
         .option("hoodie.keep.max.commits", 3)
         .option("hoodie.insert.shuffle.parallelism", 30)
         .option("hoodie.upsert.shuffle.parallelism", 30)
         .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
         .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
         .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uuid")
         .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "etl_modify_time")
         .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "created_year,created_month,created_day,brand_id")
         .option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY, classOf[DefaultHoodieRecordPayload].getName)
         .option(HoodiePayloadProps.PAYLOAD_ORDERING_FIELD_PROP, "etl_modify_time")
         .option("hoodie.table.name", "std_order") 
         .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, hiveserver2)
         .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY, "dwd_std")
         .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, "std_order")
         .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY, classOf[ComplexKeyGenerator].getName)
         .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "created_year,created_month,created_day,brand_id")
         .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[MultiPartKeysValueExtractor].getName)
         .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
         .option(HoodieIndexConfig.BLOOM_INDEX_UPDATE_PARTITION_PATH, "true")
         .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.GLOBAL_BLOOM.name())
         .mode(SaveMode.Overwrite)
   //    .mode(SaveMode.Append)
         .save(basePath)
   
   ```
   ```
   spark-submit --master yarn --driver-memory 4G --executor-memory  8G --executor-cores 4 --num-executors 10 
   --conf spark.executor.memoryOverhead=4G --conf spark.yarn.max.executor.failures=100
   --class com.qmtec.peony.newcrm.hudi.process  --jars hudi-hadoop-mr-bundle-0.7.0.jar 
   --jars hudi-hive-sync-bundle-0.7.0.jar --jars hudi-spark-bundle_2.11-0.7.0.jar qmtec-peony-etl-hudi-1.0.jar 
   ```
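
   One aside on the command above: as far as I recall, spark-submit only keeps the last --jars value when the flag is repeated, so two of the three bundles would be dropped here. The usual form is a single comma-separated list, e.g. (same jars, class, and confs as above, shown only to illustrate the flag syntax):
   ```
   spark-submit --master yarn --driver-memory 4G --executor-memory 8G --executor-cores 4 --num-executors 10 \
     --conf spark.executor.memoryOverhead=4G --conf spark.yarn.max.executor.failures=100 \
     --class com.qmtec.peony.newcrm.hudi.process \
     --jars hudi-hadoop-mr-bundle-0.7.0.jar,hudi-hive-sync-bundle-0.7.0.jar,hudi-spark-bundle_2.11-0.7.0.jar \
     qmtec-peony-etl-hudi-1.0.jar
   ```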
   
   uuid is the tid from the order table data, and it is unique. When I first write the data to HDFS with .mode(SaveMode.Overwrite), the Hive table is created successfully and the file in HDFS is about 520 MB.
   But when I use the same code, configuration, and data with .mode(SaveMode.Append), the process throws errors:
   ```
   21/02/22 15:57:43 ERROR [dispatcher-event-loop-5] YarnScheduler: Lost executor 4 on emr-worker-2.cluster-47763: Container from a bad node: container_e10_1610102487810_52481_01_000005 on host: emr-worker-2.cluster-47763. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:57:45 ERROR [dispatcher-event-loop-7] YarnScheduler: Lost executor 5 on emr-worker-4.cluster-47763: Container from a bad node: container_e10_1610102487810_52481_01_000006 on host: emr-worker-4.cluster-47763. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:58:12 ERROR [dispatcher-event-loop-2] YarnScheduler: Lost executor 7 on emr-worker-4.cluster-47763: Container from a bad node: container_e10_1610102487810_52481_01_000009 on host: emr-worker-4.cluster-47763. Exit status: 137. Diagnostics: Container killed on request. Exit code is 137
   Container exited with a non-zero exit code 137
   Killed by external signal
   .
   21/02/22 15:58:31 ERROR [dispatcher-event-loop-4] YarnScheduler: Lost executor 8 on emr-worker-4.cluster-47763: Container from a bad node: container_e10_1610102487810_52481_01_000010 on host: emr-worker-4.cluster-47763. Exit status: 1. Diagnostics: Exception from container-launch.
   Container id: container_e10_1610102487810_52481_01_000010
   Exit code: 1
   Stack trace: ExitCodeException exitCode=1: 
   	at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
   	at org.apache.hadoop.util.Shell.run(Shell.java:869)
   	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
   	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
   	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   
   Container exited with a non-zero exit code 1
   
   ```
   But sometimes it runs successfully; in that case there are two parquet files, each also about 520 MB.
   The table root path also contains a .hoodie directory, and it grows larger on every run.


[GitHub] [hudi] nsivabalan commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-782851569


   To help investigate better:
   - Can you post the configs you used to write to Hudi?
   - Can you post a screenshot of the Spark stages, so that we know where it's failing and can relate it to the configs used?
   - Can you give a rough idea of your dataset's record keys: are they completely random, or do they have some ordering, and what are they made of?
   - I assume you are using regular BLOOM as the index type.


[GitHub] [hudi] n3nash closed issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
n3nash closed issue #2557:
URL: https://github.com/apache/hudi/issues/2557


   


[GitHub] [hudi] kingkongpoon commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
kingkongpoon commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-783182791


   > > When I first write the COW table (SaveMode.Overwrite), it's very fast (about 700 MB of data in HDFS), but when I run an incremental write (SaveMode.Append), it's very slow and throws errors.
   > 
   > Spark OOMs are hard to guess at, and I'm not sure this is directly related to Hudi. Have you looked at the tuning guide (GC settings, memory overhead)? https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide

   I have tried your advice and it does seem better, but with the same code, configuration, and data it still sometimes OOMs.


[GitHub] [hudi] n3nash commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-824532918


   @kingkongpoon Please let us know if this issue persists; otherwise we can close this ticket.



[GitHub] [hudi] nsivabalan commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-810429566


   Once you respond, can you please remove the "awaiting-user-response" label from the issue and, if possible, add the "awaiting-community-help" label?


[GitHub] [hudi] nsivabalan commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2557:
URL: https://github.com/apache/hudi/issues/2557#issuecomment-809490684


   @kingkongpoon: sorry for the late follow-up. May I ask why you use GLOBAL_BLOOM? Our default recommendation is the regular "BLOOM" index unless you have a specific requirement for the global version; global bloom is expected to be slower, depending on the characteristics of your record keys.
   If your record keys are completely random (no ordering or timestamp in them), we would recommend the "SIMPLE" index, which should perform better than "BLOOM".
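
   For illustration, against the write configuration posted earlier in this thread, that suggestion amounts to changing only the index options. This is a sketch, not a drop-in fix; the BLOOM_INDEX_UPDATE_PARTITION_PATH option is dropped here since, as far as I know, it only takes effect with global indexes:
   ```
   input
      .write.format("org.apache.hudi")
      // ... all other options as posted above, unchanged ...
      // switch from GLOBAL_BLOOM to the regular BLOOM index (or SIMPLE for fully random keys)
      .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.BLOOM.name())
      .mode(SaveMode.Append)
      .save(basePath)
   ```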

