Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/25 19:57:11 UTC

[GitHub] [hudi] kazdy opened a new issue, #5426: [SUPPORT]

kazdy opened a new issue, #5426:
URL: https://github.com/apache/hudi/issues/5426

   **Describe the problem you faced**
   
   I observed something odd in the Hudi metrics (I use the CloudWatch reporter).
   When I stopped the Structured Streaming job, around 100 extra commits suddenly appeared in the metrics (see the picture). The same happened with the pending compaction metric: it suddenly started showing 178 pending compactions.
   This is the second time I have seen this when stopping the job.
   The timeline has as many commits as expected; the extra ones appear only in the metrics.
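
One way to cross-check the timeline against the metric values is to count instants directly from a listing of the table's `.hoodie` directory. The sketch below is only an illustration and is not taken from the linked gist. It assumes the pre-1.0 timeline file naming (`<instant>.commit`, `<instant>.deltacommit`, `<instant>.compaction.requested`) and that a finished compaction is recorded as a `.commit` with the same instant time. The function name `summarize_timeline` and the sample file names are hypothetical.

```python
def summarize_timeline(file_names):
    """Count completed commits and pending compactions from .hoodie file names.

    Assumption: files are named <instant>.<action>[.<state>], and a completed
    compaction shows up as <instant>.commit with the same instant time.
    """
    completed = set()
    compaction_requested = set()
    for name in file_names:
        instant, _, suffix = name.partition(".")
        if not instant.isdigit():
            continue  # skip hoodie.properties and other non-instant files
        if suffix in ("commit", "deltacommit"):
            completed.add(instant)
        elif suffix == "compaction.requested":
            compaction_requested.add(instant)
    # A compaction is still pending if no completed instant shares its time.
    pending = compaction_requested - completed
    return {
        "completed_commits": len(completed),
        "pending_compactions": len(pending),
    }


# Hypothetical listing of a small MOR table's .hoodie directory.
sample_files = [
    "hoodie.properties",
    "20220425190000.deltacommit.requested",
    "20220425190000.deltacommit.inflight",
    "20220425190000.deltacommit",
    "20220425191000.compaction.requested",
    "20220425192000.compaction.requested",
    "20220425192000.compaction.inflight",
    "20220425192000.commit",
]
print(summarize_timeline(sample_files))
```

If the counts from the listing stay flat across the shutdown while the reported metric jumps, that points at the reporting path rather than at the table itself.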
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Enable the CloudWatch metrics reporter
   2. Run a Structured Streaming job with a `foreachBatch()` sink
   3. Stop the job using `yarn application -kill appid`
   4. Observe additional commits in the metrics
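
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the linked gist. It assumes a merge-on-read table written with upserts, and it leaves out options a real job would also need, such as the record key and precombine fields. The helper names `hudi_write_options` and `make_batch_writer` are hypothetical; `hoodie.metrics.on` and `hoodie.metrics.reporter.type` are the Hudi settings that turn on metrics and select the reporter.

```python
def hudi_write_options(table_name):
    """Build Hudi writer options with the CloudWatch metrics reporter enabled."""
    return {
        "hoodie.table.name": table_name,
        # Assumption: merge-on-read table written with upserts.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        # Step 1: enable metrics and pick the CloudWatch reporter.
        "hoodie.metrics.on": "true",
        "hoodie.metrics.reporter.type": "CLOUDWATCH",
    }


def make_batch_writer(options, base_path):
    """Return a function suitable for DataStreamWriter.foreachBatch()."""
    def write_batch(batch_df, batch_id):
        # Step 2: each micro-batch is written to the Hudi table in append mode.
        (batch_df.write.format("hudi")
            .options(**options)
            .mode("append")
            .save(base_path))
    return write_batch
```

In a streaming job this would be wired up along the lines of `stream_df.writeStream.foreachBatch(make_batch_writer(opts, path)).start()`, where `stream_df`, `opts`, and `path` are placeholders.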
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.10.1 OSS
   
   * Spark version : 3.1.2-amzn (EMR on Ec2 with Yarn)
   
   * Hive version : --
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   code:
   https://gist.github.com/kazdy/a3a95aecf0a7dfb6b9ba62a54e9214c9
   spark-submit command:
   ```
   spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --executor-memory 9g \
   --driver-memory 24g \
   --executor-cores 2 \
   --driver-cores 4 \
   --conf "spark.dynamicAllocation.executorIdleTimeout=600" \
   --conf "spark.driver.extraJavaOptions=-XX:+PrintTenuringDistribution -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof" \
   --conf "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof" \
   --conf "spark.yarn.max.executor.failures=100" \
   --conf "spark.task.maxFailures=4" \
   --conf "spark.rdd.compress=true" \
   --conf "spark.shuffle.compress=true" \
   --conf "spark.shuffle.spill.compress=true" \
   --conf "spark.kryoserializer.buffer.max=512m" \
   --conf "hoodie.upsert.shuffle.parallelism=10" \
   --conf "hoodie.insert.shuffle.parallelism=10" \
   --conf "spark.sql.shuffle.partitions=8" \
   --conf "spark.default.parallelism=8" \
   --conf "spark.driver.maxResultSize=4g" \
   --conf "spark.streaming.stopGracefullyOnShutdown=true" \
   --conf "spark.streaming.backpressure.enabled=true" \
   --conf "spark.driver.memoryOverhead=3000" \
   --conf "spark.executor.memoryOverhead=2048" \
   --packages "org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.10.1,org.apache.spark:spark-avro_2.12:3.1.2" \
   s3://bucket/mor_streaming.py
   ```
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] kazdy closed issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy closed issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown
URL: https://github.com/apache/hudi/issues/5426




[GitHub] [hudi] codope commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1160597602

   @kazdy Any update from AWS support?




[GitHub] [hudi] kazdy commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by "kazdy (via GitHub)" <gi...@apache.org>.
kazdy commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1542822915

   I was able to reproduce it a couple of times, usually after shutting down long-running Spark Structured Streaming jobs. I only used CloudWatch.




[GitHub] [hudi] kazdy commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by GitBox <gi...@apache.org>.
kazdy commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1110279697

   Hi @yihua, thanks.
   I've reached out to AWS/EMR support. I'll post an update once they respond.




[GitHub] [hudi] kazdy commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by GitBox <gi...@apache.org>.
kazdy commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1160605460

   Unfortunately not; the case was closed as they couldn't replicate this.




[GitHub] [hudi] yihua commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1110215397

   @kazdy Thanks for reporting this. I'll look into it. Meanwhile, have you contacted AWS/EMR support to ask whether the libraries they provide handle the termination of Spark Structured Streaming jobs in any special way (they ship their own Spark build, 3.1.2-amzn) that could cause this issue?




[GitHub] [hudi] nsivabalan commented on issue #5426: [SUPPORT] Cloudwatch metrics - spikes on structured streaming app shutdown

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5426:
URL: https://github.com/apache/hudi/issues/5426#issuecomment-1299685045

   @kazdy: Is this consistently reproducible? Does it happen only with the CloudWatch metrics reporter, or with other metrics reporters as well?
   

