Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/22 19:31:45 UTC

[GitHub] [hudi] tzhang-fetch opened a new issue, #6750: [SUPPORT] SqlQueryBasedTransformer causes memory issues

tzhang-fetch opened a new issue, #6750:
URL: https://github.com/apache/hudi/issues/6750

   **Describe the problem you faced**
   
   With a DeltaStreamer job that previously ran fine, adding a SqlQueryBasedTransformer that only SELECTs one column causes the job to run out of memory.
   
   `"--transformer-class",
               "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer",
               "--hoodie-conf",
               "hoodie.deltastreamer.transformer.sql=SELECT a.ATTRIBUTES FROM <SRC> a"`
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Add a SqlQueryBasedTransformer with a simple SELECT statement to a DeltaStreamer job
   2. Run the job
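
The steps above can be sketched as follows. This is a minimal Python sketch, assuming the job's arguments are kept in a list as in the snippet from the report. `with_sql_transformer` and `base_args` are hypothetical names used only for illustration, and the `<SRC>` check is an illustrative guard based on the placeholder used in the report.

```python
TRANSFORMER_CLASS = "org.apache.hudi.utilities.transform.SqlQueryBasedTransformer"
SQL_CONF_KEY = "hoodie.deltastreamer.transformer.sql"


def with_sql_transformer(base_args, sql):
    """Return a copy of base_args with the SQL transformer options appended.

    base_args is a hypothetical list holding the arguments of the job that
    already runs fine; sql is the query, written against the <SRC> placeholder
    as in the report.
    """
    if "<SRC>" not in sql:
        # Illustrative guard: the query in the report reads from <SRC>.
        raise ValueError("query does not reference the <SRC> placeholder")
    return list(base_args) + [
        "--transformer-class",
        TRANSFORMER_CLASS,
        "--hoodie-conf",
        f"{SQL_CONF_KEY}={sql}",
    ]


if __name__ == "__main__":
    # An empty base list stands in for the existing job arguments.
    for arg in with_sql_transformer([], "SELECT a.ATTRIBUTES FROM <SRC> a"):
        print(arg)
```

Building the list in one place makes it easy to run the same job with and without the transformer when comparing memory behavior.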
   
   **Expected behavior**
   
   The job returns the one selected column without memory issues.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1.2
   
   * Hive version : - 
   
   * Hadoop version : 3.1.2
   
   * Storage (HDFS/S3/GCS..) : Reading from Kafka, storing in S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Some additional screenshots and messages in this slack thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1663698444989499
   
   **Stacktrace**
   
   ```
   2022-09-19T21:45:44.212+0000: [GC (Allocation Failure) [PSYoungGen: 546303K->25113K(2446848K)] 598196K->77023K(8039424K), 0.0236729 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
   2022-09-19T21:45:44.236+0000: [GC (Allocation Failure) [PSYoungGen: 25113K->25029K(2758656K)] 77023K->76946K(8351232K), 0.0177561 secs] [Times: user=0.02 sys=0.02, real=0.02 secs]
   2022-09-19T21:45:44.254+0000: [Full GC (Allocation Failure) [PSYoungGen: 25029K->0K(2758656K)] [ParOldGen: 51917K->54295K(5592576K)] 76946K->54295K(8351232K), [Metaspace: 112463K->112463K(1155072K)], 0.
   2022-09-19T21:45:44.378+0000: [GC (Allocation Failure) [PSYoungGen: 0K->0K(2720768K)] 54295K->54295K(8313344K), 0.0035697 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
   2022-09-19T21:45:44.381+0000: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(2720768K)] [ParOldGen: 54295K->45261K(5592576K)] 54295K->45261K(8313344K), [Metaspace: 112463K->109953K(1155072K)], 0.1912
   #
   # java.lang.OutOfMemoryError: Java heap space
   # -XX:OnOutOfMemoryError="kill -9 %p"
   #   Executing /bin/sh -c "kill -9 22"..
   ```
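
To make the GC lines above easier to compare, the whole-heap figures can be pulled out with a small script. This is a minimal Python sketch, assuming the log format shown in the stacktrace; `parse_gc_line` is a hypothetical helper written for this log and is not part of Hudi, Spark, or the JDK.

```python
import re

# Collection type: "[GC (" for a young collection, "[Full GC (" for a full one.
_KIND = re.compile(r"\[(Full GC|GC) \(")
# Whole-heap figure: the "before->after(capacity)" triple that follows the
# closing bracket of the per-generation sections.
_HEAP = re.compile(r"\]\s(\d+)K->(\d+)K\((\d+)K\)")


def parse_gc_line(line):
    """Return heap occupancy for one GC log line, or None if it is not one."""
    kind = _KIND.search(line)
    heap = _HEAP.search(line)
    if kind is None or heap is None:
        return None
    before_kb, after_kb, capacity_kb = (int(v) for v in heap.groups())
    return {
        "kind": kind.group(1),
        "before_kb": before_kb,
        "after_kb": after_kb,
        "capacity_kb": capacity_kb,
    }


if __name__ == "__main__":
    sample = (
        "2022-09-19T21:45:44.378+0000: [GC (Allocation Failure) "
        "[PSYoungGen: 0K->0K(2720768K)] 54295K->54295K(8313344K), "
        "0.0035697 secs]"
    )
    print(parse_gc_line(sample))
```

In the lines quoted above, the whole-heap occupancy after each collection is at most 77023K, against a reported capacity of more than 8000000K.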
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ad1happy2go commented on issue #6750: [SUPPORT] SqlQueryBasedTransformer causes memory issues

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #6750:
URL: https://github.com/apache/hudi/issues/6750#issuecomment-1529062128

   @tzhang-fetch I couldn't reproduce this issue; the SQL transformer is working fine for me.
   
   Are you saying that, with the same executor and driver memory, the Hudi job got killed when using the SQL transformer?
   Are you still facing this issue? Can you let us know the memory configs used?
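
One way to gather the memory configs asked about above is to pull them out of the job's submit arguments. This is a minimal Python sketch, assuming the job is launched through `spark-submit` with its arguments available as a list; `memory_settings` is a hypothetical helper, and the argument values in the example are made up for illustration.

```python
def memory_settings(submit_args):
    """Collect memory-related options from a spark-submit argument list.

    Picks up the --driver-memory and --executor-memory flags, plus any
    "--conf key=value" pair whose key contains "memory".
    """
    found = {}
    i = 0
    while i < len(submit_args) - 1:
        flag, value = submit_args[i], submit_args[i + 1]
        if flag in ("--driver-memory", "--executor-memory"):
            found[flag] = value
            i += 2
        elif flag == "--conf" and "=" in value:
            key, _, setting = value.partition("=")
            if "memory" in key.lower():
                found[key] = setting
            i += 2
        else:
            i += 1
    return found


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = [
        "spark-submit",
        "--driver-memory", "8g",
        "--conf", "spark.executor.memory=4g",
    ]
    print(memory_settings(example))
```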




Re: [I] [SUPPORT] SqlQueryBasedTransformer causes memory issues [hudi]

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #6750: [SUPPORT] SqlQueryBasedTransformer causes memory issues
URL: https://github.com/apache/hudi/issues/6750




Re: [I] [SUPPORT] SqlQueryBasedTransformer causes memory issues [hudi]

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #6750:
URL: https://github.com/apache/hudi/issues/6750#issuecomment-1755517833

   @tzhang-fetch Closing out this issue for now due to no activity. Please reopen in case you have any concerns. Thanks a lot.

