Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/07 01:16:20 UTC

[GitHub] [hudi] namuny opened a new issue, #5776: [SUPPORT] Performance degradation for listing partitions

namuny opened a new issue, #5776:
URL: https://github.com/apache/hudi/issues/5776

   I'm noticing a steep increase in the time taken to list partitions during clustering, specifically after [this PR](https://github.com/apache/hudi/pull/4643) was merged. I have yet to get to the bottom of exactly why, but reverting `FileSystemBackedTableMetadata.getAllPartitionPaths` to its 0.9.0 implementation gives me a performance boost.
   
   **Test results**:
   * 0.9.0 approach (but using 0.11.0 for everything else) - 50 seconds to list partitions
   * Pure 0.11.0 approach - over 20 minutes to list partitions
   
   **My setup**:
   * Hudi 0.11.0
   * CoW + inline clustering
   * Metadata table is disabled
   * Test results above are with 10,000 partitions, using S3.
   
   Regardless of why the metadata table is disabled, I'm curious to understand why the partition listing time for 10,000 partitions goes from under a minute to over 20 minutes.
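
   For anyone who wants to reproduce this kind of measurement in miniature, here is a rough sketch of a timing harness. It is not Hudi's implementation: it assumes a local directory tree as a stand-in for S3, treats any directory containing a `.hoodie_partition_metadata` file as a partition, and uses hypothetical helper names (`list_partitions`, `build_table`, `timed_listing`) chosen only for illustration.

```python
import os
import tempfile
import time

# Marker file that Hudi writes into each partition directory.
PARTITION_MARKER = ".hoodie_partition_metadata"


def list_partitions(base_path):
    """Walk the tree level by level; a directory holding the marker is a partition."""
    partitions = []
    pending = [base_path]
    while pending:
        next_level = []
        for directory in pending:
            with os.scandir(directory) as entries:
                children = list(entries)
            if any(e.name == PARTITION_MARKER and e.is_file() for e in children):
                partitions.append(os.path.relpath(directory, base_path))
                continue  # do not descend below a partition directory
            # Skip hidden directories such as the .hoodie metadata folder.
            next_level.extend(
                e.path for e in children if e.is_dir() and not e.name.startswith(".")
            )
        pending = next_level
    return sorted(partitions)


def build_table(base_path, num_partitions):
    """Create a fake two-level partitioned layout (illustrative only)."""
    for i in range(num_partitions):
        part = os.path.join(base_path, "year=2022", f"id={i:05d}")
        os.makedirs(part)
        open(os.path.join(part, PARTITION_MARKER), "w").close()


def timed_listing(base_path):
    """Return the discovered partitions and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    partitions = list_partitions(base_path)
    return partitions, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        build_table(base, 1000)
        found, seconds = timed_listing(base)
        print(f"listed {len(found)} partitions in {seconds:.3f}s")
```

   Swapping in a different `list_partitions` and timing both against the same tree is one way to compare two listing strategies. Local numbers will not reflect S3 latency, so this only helps compare the relative number of listing calls.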
   
   
   **Expected behavior**
   
   There should not be a performance degradation when listing partitions for operations such as clustering.
   
   **Environment Description**
   
   * Hudi version : 0.11.0
   
   * Spark version : 3.1.2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5776: [SUPPORT] Performance degradation for listing partitions

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5776:
URL: https://github.com/apache/hudi/issues/5776#issuecomment-1151931084

   https://github.com/apache/hudi/pull/5829




[GitHub] [hudi] nsivabalan commented on issue #5776: [SUPPORT] Performance degradation for listing partitions

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5776:
URL: https://github.com/apache/hudi/issues/5776#issuecomment-1151928249

   @namuny: yes, it looks like it regressed. https://issues.apache.org/jira/browse/HUDI-4221
   I am reverting the change and will put up a PR.
   




[GitHub] [hudi] nsivabalan commented on issue #5776: [SUPPORT] Performance degradation for listing partitions

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5776:
URL: https://github.com/apache/hudi/issues/5776#issuecomment-1151931305

   Thanks for pointing it out. Since we have a PR, closing this out.




[GitHub] [hudi] nsivabalan closed issue #5776: [SUPPORT] Performance degradation for listing partitions

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #5776: [SUPPORT] Performance degradation for listing partitions
URL: https://github.com/apache/hudi/issues/5776

