Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/08/10 09:43:02 UTC

[GitHub] [hadoop] mukund-thakur edited a comment on pull request #2207: HADOOP-17074 Optimise s3a Listing to be fully asynchronous.

mukund-thakur edited a comment on pull request #2207:
URL: https://github.com/apache/hadoop/pull/2207#issuecomment-671252550


   Performance results using the new test:
   `2020-08-10 15:04:35,966 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listFiles() api with batch size of 10 including 10ms of processing time for each file: 12,039,952,465 nS
   2020-08-10 15:04:52,170 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listStatus() api with batch size of 10 including 10ms of processing time for each file: 16,088,964,963 nS`
   
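   We can see an improvement of about 4 s with this configuration (batch size of 10, 10 ms of processing per file): listFiles() took 12.04 s, compared with 16.09 s for listStatus().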
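   As a quick check of the 4 s figure, the nanosecond durations from the two log lines above can be subtracted directly. This small Java program does only that arithmetic; the class and constant names are illustrative, not part of the test.

   ```java
   import java.util.Locale;

   public class ListingDurations {
       // Durations copied from the two log lines above, in nanoseconds.
       static final long LIST_FILES_NS = 12_039_952_465L;
       static final long LIST_STATUS_NS = 16_088_964_963L;

       // Convert nanoseconds to seconds for readability.
       static double toSeconds(long nanos) {
           return nanos / 1_000_000_000.0;
       }

       public static void main(String[] args) {
           long saved = LIST_STATUS_NS - LIST_FILES_NS;
           // Locale.ROOT keeps '.' as the decimal separator on every machine.
           System.out.printf(Locale.ROOT, "listFiles():  %.2f s%n", toSeconds(LIST_FILES_NS));
           System.out.printf(Locale.ROOT, "listStatus(): %.2f s%n", toSeconds(LIST_STATUS_NS));
           System.out.printf(Locale.ROOT, "difference:   %d ns (%.2f s)%n", saved, toSeconds(saved));
       }
   }
   ```

   It reports a difference of 4049012498 ns, i.e. about 4.05 s. With 1,000 files at 10 ms each, the simulated processing alone accounts for roughly 10 s of every run.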
   Results when the same test is run on trunk, which still has synchronous listing:
   
   `2020-08-10 15:10:03,815 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listFiles() api with batch size of 10 including 10ms of processing time for each file: 16,722,638,860 nS
   2020-08-10 15:10:20,293 [JUnit-testMultiPagesListingPerformanceAndCorrectness] INFO  contract.ContractTestUtils (ContractTestUtils.java:end(1847)) - Duration of listing 1000 files using listStatus() api with batch size of 10 including 10ms of processing time for each file: 16,364,577,964 nS`
   
   It is evident from the logs that without the improvements, listStatus() and listFiles() take about the same time (16.36 s and 16.72 s respectively).
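   One plausible explanation for why only listFiles() improved: listFiles() returns a RemoteIterator, so the next page of results can be requested while the caller is still processing the current one, whereas listStatus() returns the whole array only after every page has been fetched. The hypothetical Java sketch below illustrates that prefetching pattern with CompletableFuture. It is not the S3A code from this pull request; fakePage, its 20 ms latency, the empty-page end marker and the 5 ms of work per entry are all illustrative assumptions.

   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.Iterator;
   import java.util.List;
   import java.util.NoSuchElementException;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.function.IntFunction;

   // Hypothetical sketch: iterate a paged listing, fetching page n+1 in the
   // background while the caller consumes page n. Not the S3A implementation.
   public class PrefetchingListing implements Iterator<String> {
       private final IntFunction<List<String>> fetchPage; // assumed: page index -> entries, empty list when done
       private final ExecutorService executor;
       private CompletableFuture<List<String>> pending;   // the page currently being fetched
       private Iterator<String> current = Collections.emptyIterator();
       private int nextIndex = 0;
       private boolean exhausted = false;

       public PrefetchingListing(IntFunction<List<String>> fetchPage, ExecutorService executor) {
           this.fetchPage = fetchPage;
           this.executor = executor;
           this.pending = request(); // start fetching the first page right away
       }

       private CompletableFuture<List<String>> request() {
           final int index = nextIndex++;
           return CompletableFuture.supplyAsync(() -> fetchPage.apply(index), executor);
       }

       @Override
       public boolean hasNext() {
           while (!current.hasNext() && !exhausted) {
               List<String> page = pending.join(); // blocks only if the prefetch has not finished
               if (page.isEmpty()) {
                   exhausted = true;
               } else {
                   current = page.iterator();
                   pending = request(); // fetch the next page while the caller works on this one
               }
           }
           return current.hasNext();
       }

       @Override
       public String next() {
           if (!hasNext()) {
               throw new NoSuchElementException();
           }
           return current.next();
       }

       static void sleep(long ms) {
           try {
               Thread.sleep(ms);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
           }
       }

       // Simulated LIST call: waits latencyMs, then returns one page (empty past the end).
       static List<String> fakePage(int index, int pages, int pageSize, long latencyMs) {
           sleep(latencyMs);
           List<String> page = new ArrayList<>();
           if (index < pages) {
               for (int i = 0; i < pageSize; i++) {
                   page.add("file-" + (index * pageSize + i));
               }
           }
           return page;
       }

       public static void main(String[] args) {
           int pages = 10, pageSize = 10;
           long latencyMs = 20, workMs = 5;

           // Synchronous baseline: fetch a page, process it, then fetch the next.
           long start = System.nanoTime();
           for (int p = 0; ; p++) {
               List<String> page = fakePage(p, pages, pageSize, latencyMs);
               if (page.isEmpty()) break;
               for (int i = 0; i < page.size(); i++) sleep(workMs);
           }
           long syncNs = System.nanoTime() - start;

           // Prefetching: the next fetch overlaps with per-entry work.
           ExecutorService pool = Executors.newSingleThreadExecutor();
           start = System.nanoTime();
           Iterator<String> it = new PrefetchingListing(p -> fakePage(p, pages, pageSize, latencyMs), pool);
           while (it.hasNext()) {
               it.next();
               sleep(workMs);
           }
           long asyncNs = System.nanoTime() - start;
           pool.shutdown();

           System.out.printf("sync: %d ms, prefetching: %d ms%n", syncNs / 1_000_000, asyncNs / 1_000_000);
       }
   }
   ```

   In the synchronous loop every simulated LIST latency adds to the total, while the prefetching iterator hides most of it behind the per-entry work, so its elapsed time should come out lower; exact timings depend on the machine.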
   
   CC @steveloughran @mehakmeet @bgaborg


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

