Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/27 06:59:02 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #5141: [WIP] Fixing closure of ParquetReader

nsivabalan opened a new pull request #5141:
URL: https://github.com/apache/hudi/pull/5141


   ## What is the purpose of the pull request
   
   We have been running integration tests against Hudi, and recently we started seeing "too many open files" errors that make the long-running Spark COW tests fail. It looks like we don't close the Parquet reader in a couple of places; this patch fixes that.
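A minimal sketch of the failure mode described above follows. It rests on one stated assumption: `TrackedReader` is a hypothetical stand-in for a Parquet reader that only counts how many instances are open, not a Hudi or Parquet class. Every reader that is dropped without `close()` keeps its handle, and in a long-running process those handles accumulate until the OS file-descriptor limit is hit. Try-with-resources releases the handle even when reading throws.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakDemo {
    // Hypothetical stand-in for a file-backed reader; it only counts open instances.
    static final class TrackedReader implements Closeable {
        static final AtomicInteger OPEN = new AtomicInteger();
        private boolean closed = false;

        TrackedReader() {
            OPEN.incrementAndGet(); // simulates acquiring a file descriptor
        }

        int readFirstRecord() {
            return 42; // placeholder for real I/O
        }

        @Override
        public void close() {
            if (!closed) { // idempotent: a second close() is a no-op
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    // Leaky pattern: the reader is dropped without close(), so its handle stays open.
    static int readLeaky() {
        TrackedReader reader = new TrackedReader();
        return reader.readFirstRecord();
    }

    // Fixed pattern: try-with-resources calls close() even if reading throws.
    static int readSafely() {
        try (TrackedReader reader = new TrackedReader()) {
            return reader.readFirstRecord();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            readSafely();
        }
        System.out.println("open after safe reads: " + TrackedReader.OPEN.get()); // prints 0
        for (int i = 0; i < 1000; i++) {
            readLeaky();
        }
        System.out.println("open after leaky reads: " + TrackedReader.OPEN.get()); // prints 1000
    }
}
```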
   
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    - [ ] Commit message is descriptive of the change
    - [ ] CI is green
    - [ ] Necessary doc changes done or have another open PR
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079967529


   ## CI report:
   
   * a42d360b3e73cf177881d747371f4f6d50281a0a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7411) 
   * ac001c3b3f0b99cc5adc296f5186628ae8b8b487 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079967965


   ## CI report:
   
   * a42d360b3e73cf177881d747371f4f6d50281a0a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7411) 
   * ac001c3b3f0b99cc5adc296f5186628ae8b8b487 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7421) 
   
[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079880811


   ## CI report:
   
   * a42d360b3e73cf177881d747371f4f6d50281a0a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7411) 
   
[GitHub] [hudi] hudi-bot commented on pull request #5141: [WIP] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079861844


   ## CI report:
   
   * a42d360b3e73cf177881d747371f4f6d50281a0a UNKNOWN
   
[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1080041027


   ## CI report:
   
   * ac001c3b3f0b99cc5adc296f5186628ae8b8b487 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7421) 
   * c1cdbc2a7209596c6613d16e4b60f19731f09cf4 UNKNOWN
   
[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079985989


   ## CI report:
   
   * ac001c3b3f0b99cc5adc296f5186628ae8b8b487 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7421) 
   
[GitHub] [hudi] nsivabalan commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

nsivabalan commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079996859


   @leesf: I've addressed your comments. Feel free to merge if it looks good. Thanks!


[GitHub] [hudi] alexeykudinkin commented on a change in pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

alexeykudinkin commented on a change in pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#discussion_r836669643



##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetReader.java
##########
@@ -30,14 +32,17 @@
 import org.apache.hudi.common.model.HoodieFileFormat;
 import org.apache.hudi.common.util.BaseFileUtils;
 import org.apache.hudi.common.util.ParquetReaderIterator;
+
 import org.apache.parquet.avro.AvroParquetReader;
 import org.apache.parquet.avro.AvroReadSupport;
 import org.apache.parquet.hadoop.ParquetReader;
 
 public class HoodieParquetReader<R extends IndexedRecord> implements HoodieFileReader<R> {
+  
   private final Path path;
   private final Configuration conf;
   private final BaseFileUtils parquetUtils;
+  private List<ParquetReaderIterator> readerIterators = new ArrayList<>();

Review comment:
       @nsivabalan please make it final
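In the hunk above, `HoodieParquetReader` keeps every `ParquetReaderIterator` it hands out in a `readerIterators` list, and the review asks for that field to be `final`. The sketch below shows that tracking pattern under stated assumptions. `TrackingReader` and `TrackedIter` are hypothetical classes, not Hudi's real types. The hunk shown does not include the reader's `close()`, so the loop that closes every tracked iterator is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class TrackingReaderDemo {
    // Hypothetical iterator that holds a resource until close() is called.
    static final class TrackedIter<T> implements Iterator<T>, AutoCloseable {
        private final Iterator<T> delegate;
        private boolean closed = false;

        TrackedIter(List<T> items) {
            this.delegate = items.iterator();
        }

        @Override
        public boolean hasNext() {
            return !closed && delegate.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return delegate.next();
        }

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Hypothetical reader that remembers every iterator it hands out.
    static final class TrackingReader implements AutoCloseable {
        // final: the reference is assigned once; only the list contents change.
        private final List<TrackedIter<String>> readerIterators = new ArrayList<>();
        private final List<String> records;

        TrackingReader(List<String> records) {
            this.records = records;
        }

        TrackedIter<String> getRecordIterator() {
            TrackedIter<String> it = new TrackedIter<>(records);
            readerIterators.add(it); // tracked so close() can release it
            return it;
        }

        @Override
        public void close() {
            readerIterators.forEach(TrackedIter::close);
            readerIterators.clear();
        }
    }

    public static void main(String[] args) {
        TrackingReader reader = new TrackingReader(Arrays.asList("a", "b"));
        TrackedIter<String> it = reader.getRecordIterator();
        System.out.println(it.next()); // prints "a"
        reader.close();
        System.out.println("iterator closed with reader: " + it.isClosed()); // prints "... true"
    }
}
```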

##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
##########
@@ -333,7 +335,13 @@ object HoodieBaseRelation {
     partitionedFile => {
       val extension = FSUtils.getFileExtension(partitionedFile.filePath)
       if (HoodieFileFormat.PARQUET.getFileExtension.equals(extension)) {
-        parquetReader.apply(partitionedFile)
+        val iter = parquetReader.apply(partitionedFile)
+        if (iter.isInstanceOf[Closeable]) {
+          // register a callback to close parquetReader which will be executed on task completion.
+          // when tasks finished, this method will be called, and release resources.
+          Option(TaskContext.get()).foreach(_.addTaskCompletionListener[Unit](_ => iter.asInstanceOf[Closeable].close()))

Review comment:
       While I appreciate the intent to tie the iterator to the scope of a particular task, I don't think this is the right way to fix it: you're tying the lifespan of the iterator to that of the task (which in this case runs on the executor), but there's no clear invariant guaranteeing that the iterator cannot outlive the task.
   
   Instead, we should rely on the RDD to close the iterator once it's done iterating. If you take a look at `FileScanRDD` (which we rely on), you can see that it does exactly that. The reason it's broken right now is that we transform the iterator, and the resulting iterator no longer inherits from `Closeable`:
   
   ```
   file: PartitionedFile => {
         val iter = readParquetFile(file)
         iter.flatMap {
           case r: InternalRow => Seq(r)
           case b: ColumnarBatch => b.rowIterator().asScala
         }
       }
   ```
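Here is the review point in concrete form. A consumer that closes an iterator only when it is `Closeable` (the behaviour attributed to `FileScanRDD` above) cannot release the reader once the iterator has gone through something like `flatMap`, because the transformation returns a new iterator that is not `Closeable`. The Java sketch below makes two assumptions. `drainAndMaybeClose` is a hypothetical helper that mimics that check. `CloseableFlatMap` is a hypothetical wrapper that forwards `close()` to its source. It illustrates the idea and is not necessarily what the PR ended up doing.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.StreamSupport;

public class CloseablePreservingDemo {
    // Hypothetical batch iterator that owns a resource (e.g. an open file).
    static final class BatchIter implements Iterator<List<Integer>>, Closeable {
        private final Iterator<List<Integer>> batches;
        boolean closed = false;

        BatchIter(List<List<Integer>> data) { this.batches = data.iterator(); }

        @Override public boolean hasNext() { return batches.hasNext(); }
        @Override public List<Integer> next() { return batches.next(); }
        @Override public void close() { closed = true; }
    }

    // Mimics a consumer that drains an iterator, then closes it only if it is Closeable.
    static int drainAndMaybeClose(Iterator<?> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        if (it instanceof Closeable) {
            try {
                ((Closeable) it).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return count;
    }

    // Plain flatMap via streams: the result is a new Iterator that is not Closeable.
    static Iterator<Integer> flattenPlain(Iterator<List<Integer>> source) {
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(source, 0), false)
                .flatMap(List::stream)
                .iterator();
    }

    // flatMap wrapper that stays Closeable by forwarding close() to its source.
    static final class CloseableFlatMap<A, B> implements Iterator<B>, Closeable {
        private final Iterator<A> source;
        private final Function<A, Iterator<B>> fn;
        private Iterator<B> current = Collections.emptyIterator();

        CloseableFlatMap(Iterator<A> source, Function<A, Iterator<B>> fn) {
            this.source = source;
            this.fn = fn;
        }

        @Override
        public boolean hasNext() {
            // Advance to the next non-empty inner iterator, if any.
            while (!current.hasNext() && source.hasNext()) {
                current = fn.apply(source.next());
            }
            return current.hasNext();
        }

        @Override
        public B next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }

        @Override
        public void close() throws IOException {
            if (source instanceof Closeable) {
                ((Closeable) source).close();
            }
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> data = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));

        BatchIter plainSource = new BatchIter(data);
        int plainRows = drainAndMaybeClose(flattenPlain(plainSource));
        System.out.println(plainRows + " rows, source closed: " + plainSource.closed);

        BatchIter wrappedSource = new BatchIter(data);
        int wrappedRows = drainAndMaybeClose(
                new CloseableFlatMap<List<Integer>, Integer>(wrappedSource, List::iterator));
        System.out.println(wrappedRows + " rows, source closed: " + wrappedSource.closed);
    }
}
```

Both paths yield three rows, but only the wrapped source ends up closed. The close hook stays with the iterator itself, which is what lets the consuming RDD release it instead of relying on the task lifetime.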




[GitHub] [hudi] hudi-bot removed a comment on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot removed a comment on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1079862356


   ## CI report:
   
   * a42d360b3e73cf177881d747371f4f6d50281a0a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7411) 
   
[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1080059696


   ## CI report:
   
   * c1cdbc2a7209596c6613d16e4b60f19731f09cf4 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7424) 
   
[GitHub] [hudi] leesf merged pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

leesf merged pull request #5141:
URL: https://github.com/apache/hudi/pull/5141


   


[GitHub] [hudi] danny0405 commented on a change in pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

danny0405 commented on a change in pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#discussion_r836027994



##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
##########
@@ -333,7 +335,13 @@ object HoodieBaseRelation {
     partitionedFile => {
       val extension = FSUtils.getFileExtension(partitionedFile.filePath)
       if (HoodieFileFormat.PARQUET.getFileExtension.equals(extension)) {
-        parquetReader.apply(partitionedFile)
+        val iter = parquetReader.apply(partitionedFile)
+        if (iter.isInstanceOf[Closeable]) {
+          // register a callback to close parquetReader which will be executed on task completion.

Review comment:
       Did you mean `AutoCloseable`?
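Some context on the question above: `java.io.Closeable` extends `java.lang.AutoCloseable`. A check against `Closeable` therefore skips iterators that implement only `AutoCloseable`, while a check against `AutoCloseable` covers both. The minimal Java sketch below shows the difference using a hypothetical `AutoCloseOnlyIter`. In Scala, the broader check would read `iter.isInstanceOf[AutoCloseable]`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class AutoCloseableCheckDemo {
    // Hypothetical iterator that implements AutoCloseable but not Closeable.
    static final class AutoCloseOnlyIter implements Iterator<String>, AutoCloseable {
        boolean closed = false;

        @Override public boolean hasNext() { return false; }
        @Override public String next() { throw new NoSuchElementException(); }
        @Override public void close() { closed = true; }
    }

    // Narrow check: only matches java.io.Closeable implementations.
    static boolean closeIfCloseable(Object o) {
        if (o instanceof Closeable) {
            try {
                ((Closeable) o).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return true;
        }
        return false;
    }

    // Broad check: matches AutoCloseable, which every Closeable also is.
    static boolean closeIfAutoCloseable(Object o) {
        if (o instanceof AutoCloseable) {
            try {
                ((AutoCloseable) o).close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(AutoCloseable.class.isAssignableFrom(Closeable.class)); // true
        AutoCloseOnlyIter iter = new AutoCloseOnlyIter();
        System.out.println(closeIfCloseable(iter));     // false: skipped
        System.out.println(closeIfAutoCloseable(iter)); // true: closed
        System.out.println(iter.closed);                // true
    }
}
```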




[GitHub] [hudi] hudi-bot commented on pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

hudi-bot commented on pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#issuecomment-1080041925


   ## CI report:
   
   * ac001c3b3f0b99cc5adc296f5186628ae8b8b487 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7421) 
   * c1cdbc2a7209596c6613d16e4b60f19731f09cf4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=7424) 
   
[GitHub] [hudi] leesf commented on a change in pull request #5141: [HUDI-3724] Fixing closure of ParquetReader

leesf commented on a change in pull request #5141:
URL: https://github.com/apache/hudi/pull/5141#discussion_r835873697



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java
##########
@@ -31,6 +33,8 @@
  */
 public class ParquetReaderIterator<T> implements ClosableIterator<T> {
 
+  private static final Logger LOG = LogManager.getLogger(ParquetReaderIterator.class);

Review comment:
       unused import?

##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieParquetReader.java
##########
@@ -30,14 +32,21 @@
 import org.apache.hudi.common.model.HoodieFileFormat;
 import org.apache.hudi.common.util.BaseFileUtils;
 import org.apache.hudi.common.util.ParquetReaderIterator;
+
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
 import org.apache.parquet.avro.AvroParquetReader;
 import org.apache.parquet.avro.AvroReadSupport;
 import org.apache.parquet.hadoop.ParquetReader;
 
 public class HoodieParquetReader<R extends IndexedRecord> implements HoodieFileReader<R> {
+
+  private static final Logger LOG = LogManager.getLogger(HoodieParquetReader.class);

Review comment:
       ditto



