Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/07/12 10:30:20 UTC

[GitHub] [hudi] pratyakshsharma opened a new pull request, #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

pratyakshsharma opened a new pull request, #6087:
URL: https://github.com/apache/hudi/pull/6087

   …udi connector
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184625016

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9867) 
   * cf13b8bda17411dde2a98f74dacd73c822435719 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181647558

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9867) 
   




[GitHub] [hudi] codope commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
codope commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1182075738

   cc @alexeykudinkin 




[GitHub] [hudi] pratyakshsharma commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184867560

   Agree with you on this. Let me draft an RFC and we can take it up from there.




[GitHub] [hudi] pratyakshsharma commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184184933

   @alexeykudinkin An epic is filed here - https://issues.apache.org/jira/browse/HUDI-4394. 
   
   Please note this draft PR is intended as a POC and would work well with Presto.




[GitHub] [hudi] pratyakshsharma commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181796662

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184629839

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9867) 
   * cf13b8bda17411dde2a98f74dacd73c822435719 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9918) 
   




[GitHub] [hudi] alexeykudinkin commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1183472383

   @pratyakshsharma thanks for taking the time to contribute this! 
   
   We definitely want the code integrating w/ Presto/Trino/Hive to be as reusable as possible, and I think we should start thinking about it upfront to avoid the churn of refactoring things back and forth. Given the scope of this integration as well as its impact, I think we should definitely go for an RFC to make sure we solicit feedback from the community before we go too far w/ the implementation.




[GitHub] [hudi] pratyakshsharma commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181860701

   @xiarixiaoyao this is a good question, and something I have been thinking about too. The idea is to build a layer that helps integrate the column stats index with all Java-based engines like Presto, Trino and Hive. This lays the foundation, since we need something like ranges to be able to filter files using min and max values. A few classes here are inspired by those present in Presto, but they are not identical.
   We could end up writing this logic in Presto, but then similar work would have to be done for Trino as well.
   
   Since this is just the beginning of this work, I am open to hearing others' thoughts on this.
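   
   For illustration only (this is not code from the PR): a minimal sketch of the kind of min/max range filtering described above, assuming each file exposes a single numeric column range. The names `ColumnStatsPruningSketch`, `FileColumnRange` and `pruneFiles` are hypothetical and do not correspond to Hudi or Presto classes.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   public class ColumnStatsPruningSketch {
   
       /** Hypothetical per-file stats for one column: file name plus min/max values. */
       static final class FileColumnRange {
           final String fileName;
           final long min;
           final long max;
   
           FileColumnRange(String fileName, long min, long max) {
               this.fileName = fileName;
               this.min = min;
               this.max = max;
           }
   
           /** True if [min, max] overlaps the closed query range [low, high]. */
           boolean overlaps(long low, long high) {
               return min <= high && max >= low;
           }
       }
   
       /** Keeps only files that could contain a row with low <= col <= high. */
       static List<String> pruneFiles(List<FileColumnRange> stats, long low, long high) {
           List<String> candidates = new ArrayList<>();
           for (FileColumnRange range : stats) {
               if (range.overlaps(low, high)) {
                   candidates.add(range.fileName);
               }
           }
           return candidates;
       }
   
       public static void main(String[] args) {
           List<FileColumnRange> stats = new ArrayList<>();
           stats.add(new FileColumnRange("file-a.parquet", 0, 99));
           stats.add(new FileColumnRange("file-b.parquet", 100, 199));
           stats.add(new FileColumnRange("file-c.parquet", 200, 299));
           // Predicate: col BETWEEN 150 AND 220, so file-a can be skipped.
           System.out.println(pruneFiles(stats, 150, 220));
       }
   }
   ```
   
   A file is skipped only when its range cannot overlap the predicate, so the pruning is conservative: it may keep files with no matching rows, but it never drops one that has them.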




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181640199

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9867) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181636551

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181838719

   ## CI report:
   
   * a71c82cdd446e19da4b09fdee127f0f328cda4b7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9867) 
   




[GitHub] [hudi] alexeykudinkin commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
alexeykudinkin commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184822319

   Got it. Yeah, I don't think we'll be able to make it into 0.12 given that we're planning to do a code freeze next week.
   
   And again, I don't think we can go ahead with a project of this size, scope and, more importantly, impact (it'll affect all forthcoming execution engines like Flink, Presto, Trino, Hive, etc.) w/o an RFC.
   




[GitHub] [hudi] hudi-bot commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1184638511

   ## CI report:
   
   * cf13b8bda17411dde2a98f74dacd73c822435719 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9918) 
   




[GitHub] [hudi] xiarixiaoyao commented on pull request #6087: [HUDI-4364]: changes for integrating column stats index into presto-h…

Posted by GitBox <gi...@apache.org>.
xiarixiaoyao commented on PR #6087:
URL: https://github.com/apache/hudi/pull/6087#issuecomment-1181825652

   @pratyakshsharma nice work!
   A small question:
   Why don't we put this code into the Presto Hudi connector, so that we can reuse the related Presto classes directly?

