Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/22 08:16:17 UTC

[GitHub] [hudi] lw309637554 opened a new pull request #2199: spark incremental read support with replace

lw309637554 opened a new pull request #2199:
URL: https://github.com/apache/hudi/pull/2199


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Support replace commits in Spark incremental reads.
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codope commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
codope commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-874048542


   @lw309637554 Is this PR still valid given that #3139 is merged now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] lw309637554 commented on pull request #2199: spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-714318530


   Before merging this, https://github.com/apache/hudi/pull/2196 needs to be merged first.





[GitHub] [hudi] satishkotha commented on pull request #2199: [HUDI-1264] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
satishkotha commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-718128566


   I will review this after the concerns in https://github.com/apache/hudi/pull/2196 are addressed.





[GitHub] [hudi] vinothchandar commented on pull request #2199: [HUDI-1264] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-717667890


   @satishkotha @n3nash Could you take this one as well?





[GitHub] [hudi] lw309637554 edited a comment on pull request #2199: spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 edited a comment on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-714318530


   Before merging this, https://github.com/apache/hudi/pull/2196 needs to be merged first.





[GitHub] [hudi] codecov-io commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-742046685


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2199?src=pr&el=h1) Report
   > Merging [#2199](https://codecov.io/gh/apache/hudi/pull/2199?src=pr&el=desc) (c39aaac) into [master](https://codecov.io/gh/apache/hudi/commit/de2fbeac334e65abee38f5f5867805ad07b8dcd2?el=desc) (de2fbea) will **decrease** coverage by `43.13%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2199/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2199?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2199       +/-   ##
   =============================================
   - Coverage     53.49%   10.35%   -43.14%     
   + Complexity     2788       48     -2740     
   =============================================
     Files           355       51      -304     
     Lines         16169     1786    -14383     
     Branches       1650      213     -1437     
   =============================================
   - Hits           8649      185     -8464     
   + Misses         6819     1588     -5231     
   + Partials        701       13      -688     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudispark | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `10.35% <ø> (-59.75%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2199?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | [...lities/schema/SchemaProviderWithPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQcm92aWRlcldpdGhQb3N0UHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | ... and [331 more](https://codecov.io/gh/apache/hudi/pull/2199/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] hudi-bot edited a comment on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-874049537


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c39aaac4f94125ae8de331639ec3343c60463b8f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c39aaac4f94125ae8de331639ec3343c60463b8f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c39aaac4f94125ae8de331639ec3343c60463b8f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] n3nash commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-809975186


   @satishkotha Is this PR still valid? @lw309637554 Can you please rebase this PR so we can get this landed?





[GitHub] [hudi] n3nash commented on pull request #2199: [HUDI-1264] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-718007382


   @satishkotha Can you please review this? Once done, let me know and I'll take a final pass and merge it.





[GitHub] [hudi] lw309637554 commented on pull request #2199: [HUDI-1264] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-715906430


   @satishkotha @n3nash @bvaradar 
   Hi, the solution in this pull request only filters the commits between the latest replace commit and the end commit.
   By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to filter out replaced file slices. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?
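
The timeline-based filtering described in this comment can be illustrated with a small Python sketch. It is a toy model, not Hudi code: the `Instant` type, the `instants_since_latest_replace` function, and the action names are hypothetical. It also assumes that the begin instant is exclusive, the end instant is inclusive, and the replace commit itself is kept as the lower bound of the result.

```python
from typing import List, NamedTuple


class Instant(NamedTuple):
    """Hypothetical stand-in for a timeline entry (not a Hudi class)."""
    time: str    # instant timestamp; fixed-width strings sort chronologically
    action: str  # e.g. "commit", "deltacommit", "replacecommit"


def instants_since_latest_replace(
    timeline: List[Instant], begin: str, end: str
) -> List[Instant]:
    """Keep instants in (begin, end], starting at the latest replace commit.

    Assumption: if the window contains no replace commit, the whole
    window is returned unchanged.
    """
    window = [i for i in timeline if begin < i.time <= end]
    replace_times = [i.time for i in window if i.action == "replacecommit"]
    if not replace_times:
        return window
    latest_replace = max(replace_times)
    return [i for i in window if i.time >= latest_replace]
```

In this model, commits written before the latest replace commit are dropped from the incremental window entirely, which is the limitation the comment raises: the filter works on whole commits rather than on individual file slices.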





[GitHub] [hudi] hudi-bot commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-874049537


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c39aaac4f94125ae8de331639ec3343c60463b8f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c39aaac4f94125ae8de331639ec3343c60463b8f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c39aaac4f94125ae8de331639ec3343c60463b8f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] lw309637554 closed pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 closed pull request #2199:
URL: https://github.com/apache/hudi/pull/2199


   








[GitHub] [hudi] lw309637554 commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-898113298


   > @lw309637554 Is this PR still valid given that #3139 is merged now?
   
   @codope Hello, I think I can close this one.








[GitHub] [hudi] lw309637554 commented on pull request #2199: [HUDI-1264][WIP] spark incremental read support with replace

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2199:
URL: https://github.com/apache/hudi/pull/2199#issuecomment-822124457


   > @satishkotha Is this PR still valid? @lw309637554 Can you please rebase this PR so we can get this landed?
   
   @n3nash @satishkotha 
   I think the solution in this PR is not very good.
   
   The solution in this pull request only filters the commits between the latest replace commit and the end commit.
   By comparison, HoodieParquetRealtimeInputFormat uses fsView.getLatestMergedFileSlicesBeforeOrOn to filter out replaced file slices. Should we change the Spark incremental relation to use fsView.getLatestMergedFileSlicesBeforeOrOn as well?
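
The file-slice alternative raised in this comment can be sketched in Python as follows. This is a toy model of the idea only, not the behaviour or signature of `fsView.getLatestMergedFileSlicesBeforeOrOn`: the `FileGroup` type, its fields, and the `latest_slices_before_or_on` function are hypothetical, and the sketch assumes a file group becomes invisible once a replace commit at or before the query instant has replaced it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileGroup:
    """Hypothetical stand-in for a file group (not a Hudi class)."""
    file_id: str
    slice_instants: List[str]          # instants at which slices were written
    replaced_at: Optional[str] = None  # instant of the replacing commit, if any


def latest_slices_before_or_on(
    groups: List[FileGroup], max_instant: str
) -> Dict[str, str]:
    """Map each visible file group to its latest slice instant <= max_instant.

    Assumption: a group replaced at or before max_instant is skipped.
    """
    result: Dict[str, str] = {}
    for group in groups:
        if group.replaced_at is not None and group.replaced_at <= max_instant:
            continue  # this group was replaced, so none of its slices are read
        visible = [t for t in group.slice_instants if t <= max_instant]
        if visible:
            result[group.file_id] = max(visible)
    return result
```

Unlike a filter over whole commits, this model decides visibility per file group, so data written before a replace commit remains readable as long as its file group was not itself replaced.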




