Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/27 15:53:37 UTC

[GitHub] [hudi] pengzhiwei2018 opened a new pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

pengzhiwei2018 opened a new pull request #2497:
URL: https://github.com/apache/hudi/pull/2497


   … with log
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   Queries on a MOR table return incorrect results when the base data is merged with the log data. This PR fixes that by passing
   the precombine field to the `DefaultHoodieRecordPayload` in `HoodieMergeOnReadRDD#mergeRowWithLog`.
   
   ## Brief change log
   
   - Store the `preCombineField` in `hoodie.properties`. This adds a `preCombineField` parameter to the `HoodieTableMetaClient#initTableType` method.
   - Pass the `preCombineField` to the `DefaultHoodieRecordPayload` in `HoodieMergeOnReadRDD#mergeRowWithLog`
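
To illustrate why the ordering field matters at read time, here is a minimal sketch that uses plain maps rather than Hudi's payload classes. The class name `PreCombineMergeSketch`, its methods, and the `ts` field are hypothetical and exist only for this illustration; it assumes the payload keeps the record with the larger ordering value and lets the log record win ties.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for payload merging; these are not Hudi classes.
final class PreCombineMergeSketch {

    // Without an ordering field, the log record always replaces the base record.
    static Map<String, Object> mergeLatestWins(Map<String, Object> base, Map<String, Object> log) {
        return log;
    }

    // With an ordering field, keep the record whose ordering value is larger.
    // Ties go to the log record in this sketch.
    static Map<String, Object> mergeByOrderingField(Map<String, Object> base,
                                                    Map<String, Object> log,
                                                    String orderingField) {
        long baseOrder = ((Number) base.get(orderingField)).longValue();
        long logOrder = ((Number) log.get(orderingField)).longValue();
        return baseOrder > logOrder ? base : log;
    }

    // Builds a tiny record with a key, an ordering value and a payload value.
    static Map<String, Object> record(String key, long ts, String value) {
        Map<String, Object> r = new HashMap<>();
        r.put("key", key);
        r.put("ts", ts);
        r.put("value", value);
        return r;
    }

    public static void main(String[] args) {
        Map<String, Object> base = record("id1", 10L, "newer-in-base");
        Map<String, Object> log = record("id1", 5L, "older-in-log");
        // The log record wins even though its ordering value is smaller.
        System.out.println(mergeLatestWins(base, log).get("value"));            // prints older-in-log
        // The base record is kept because its ordering value is larger.
        System.out.println(mergeByOrderingField(base, log, "ts").get("value")); // prints newer-in-base
    }
}
```

If the reader does not know which field to order by, it can only behave like `mergeLatestWins`, which is the kind of mismatch the description above refers to.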
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - *Added `TestMORDataSource#testPreCombineFiledForReadMOR` to verify the change.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-769513904


   cc @nsivabalan to also triage





[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (143d036) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **increase** coverage by `19.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2497       +/-   ##
   =============================================
   + Coverage     50.28%   69.43%   +19.14%     
   + Complexity     3120      357     -2763     
   =============================================
     Files           430       53      -377     
     Lines         19565     1930    -17635     
     Branches       2004      230     -1774     
   =============================================
   - Hits           9838     1340     -8498     
   + Misses         8924      456     -8468     
   + Partials        803      134      -669     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==) | `79.54% <ø> (ø)` | `6.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   | [.../apache/hudi/hadoop/RecordReaderValueIterator.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL1JlY29yZFJlYWRlclZhbHVlSXRlcmF0b3IuamF2YQ==) | | | |
   | [.../org/apache/hudi/util/RowDataToAvroConverters.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1mbGluay9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaHVkaS91dGlsL1Jvd0RhdGFUb0F2cm9Db252ZXJ0ZXJzLmphdmE=) | | | |
   | [...he/hudi/cli/commands/HDFSParquetImportCommand.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL2NvbW1hbmRzL0hERlNQYXJxdWV0SW1wb3J0Q29tbWFuZC5qYXZh) | | | |
   | [...va/org/apache/hudi/common/model/HoodieLogFile.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUxvZ0ZpbGUuamF2YQ==) | | | |
   | [.../common/util/queue/FunctionBasedQueueProducer.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvcXVldWUvRnVuY3Rpb25CYXNlZFF1ZXVlUHJvZHVjZXIuamF2YQ==) | | | |
   | [...util/jvm/OpenJ9MemoryLayoutSpecification64bit.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvanZtL09wZW5KOU1lbW9yeUxheW91dFNwZWNpZmljYXRpb242NGJpdC5qYXZh) | | | |
   | [...e/hudi/exception/HoodieCorruptedDataException.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUNvcnJ1cHRlZERhdGFFeGNlcHRpb24uamF2YQ==) | | | |
   | [...i/bootstrap/SparkParquetBootstrapDataProvider.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYm9vdHN0cmFwL1NwYXJrUGFycXVldEJvb3RzdHJhcERhdGFQcm92aWRlci5qYXZh) | | | |
   | ... and [363 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (4891a86) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **increase** coverage by `0.24%`.
   > The diff coverage is `58.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2497      +/-   ##
   ============================================
   + Coverage     50.28%   50.53%   +0.24%     
   - Complexity     3120     3123       +3     
   ============================================
     Files           430      430              
     Lines         19565    19597      +32     
     Branches       2004     2008       +4     
   ============================================
   + Hits           9838     9903      +65     
   + Misses         8924     8886      -38     
   - Partials        803      808       +5     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.45% <0.00%> (-0.07%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `33.03% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.46% <72.41%> (+3.60%)` | `0.00 <4.00> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `45.45% <0.00%> (-0.60%)` | `17.00 <0.00> (ø)` | |
   | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.33% <0.00%> (-2.36%)` | `45.00 <0.00> (ø)` | |
   | [...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh) | `89.78% <60.00%> (-1.70%)` | `14.00 <2.00> (+2.00)` | :arrow_down: |
   | [...g/apache/hudi/MergeOnReadIncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `81.45% <71.42%> (-0.76%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | `89.13% <77.77%> (-1.46%)` | `17.00 <1.00> (+1.00)` | :arrow_down: |
   | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `48.76% <100.00%> (+0.18%)` | `0.00 <0.00> (ø)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | [...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=) | `60.46% <0.00%> (+60.46%)` | `0.00% <0.00%> (ø%)` | |
   





[GitHub] [hudi] pengzhiwei2018 commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-769144965


   > @pengzhiwei2018 the build failure seems related.
   
   Yes, I have fixed it now.





[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (b62f1c3) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **decrease** coverage by `40.59%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2497       +/-   ##
   ============================================
   - Coverage     50.28%   9.68%   -40.60%     
   + Complexity     3120      48     -3072     
   ============================================
     Files           430      53      -377     
     Lines         19565    1930    -17635     
     Branches       2004     230     -1774     
   ============================================
   - Hits           9838     187     -9651     
   + Misses         8924    1730     -7194     
   + Partials        803      13      -790     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.80%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==) | `0.00% <ø> (-79.55%)` | `0.00 <0.00> (-6.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | ... and [402 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] garyli1019 commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770527676


   > thnx Gary. I have asked liwei. If he is not working on it as of now, probably I will take it up and put up a diff.
   
   Hi @nsivabalan, do you think we can merge this PR and do the refactoring in a separate PR, or should we put it on hold for now? WDYT?





[GitHub] [hudi] garyli1019 merged pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 merged pull request #2497:
URL: https://github.com/apache/hudi/pull/2497


   





[GitHub] [hudi] nsivabalan commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770383571


   > thanks @pengzhiwei2018 , LGTM! @nsivabalan I agree we should use a builder pattern to init the table. Looks like @lw309637554 is working on the refactoring https://issues.apache.org/jira/browse/HUDI-1315.
   
   thnx Gary. I have asked liwei. If he is not working on it as of now, probably I will take it up and put up a diff. 
   





[GitHub] [hudi] garyli1019 commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-769027722


   @pengzhiwei2018 the build failure seems related. 





[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (143d036) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **decrease** coverage by `40.59%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2497       +/-   ##
   ============================================
   - Coverage     50.28%   9.68%   -40.60%     
   + Complexity     3120      48     -3072     
   ============================================
     Files           430      53      -377     
     Lines         19565    1930    -17635     
     Branches       2004     230     -1774     
   ============================================
   - Hits           9838     187     -9651     
   + Misses         8924    1730     -7194     
   + Partials        803      13      -790     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.80%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==) | `0.00% <ø> (-79.55%)` | `0.00 <0.00> (-6.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | ... and [402 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] nsivabalan commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770832362


   @garyli1019: I am good w/ the PR. Can you land it? Please fill in the right commit message while you squash & merge.





[GitHub] [hudi] garyli1019 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r567362329



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/BootstrapExecutor.java
##########
@@ -173,7 +173,7 @@ private void initializeTable() throws IOException {
 
     HoodieTableMetaClient.initTableTypeWithBootstrap(new Configuration(jssc.hadoopConfiguration()),
         cfg.targetBasePath, HoodieTableType.valueOf(cfg.tableType), cfg.targetTableName, "archived", cfg.payloadClassName,
-        cfg.baseFileFormat, cfg.bootstrapIndexClass, bootstrapBasePath);
+        cfg.baseFileFormat, null, cfg.bootstrapIndexClass, bootstrapBasePath);

Review comment:
       missed this one?







[GitHub] [hudi] garyli1019 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566552361



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##########
@@ -285,7 +289,14 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
 
       private def mergeRowWithLog(curRow: InternalRow, curKey: String) = {
         val historyAvroRecord = serializer.serialize(curRow).asInstanceOf[GenericRecord]
-        logRecords.get(curKey).getData.combineAndGetUpdateValue(historyAvroRecord, tableAvroSchema)
+        if (preCombineField != null) {
+          val payloadProps = new Properties()

Review comment:
       we are creating a new `Properties` in every call, can we put this outside?
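
One way to read this suggestion is sketched below in Java: build the `Properties` once when the enclosing object is constructed and reuse it for every merge call. The class `PayloadPropsSketch` and its method are hypothetical, and the key string is an assumption about the payload ordering property name that should be checked against the Hudi version in use.

```java
import java.util.Properties;

// Hypothetical holder that builds the payload properties once and reuses them.
final class PayloadPropsSketch {

    // Assumed property key for the ordering field; verify it against the Hudi version in use.
    static final String ORDERING_FIELD_KEY = "hoodie.payload.ordering.field";

    // Created once in the constructor instead of once per merged row.
    private final Properties payloadProps;

    PayloadPropsSketch(String preCombineField) {
        Properties props = new Properties();
        if (preCombineField != null) {
            props.setProperty(ORDERING_FIELD_KEY, preCombineField);
        }
        this.payloadProps = props;
    }

    // Every call returns the same instance, so the per-row path allocates nothing.
    Properties propsForMerge() {
        return payloadProps;
    }
}
```

The per-row merge would then read `propsForMerge()` rather than allocating a new object, which keeps the hot path free of repeated allocation.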

##########
File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/TableCommand.java
##########
@@ -108,7 +108,7 @@ public String createTable(
 
     final HoodieTableType tableType = HoodieTableType.valueOf(tableTypeStr);
     HoodieTableMetaClient.initTableType(HoodieCLI.conf, path, tableType, name, archiveFolder,
-        payloadClass, layoutVersion);
+        payloadClass, null, layoutVersion);

Review comment:
       Can we add a new `initTableType` overload to handle all the `null`s being added?
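
A rough Java sketch of such an overload follows. `TableInitSketch` is a hypothetical class with a deliberately reduced parameter list, not the real `HoodieTableMetaClient` signature, and the property keys are assumptions used only to show the delegation.

```java
import java.util.Properties;

// Hypothetical, simplified stand-in for the table initialisation entry points.
final class TableInitSketch {

    // Full variant: callers that know the precombine field pass it explicitly.
    static Properties initTableType(String tableName, String payloadClass, String preCombineField) {
        Properties props = new Properties();
        props.setProperty("hoodie.table.name", tableName);                 // assumed key
        props.setProperty("hoodie.compaction.payload.class", payloadClass); // assumed key
        if (preCombineField != null) {
            props.setProperty("hoodie.table.precombine.field", preCombineField); // assumed key
        }
        return props;
    }

    // Convenience overload: existing call sites stay unchanged and no longer pass null.
    static Properties initTableType(String tableName, String payloadClass) {
        return initTableType(tableName, payloadClass, null);
    }
}
```

With the overload in place, only the delegating method mentions `null`, so call sites such as the CLI command would not need to change.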

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -50,7 +50,8 @@ case class HoodieMergeOnReadTableState(tableStructSchema: StructType,
                                        requiredStructSchema: StructType,
                                        tableAvroSchema: String,
                                        requiredAvroSchema: String,
-                                       hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit])
+                                       hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit],
+                                       preCombineField: String)

Review comment:
       Can we make this field an `Option` instead of using `null`?

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {
+      fieldFromTableConfig
+    } else if (optParams.contains(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY)) {

Review comment:
       Can we use `HoodieTableConfig` instead, or somehow consolidate all the precombine field options in one place and deprecate the others? Using a write option while reading sounds a bit odd.
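
A rough Java sketch of the lookup order in the hunk above, with the result wrapped in `Optional` rather than returned as `null`. The class `PreCombineFieldResolver` and the key `example.precombine.field` are hypothetical. The real code is Scala and reads `DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY`, whose value is not shown in this thread.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

final class PreCombineFieldResolver {
    // Hypothetical option key, used only for this sketch.
    static final String PRECOMBINE_OPT_KEY = "example.precombine.field";

    // Prefer the value stored with the table; fall back to the caller's options.
    static Optional<String> resolve(String fieldFromTableConfig, Map<String, String> optParams) {
        Optional<String> fromTable = Optional.ofNullable(fieldFromTableConfig);
        if (fromTable.isPresent()) {
            return fromTable;
        }
        return Optional.ofNullable(optParams.get(PRECOMBINE_OPT_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> fromOptions = Collections.singletonMap(PRECOMBINE_OPT_KEY, "event_time");
        System.out.println(resolve("ts", fromOptions)); // table config wins
        System.out.println(resolve(null, fromOptions)); // falls back to the option
        System.out.println(resolve(null, none));        // nothing configured
    }
}
```

If the table config became the single source of truth, as the comment proposes, the second branch would go away and readers would no longer need to know about a write option.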

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField

Review comment:
       Would `preCombineFieldFromTableConfig` be a better name?

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##########
@@ -18,6 +18,8 @@
 
 package org.apache.hudi
 
+import java.util.Properties

Review comment:
       nit: this import should be in the next group

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {

Review comment:
       If the field does not exist, will this be an empty string or null?







[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (2f23040) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **decrease** coverage by `1.61%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2497      +/-   ##
   ============================================
   - Coverage     50.28%   48.67%   -1.62%     
   + Complexity     3120     2775     -345     
   ============================================
     Files           430      375      -55     
     Lines         19565    16735    -2830     
     Branches       2004     1689     -315     
   ============================================
   - Hits           9838     8145    -1693     
   + Misses         8924     7944     -980     
   + Partials        803      646     -157     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.45% <0.00%> (-0.07%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `33.03% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `45.45% <0.00%> (-0.60%)` | `17.00 <0.00> (ø)` | |
   | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.33% <0.00%> (-2.36%)` | `45.00 <0.00> (ø)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | [...udi/spark3/internal/HoodieWriterCommitMessage.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVXcml0ZXJDb21taXRNZXNzYWdlLmphdmE=) | | | |
   | [...in/scala/org/apache/hudi/HoodieEmptyRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUVtcHR5UmVsYXRpb24uc2NhbGE=) | | | |
   | [...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=) | | | |
   | [.../org/apache/hudi/hive/NonPartitionedExtractor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvTm9uUGFydGl0aW9uZWRFeHRyYWN0b3IuamF2YQ==) | | | |
   | [...3/internal/HoodieDataSourceInternalBatchWrite.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlLmphdmE=) | | | |
   | [.../java/org/apache/hudi/HoodieDataSourceHelpers.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvSG9vZGllRGF0YVNvdXJjZUhlbHBlcnMuamF2YQ==) | | | |
   | [.../org/apache/hudi/hive/HoodieHiveSyncException.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSG9vZGllSGl2ZVN5bmNFeGNlcHRpb24uamF2YQ==) | | | |
   | ... and [44 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566917341



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -50,7 +50,8 @@ case class HoodieMergeOnReadTableState(tableStructSchema: StructType,
                                        requiredStructSchema: StructType,
                                        tableAvroSchema: String,
                                        requiredAvroSchema: String,
-                                       hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit])
+                                       hoodieRealtimeFileSplits: List[HoodieMergeOnReadFileSplit],
+                                       preCombineField: String)

Review comment:
       done!







[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (4891a86) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **increase** coverage by `0.24%`.
   > The diff coverage is `58.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2497      +/-   ##
   ============================================
   + Coverage     50.28%   50.53%   +0.24%     
   - Complexity     3120     3123       +3     
   ============================================
     Files           430      430              
     Lines         19565    19597      +32     
     Branches       2004     2008       +4     
   ============================================
   + Hits           9838     9903      +65     
   + Misses         8924     8886      -38     
   - Partials        803      808       +5     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.45% <0.00%> (-0.07%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `33.03% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.46% <72.41%> (+3.60%)` | `0.00 <4.00> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `45.45% <0.00%> (-0.60%)` | `17.00 <0.00> (ø)` | |
   | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.33% <0.00%> (-2.36%)` | `45.00 <0.00> (ø)` | |
   | [...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh) | `89.78% <60.00%> (-1.70%)` | `14.00 <2.00> (+2.00)` | :arrow_down: |
   | [...g/apache/hudi/MergeOnReadIncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `81.45% <71.42%> (-0.76%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | `89.13% <77.77%> (-1.46%)` | `17.00 <1.00> (+1.00)` | :arrow_down: |
   | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `48.76% <100.00%> (+0.18%)` | `0.00 <0.00> (ø)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | [...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=) | `60.46% <0.00%> (+60.46%)` | `0.00% <0.00%> (ø%)` | |
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566913898



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##########
@@ -18,6 +18,8 @@
 
 package org.apache.hudi
 
+import java.util.Properties

Review comment:
       Fixed!







[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566877440



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {

Review comment:
       It will be null if it does not exist.
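
The null-versus-empty distinction can be checked with `java.util.Properties` directly. This sketch assumes the table config is backed by a properties file such as `hoodie.properties`, which the PR description mentions. The key names are hypothetical, and how `getPreCombineField` reads the value is not shown in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class MissingKeyLookup {
    // Parse properties-file text into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Key absent from the file: getProperty returns null.
        Properties absent = load("example.table.name=trips\n");
        System.out.println(absent.getProperty("example.precombine.field") == null);

        // Key present with nothing after '=': getProperty returns an empty string.
        Properties blank = load("example.table.name=trips\nexample.precombine.field=\n");
        System.out.println("".equals(blank.getProperty("example.precombine.field")));
    }
}
```

So a `!= null` check covers tables created before the field was stored, but it would let through a key that was written with an empty value.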







[GitHub] [hudi] nsivabalan commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566968882



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -328,38 +328,62 @@ public synchronized HoodieArchivedTimeline getArchivedTimeline() {
    */
   public static HoodieTableMetaClient initTableTypeWithBootstrap(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                                  String tableName, String archiveLogFolder, String payloadClassName,
-                                                                 String baseFileFormat, String bootstrapIndexClass,
+                                                                 String baseFileFormat, String preCombineField, String bootstrapIndexClass,
                                                                  String bootstrapBasePath) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, bootstrapIndexClass, bootstrapBasePath);
+        archiveLogFolder, payloadClassName, null,
+      baseFileFormat, preCombineField, bootstrapIndexClass, bootstrapBasePath);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String archiveLogFolder, String payloadClassName,
+                                                    String baseFileFormat, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName,
+        archiveLogFolder, payloadClassName, null, baseFileFormat, preCombineField,
+       null, null);
   }
 
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String archiveLogFolder, String payloadClassName,
                                                     String baseFileFormat) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, null, null);
+      archiveLogFolder, payloadClassName, null, baseFileFormat, null,
+      null, null);
   }
 
   /**
    * Used primarily by tests, examples.
    */
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String payloadClassName, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
+        null, preCombineField);
+  }
+
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String payloadClassName) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
-        null, null, null, null);
+      null, (String) null);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,

Review comment:
       @garyli1019 @vinothchandar: I am sure this has been brought up before. I am curious why we haven't exposed a builder for MetaClient instantiation.










[GitHub] [hudi] codecov-io commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (7b3d36e) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **decrease** coverage by `40.59%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master   #2497       +/-   ##
   ============================================
   - Coverage     50.28%   9.68%   -40.60%     
   + Complexity     3120      48     -3072     
   ============================================
     Files           430      53      -377     
     Lines         19565    1930    -17635     
     Branches       2004     230     -1774     
   ============================================
   - Hits           9838     187     -9651     
   + Misses         8924    1730     -7194     
   + Partials        803      13      -790     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `9.68% <ø> (-59.80%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==) | `0.00% <ø> (-79.55%)` | `0.00 <0.00> (-6.00)` | |
   | [...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-3.00%)` | |
   | [...a/org/apache/hudi/utilities/sources/RowSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUm93U291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [.../org/apache/hudi/utilities/sources/AvroSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQXZyb1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/utilities/sources/JsonSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvblNvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-1.00%)` | |
   | [...rg/apache/hudi/utilities/sources/CsvDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvQ3N2REZTU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-10.00%)` | |
   | [...g/apache/hudi/utilities/sources/JsonDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkRGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-4.00%)` | |
   | [...apache/hudi/utilities/sources/JsonKafkaSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvSnNvbkthZmthU291cmNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-6.00%)` | |
   | [...pache/hudi/utilities/sources/ParquetDFSSource.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvUGFycXVldERGU1NvdXJjZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | `0.00% <0.00%> (-5.00%)` | |
   | ... and [402 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] nsivabalan commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566961281



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -328,38 +328,62 @@ public synchronized HoodieArchivedTimeline getArchivedTimeline() {
    */
   public static HoodieTableMetaClient initTableTypeWithBootstrap(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                                  String tableName, String archiveLogFolder, String payloadClassName,
-                                                                 String baseFileFormat, String bootstrapIndexClass,
+                                                                 String baseFileFormat, String preCombineField, String bootstrapIndexClass,
                                                                  String bootstrapBasePath) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, bootstrapIndexClass, bootstrapBasePath);
+        archiveLogFolder, payloadClassName, null,
+      baseFileFormat, preCombineField, bootstrapIndexClass, bootstrapBasePath);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String archiveLogFolder, String payloadClassName,
+                                                    String baseFileFormat, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName,
+        archiveLogFolder, payloadClassName, null, baseFileFormat, preCombineField,
+       null, null);
   }
 
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String archiveLogFolder, String payloadClassName,
                                                     String baseFileFormat) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, null, null);
+      archiveLogFolder, payloadClassName, null, baseFileFormat, null,
+      null, null);
   }
 
   /**
    * Used primarily by tests, examples.
    */
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String payloadClassName, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
+        null, preCombineField);
+  }
+
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String payloadClassName) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
-        null, null, null, null);
+      null, (String) null);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,

Review comment:
       Should we consider introducing a builder pattern, since we have too many overloaded constructors and `initTableType` methods?
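
To make the builder idea concrete, here is a minimal sketch of what it could replace. Everything in it is hypothetical: `TableInitBuilder`, its setters, and the `example.*` keys are invented for illustration and are not Hudi's API. The point is only that each optional argument gets a named setter, so adding `preCombineField` would not require another overload or another positional `null`.

```java
import java.util.Properties;

// Hypothetical builder; names are illustrative and not taken from Hudi.
final class TableInitBuilder {
    private String tableType;
    private String tableName;
    private String archiveLogFolder;
    private String payloadClassName;
    private String baseFileFormat;
    private String preCombineField;

    static TableInitBuilder newBuilder() {
        return new TableInitBuilder();
    }

    TableInitBuilder setTableType(String v) { this.tableType = v; return this; }
    TableInitBuilder setTableName(String v) { this.tableName = v; return this; }
    TableInitBuilder setArchiveLogFolder(String v) { this.archiveLogFolder = v; return this; }
    TableInitBuilder setPayloadClassName(String v) { this.payloadClassName = v; return this; }
    TableInitBuilder setBaseFileFormat(String v) { this.baseFileFormat = v; return this; }
    TableInitBuilder setPreCombineField(String v) { this.preCombineField = v; return this; }

    // Collect the configured values; unset optional fields are simply omitted.
    Properties build() {
        if (tableType == null || tableName == null) {
            throw new IllegalStateException("tableType and tableName are required");
        }
        Properties props = new Properties();
        put(props, "example.table.type", tableType);
        put(props, "example.table.name", tableName);
        put(props, "example.archivelog.folder", archiveLogFolder);
        put(props, "example.payload.class", payloadClassName);
        put(props, "example.base.file.format", baseFileFormat);
        put(props, "example.precombine.field", preCombineField);
        return props;
    }

    private static void put(Properties props, String key, String value) {
        if (value != null) {
            props.setProperty(key, value);
        }
    }

    public static void main(String[] args) {
        Properties props = TableInitBuilder.newBuilder()
            .setTableType("MERGE_ON_READ")
            .setTableName("trips")
            .setPreCombineField("ts")
            .build();
        System.out.println(props.getProperty("example.precombine.field"));
    }
}
```

Compared with the overloads in the hunk above, a call site names only the fields it sets, which also avoids casts such as `(String) null` that are otherwise needed to disambiguate overloads.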







[GitHub] [hudi] garyli1019 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
garyli1019 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r567361480



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -328,38 +328,62 @@ public synchronized HoodieArchivedTimeline getArchivedTimeline() {
    */
   public static HoodieTableMetaClient initTableTypeWithBootstrap(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                                  String tableName, String archiveLogFolder, String payloadClassName,
-                                                                 String baseFileFormat, String bootstrapIndexClass,
+                                                                 String baseFileFormat, String preCombineField, String bootstrapIndexClass,
                                                                  String bootstrapBasePath) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, bootstrapIndexClass, bootstrapBasePath);
+        archiveLogFolder, payloadClassName, null,
+      baseFileFormat, preCombineField, bootstrapIndexClass, bootstrapBasePath);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String archiveLogFolder, String payloadClassName,
+                                                    String baseFileFormat, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName,
+        archiveLogFolder, payloadClassName, null, baseFileFormat, preCombineField,
+       null, null);
   }
 
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String archiveLogFolder, String payloadClassName,
                                                     String baseFileFormat) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName,
-        archiveLogFolder, payloadClassName, null, baseFileFormat, null, null);
+      archiveLogFolder, payloadClassName, null, baseFileFormat, null,
+      null, null);
   }
 
   /**
    * Used primarily by tests, examples.
    */
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
+                                                    String tableName, String payloadClassName, String preCombineField) throws IOException {
+    return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
+        null, preCombineField);
+  }
+
   public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,
                                                     String tableName, String payloadClassName) throws IOException {
     return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName,
-        null, null, null, null);
+      null, (String) null);
+  }
+
+  public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType,

Review comment:
       Found a JIRA for this topic: https://issues.apache.org/jira/browse/HUDI-1315







[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (b62f1c3) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **increase** coverage by `19.14%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2497       +/-   ##
   =============================================
   + Coverage     50.28%   69.43%   +19.14%     
   + Complexity     3120      357     -2763     
   =============================================
     Files           430       53      -377     
     Lines         19565     1930    -17635     
     Branches       2004      230     -1774     
   =============================================
   - Hits           9838     1340     -8498     
   + Misses         8924      456     -8468     
   + Partials        803      134      -669     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `?` | `?` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (-0.06%)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...udi/utilities/deltastreamer/BootstrapExecutor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvQm9vdHN0cmFwRXhlY3V0b3IuamF2YQ==) | `79.54% <ø> (ø)` | `6.00 <0.00> (ø)` | |
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `70.50% <0.00%> (-0.36%)` | `50.00% <0.00%> (-1.00%)` | |
   | [.../org/apache/hudi/common/model/HoodieFileGroup.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZUZpbGVHcm91cC5qYXZh) | | | |
   | [...in/java/org/apache/hudi/common/model/BaseFile.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0Jhc2VGaWxlLmphdmE=) | | | |
   | [.../hive/SlashEncodedHourPartitionValueExtractor.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvU2xhc2hFbmNvZGVkSG91clBhcnRpdGlvblZhbHVlRXh0cmFjdG9yLmphdmE=) | | | |
   | [...3/internal/HoodieDataSourceInternalBatchWrite.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmszL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3NwYXJrMy9pbnRlcm5hbC9Ib29kaWVEYXRhU291cmNlSW50ZXJuYWxCYXRjaFdyaXRlLmphdmE=) | | | |
   | [...udi/timeline/service/handlers/BaseFileHandler.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS10aW1lbGluZS1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL3RpbWVsaW5lL3NlcnZpY2UvaGFuZGxlcnMvQmFzZUZpbGVIYW5kbGVyLmphdmE=) | | | |
   | [...pache/hudi/common/fs/SizeAwareDataInputStream.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL1NpemVBd2FyZURhdGFJbnB1dFN0cmVhbS5qYXZh) | | | |
   | [...apache/hudi/common/fs/HoodieWrapperFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL0hvb2RpZVdyYXBwZXJGaWxlU3lzdGVtLmphdmE=) | | | |
   | [...g/apache/hudi/cli/utils/SparkTempViewProvider.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL3V0aWxzL1NwYXJrVGVtcFZpZXdQcm92aWRlci5qYXZh) | | | |
   | ... and [363 more](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566617430



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala
##########
@@ -285,7 +289,14 @@ class HoodieMergeOnReadRDD(@transient sc: SparkContext,
 
       private def mergeRowWithLog(curRow: InternalRow, curKey: String) = {
         val historyAvroRecord = serializer.serialize(curRow).asInstanceOf[GenericRecord]
-        logRecords.get(curKey).getData.combineAndGetUpdateValue(historyAvroRecord, tableAvroSchema)
+        if (preCombineField != null) {
+          val payloadProps = new Properties()

Review comment:
       Good suggestion! I will refactor this code later.
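
To make the effect of the `preCombineField` check in the hunk above concrete, here is a minimal Java sketch of the idea, not Hudi's implementation. It models rows as plain maps instead of Avro records, and `PreCombineMergeSketch`, `merge`, and the `source` column are hypothetical names. The assumed rule is: when an ordering field is configured, the base-file row survives only if its ordering value is strictly greater than the log row's; otherwise the log row wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: rows are modelled as plain maps rather than Avro records.
final class PreCombineMergeSketch {

    // Picks the surviving row when a base-file row and a log row share a key.
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> baseRow,
                                     Map<String, Object> logRow,
                                     String preCombineField) {
        if (preCombineField == null) {
            return logRow; // no ordering field configured: the log row always wins
        }
        Object baseValue = baseRow.get(preCombineField);
        Object logValue = logRow.get(preCombineField);
        if (baseValue == null || logValue == null) {
            return logRow; // nothing to compare: fall back to the log row
        }
        // Keep the base row only when its ordering value is strictly greater.
        return ((Comparable<Object>) baseValue).compareTo(logValue) > 0 ? baseRow : logRow;
    }

    public static void main(String[] args) {
        Map<String, Object> baseRow = new HashMap<>();
        baseRow.put("id", "1");
        baseRow.put("ts", 10L);
        baseRow.put("source", "base");

        Map<String, Object> logRow = new HashMap<>();
        logRow.put("id", "1");
        logRow.put("ts", 5L);
        logRow.put("source", "log");

        System.out.println(merge(baseRow, logRow, "ts").get("source")); // prints "base"
        System.out.println(merge(baseRow, logRow, null).get("source")); // prints "log"
    }
}
```

The second call shows the situation the PR describes: without an ordering field, a log row with an older `ts` overwrites a newer base row, which is how a stale value can reach the query result.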







[GitHub] [hudi] nsivabalan commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770830774


   Yes, sounds good @garyli1019. LGTM. Feel free to land it if you are good.





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r567431431



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/BootstrapExecutor.java
##########
@@ -173,7 +173,7 @@ private void initializeTable() throws IOException {
 
     HoodieTableMetaClient.initTableTypeWithBootstrap(new Configuration(jssc.hadoopConfiguration()),
         cfg.targetBasePath, HoodieTableType.valueOf(cfg.tableType), cfg.targetTableName, "archived", cfg.payloadClassName,
-        cfg.baseFileFormat, cfg.bootstrapIndexClass, bootstrapBasePath);
+        cfg.baseFileFormat, null, cfg.bootstrapIndexClass, bootstrapBasePath);

Review comment:
       Added!







[GitHub] [hudi] codecov-io edited a comment on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#issuecomment-770090356


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=h1) Report
   > Merging [#2497](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=desc) (2f23040) into [master](https://codecov.io/gh/apache/hudi/commit/23f2ef3efbea5e9a686bac195cdf97605f20d91d?el=desc) (23f2ef3) will **increase** coverage by `0.24%`.
   > The diff coverage is `58.33%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2497/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #2497      +/-   ##
   ============================================
   + Coverage     50.28%   50.53%   +0.24%     
   - Complexity     3120     3123       +3     
   ============================================
     Files           430      430              
     Lines         19565    19597      +32     
     Branches       2004     2008       +4     
   ============================================
   + Hits           9838     9903      +65     
   + Misses         8924     8886      -38     
   - Partials        803      808       +5     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `37.21% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `51.45% <0.00%> (-0.07%)` | `0.00 <0.00> (ø)` | |
   | hudiflink | `33.03% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudihadoopmr | `33.16% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudisparkdatasource | `69.46% <72.41%> (+3.60%)` | `0.00 <4.00> (ø)` | |
   | hudisync | `48.61% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | huditimelineservice | `66.49% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudiutilities | `69.48% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2497?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...rg/apache/hudi/common/table/HoodieTableConfig.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlQ29uZmlnLmphdmE=) | `45.45% <0.00%> (-0.60%)` | `17.00 <0.00> (ø)` | |
   | [...pache/hudi/common/table/HoodieTableMetaClient.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL0hvb2RpZVRhYmxlTWV0YUNsaWVudC5qYXZh) | `68.33% <0.00%> (-2.36%)` | `45.00 <0.00> (ø)` | |
   | [...n/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZU1lcmdlT25SZWFkUkRELnNjYWxh) | `89.78% <60.00%> (-1.70%)` | `14.00 <2.00> (+2.00)` | :arrow_down: |
   | [...g/apache/hudi/MergeOnReadIncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `81.45% <71.42%> (-0.76%)` | `22.00 <1.00> (+1.00)` | :arrow_down: |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | `89.13% <77.77%> (-1.46%)` | `17.00 <1.00> (+1.00)` | :arrow_down: |
   | [...n/scala/org/apache/hudi/HoodieSparkSqlWriter.scala](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZVNwYXJrU3FsV3JpdGVyLnNjYWxh) | `48.76% <100.00%> (+0.18%)` | `0.00 <0.00> (ø)` | |
   | [...ache/hudi/common/fs/inline/InMemoryFileSystem.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2ZzL2lubGluZS9Jbk1lbW9yeUZpbGVTeXN0ZW0uamF2YQ==) | `79.31% <0.00%> (-10.35%)` | `15.00% <0.00%> (-1.00%)` | |
   | [...src/main/java/org/apache/hudi/QuickstartUtils.java](https://codecov.io/gh/apache/hudi/pull/2497/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvUXVpY2tzdGFydFV0aWxzLmphdmE=) | `60.46% <0.00%> (+60.46%)` | `0.00% <0.00%> (ø%)` | |
   





[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

Posted by GitBox <gi...@apache.org>.
pengzhiwei2018 commented on a change in pull request #2497:
URL: https://github.com/apache/hudi/pull/2497#discussion_r566917206



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -78,7 +78,16 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
   private val tableStructSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableAvroSchema)
   private val maxCompactionMemoryInBytes = getMaxCompactionMemoryInBytes(jobConf)
   private val fileIndex = buildFileIndex()
-
+  private val preCombineField = {
+    val fieldFromTableConfig = metaClient.getTableConfig.getPreCombineField
+    if (fieldFromTableConfig != null) {
+      fieldFromTableConfig
+    } else if (optParams.contains(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY)) {

Review comment:
       Yes, `PRECOMBINE_FIELD_OPT_KEY` is used for compatibility with old tables that have not stored the `preCombineField` in `hoodie.properties`. Using the write option here is a bit odd, so I have added a `READ_PRECOMBINE_FIELD` option to `DataSourceReadOptions`.
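
The lookup order discussed here (table config first, then a read-side option for older tables) can be sketched in Java roughly as follows. This is an assumption-laden illustration, not the code in the PR: `PreCombineFieldResolver`, `resolve`, and the `sketch.read.precombine.field` key are hypothetical, and the real option key behind `READ_PRECOMBINE_FIELD` is not shown in this thread.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fallback order; names and the option key are illustrative.
final class PreCombineFieldResolver {
    // Placeholder key; the actual key used by DataSourceReadOptions is not shown in the thread.
    static final String READ_PRECOMBINE_FIELD_KEY = "sketch.read.precombine.field";

    // Prefers the value stored with the table, then falls back to the read option.
    static Optional<String> resolve(String fieldFromTableConfig, Map<String, String> readOptions) {
        if (fieldFromTableConfig != null) {
            return Optional.of(fieldFromTableConfig);
        }
        // Older tables have nothing stored, so the reader may supply the field.
        return Optional.ofNullable(readOptions.get(READ_PRECOMBINE_FIELD_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> withOption = Collections.singletonMap(READ_PRECOMBINE_FIELD_KEY, "ts");

        System.out.println(resolve("event_time", withOption).orElse("<none>")); // prints "event_time"
        System.out.println(resolve(null, withOption).orElse("<none>"));         // prints "ts"
        System.out.println(resolve(null, none).orElse("<none>"));               // prints "<none>"
    }
}
```

Returning an `Optional` keeps the "no ordering field" case explicit, so the caller can decide to merge without one instead of checking for `null`.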



