Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/25 21:54:36 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

nsivabalan opened a new pull request #2487:
URL: https://github.com/apache/hudi/pull/2487


   ## What is the purpose of the pull request
   
   Adding record level index based on hoodie backed table. 
   
   ## Brief change log
   
     - *Added RecordLevelIndex to hoodie that stores and exposes record level index info*
   
   Review guide:
   - Index class: RecordLevelIndex
   - Classes used in the read path for the index table (two read modes are supported: scan the table fully and fetch key locations, or look up keys one by one):
   a. HoodieRecordLevelIndexScanner
   b. HoodieRecordLevelIndexLookupFunction and RecordLevelIndexLazyLookupIterator
   - Record schema : HoodieRecordLevelIndexRecord
   - Payload to be used in Index table: HoodieRecordLevelIndexPayload
   - Configs added: hoodie.record.level.index.num.partitions and hoodie.record.level.index.enable.seek
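
   The two read modes named in the review guide can be pictured with a small in-memory stand-in. This is only a sketch: the class `IndexReadModesSketch`, its methods, and the hash-modulo routing of keys to partitions are all hypothetical and are not taken from this pull request. It only shows that a full scan and per-key lookups return the same locations while touching the data differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for an index table; not the classes from this pull request.
final class IndexReadModesSketch {
  // Each partition maps record key -> file id, kept sorted by key.
  private final List<TreeMap<String, String>> partitions = new ArrayList<>();

  IndexReadModesSketch(int numPartitions) {
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new TreeMap<>());
    }
  }

  // Assumed routing rule: hash of the record key modulo the partition count.
  private TreeMap<String, String> partitionFor(String recordKey) {
    return partitions.get(Math.floorMod(recordKey.hashCode(), partitions.size()));
  }

  void put(String recordKey, String fileId) {
    partitionFor(recordKey).put(recordKey, fileId);
  }

  // Mode 1: read every entry once and keep only the requested keys.
  Map<String, String> scanAll(Set<String> wantedKeys) {
    Map<String, String> found = new HashMap<>();
    for (TreeMap<String, String> partition : partitions) {
      for (Map.Entry<String, String> entry : partition.entrySet()) {
        if (wantedKeys.contains(entry.getKey())) {
          found.put(entry.getKey(), entry.getValue());
        }
      }
    }
    return found;
  }

  // Mode 2: seek to each requested key individually.
  Map<String, String> lookupEach(Set<String> wantedKeys) {
    Map<String, String> found = new HashMap<>();
    for (String key : wantedKeys) {
      String fileId = partitionFor(key).get(key);
      if (fileId != null) {
        found.put(key, fileId);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    IndexReadModesSketch index = new IndexReadModesSketch(4);
    index.put("key-1", "file-a");
    index.put("key-2", "file-b");
    Set<String> wanted = new HashSet<>(Arrays.asList("key-2", "key-9"));
    // Both modes agree on the result; "key-9" is absent from both.
    System.out.println(index.scanAll(wanted).equals(index.lookupEach(wanted)));
  }
}
```

   In a sketch like this, the scan costs one pass over all entries regardless of how many keys are requested, while the per-key mode costs one seek per key, so which one is cheaper presumably depends on the ratio of requested keys to index size.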
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] prashantwason commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-880135035


   Sounds good. I will ping you once I have something to show around this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#discussion_r564142151



##########
File path: hudi-common/src/main/java/org/apache/hudi/index/HoodieRecordLevelIndexPayload.java
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.index;
+
+import org.apache.hudi.avro.model.HoodieRecordLevelIndexRecord;
+import org.apache.hudi.common.model.HoodieRecordPayload;
+import org.apache.hudi.common.util.Option;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+
+import java.io.IOException;
+
+/**
+ * Payload used in index table for Hoodie Record level index.
+ */
+public class HoodieRecordLevelIndexPayload implements HoodieRecordPayload<HoodieRecordLevelIndexPayload> {
+
+  private String key;
+  private String partitionPath;
+  private String instantTime;
+  private String fileId;
+
+  public HoodieRecordLevelIndexPayload(Option<GenericRecord> record) {
+    if (record.isPresent()) {
+      // This can be simplified using SpecificData.deepcopy once this bug is fixed
+      // https://issues.apache.org/jira/browse/AVRO-1811
+      key = record.get().get("key").toString();
+      partitionPath = record.get().get("partitionPath").toString();
+      instantTime = record.get().get("instantTime").toString();
+      fileId = record.get().get("fileId").toString();
+    }
+  }
+
+  private HoodieRecordLevelIndexPayload(String key, String partitionPath, String instantTime, String fileId) {
+    this.key = key;
+    this.partitionPath = partitionPath;
+    this.instantTime = instantTime;
+    this.fileId = fileId;
+  }
+
+  @Override
+  public HoodieRecordLevelIndexPayload preCombine(HoodieRecordLevelIndexPayload another) {
+    if (this.instantTime.compareTo(another.instantTime) >= 0) {

Review comment:
       Note: this needs some fixing. Can we just convert the strings to long and compare?
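
       A minimal sketch of what the note above suggests, assuming instant times are non-null strings of decimal digits that fit in a `long`. The class and method names are hypothetical and this is not the code from the pull request. For digit strings of equal length, `String.compareTo` and a numeric comparison agree; they can differ once the lengths differ, which is the case the sketch demonstrates.

```java
// Illustrative only; not the code from this pull request.
final class InstantTimeOrdering {

  // Compares two instant times by numeric value.
  // Assumption: both arguments are non-null decimal digit strings that fit in a long.
  static int compareInstantTimes(String left, String right) {
    return Long.compare(Long.parseLong(left), Long.parseLong(right));
  }

  public static void main(String[] args) {
    // Lexicographic order puts "9" after "10" because '9' > '1'.
    System.out.println("9".compareTo("10") > 0);            // true
    // Numeric order puts 9 before 10.
    System.out.println(compareInstantTimes("9", "10") < 0); // true
  }
}
```

       `Long.parseLong` throws `NumberFormatException` on non-numeric input, so a real payload would need to decide how to handle malformed or missing instant times.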







[GitHub] [hudi] hudi-bot commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-867212403


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8b07157d222a415db4f0d12fabb720cb4a37e28c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8b07157d222a415db4f0d12fabb720cb4a37e28c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8b07157d222a415db4f0d12fabb720cb4a37e28c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run travis` re-run the last Travis build
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinothchandar commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-880160813


   Oooh, exciting.





[GitHub] [hudi] lw309637554 commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
lw309637554 commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-824911386


   @nsivabalan @vinothchandar Hello, do we have any plans for the record index? In our scenario, finding record keys with min/max and bloom filters performs poorly.





[GitHub] [hudi] codecov-io commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-767228748


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=h1) Report
   > Merging [#2487](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=desc) (8b07157) into [master](https://codecov.io/gh/apache/hudi/commit/e302c6bc12c7eb764781898fdee8ee302ef4ec10?el=desc) (e302c6b) will **increase** coverage by `19.24%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/2487/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2487       +/-   ##
   =============================================
   + Coverage     50.18%   69.43%   +19.24%     
   + Complexity     3050      357     -2693     
   =============================================
     Files           419       53      -366     
     Lines         18931     1930    -17001     
     Branches       1948      230     -1718     
   =============================================
   - Hits           9500     1340     -8160     
   + Misses         8656      456     -8200     
   + Partials        775      134      -641     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | hudicli | `?` | `?` | |
   | hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | hudicommon | `?` | `?` | |
   | hudiflink | `?` | `?` | |
   | hudihadoopmr | `?` | `?` | |
   | hudisparkdatasource | `?` | `?` | |
   | hudisync | `?` | `?` | |
   | huditimelineservice | `?` | `?` | |
   | hudiutilities | `69.43% <ø> (ø)` | `0.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/2487?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...e/hudi/common/engine/HoodieLocalEngineContext.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2VuZ2luZS9Ib29kaWVMb2NhbEVuZ2luZUNvbnRleHQuamF2YQ==) | | | |
   | [.../org/apache/hudi/MergeOnReadSnapshotRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL01lcmdlT25SZWFkU25hcHNob3RSZWxhdGlvbi5zY2FsYQ==) | | | |
   | [.../org/apache/hudi/exception/HoodieKeyException.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUtleUV4Y2VwdGlvbi5qYXZh) | | | |
   | [.../apache/hudi/common/bloom/BloomFilterTypeCode.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL2Jsb29tL0Jsb29tRmlsdGVyVHlwZUNvZGUuamF2YQ==) | | | |
   | [...able/timeline/versioning/AbstractMigratorBase.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL3ZlcnNpb25pbmcvQWJzdHJhY3RNaWdyYXRvckJhc2UuamF2YQ==) | | | |
   | [...rc/main/java/org/apache/hudi/cli/HoodiePrompt.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jbGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY2xpL0hvb2RpZVByb21wdC5qYXZh) | | | |
   | [.../org/apache/hudi/common/model/HoodieTableType.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL21vZGVsL0hvb2RpZVRhYmxlVHlwZS5qYXZh) | | | |
   | [.../scala/org/apache/hudi/Spark2RowDeserializer.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3BhcmsyL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUvaHVkaS9TcGFyazJSb3dEZXNlcmlhbGl6ZXIuc2NhbGE=) | | | |
   | [...hudi/common/table/log/block/HoodieDeleteBlock.java](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL2xvZy9ibG9jay9Ib29kaWVEZWxldGVCbG9jay5qYXZh) | | | |
   | [...cala/org/apache/hudi/HoodieBootstrapRelation.scala](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay1kYXRhc291cmNlL2h1ZGktc3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9odWRpL0hvb2RpZUJvb3RzdHJhcFJlbGF0aW9uLnNjYWxh) | | | |
   | ... and [356 more](https://codecov.io/gh/apache/hudi/pull/2487/diff?src=pr&el=tree-more) | |
   





[GitHub] [hudi] prashantwason commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
prashantwason commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-875308439


   This is a very comprehensive implementation for the record-level index.
   
   There are some changes required to the current Metadata Table design to enable record-level-index:
   1. Synchronous updates of metadata table 
   2. Performance improvement for multi-key lookups (required for tagLocation)
   3. In-line file system for point lookups of keys from HFile blocks in log files, so we don't have to load the entire log block into memory
   
   I am working on these changes, and they should be complete before the end of July.
   
   @nsivabalan I will be happy to collaborate with you to take this draft towards the Hoodie Metadata Table based record-level index.





[GitHub] [hudi] vinothchandar commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-914630391


   We have a new PR up already.





[GitHub] [hudi] nsivabalan commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-879077519


   @prashantwason: I am occupied with other things for now. Please go ahead and get started with your work. Feel free to take this up and fix it as required. I will reach out to you once I have some cycles and can share some of the work.
   





[GitHub] [hudi] vinothchandar closed pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
vinothchandar closed pull request #2487:
URL: https://github.com/apache/hudi/pull/2487


   





[GitHub] [hudi] vinothchandar commented on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-867210438


   Yes, for random UUIDs, bloom filters and min/max ranges are less helpful.
   
   @lw309637554 At a high level, we need some foundational work around the metadata table before we can add a record-level index to it; that's the direction we are taking. The record-level implementation itself should be straightforward. I have done all the benchmarks on S3/object storage for range reads. IIUC this is prioritized very highly at Uber in H2. @prashantwason can you please provide an update here? I know you are working on this.
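
   The point about random UUIDs can be illustrated with a toy model of min/max pruning. This is only a sketch under stated assumptions: keys are fixed-width hex strings, each "file" is reduced to its minimum and maximum key, and the class `KeyRangePruningSketch` with all of its methods is hypothetical. Bloom filters are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model of min/max key-range pruning; names are hypothetical.
final class KeyRangePruningSketch {

  // Returns {min, max} of the keys stored in one file.
  static String[] keyRange(List<String> keys) {
    String min = keys.get(0);
    String max = keys.get(0);
    for (String key : keys) {
      if (key.compareTo(min) < 0) min = key;
      if (key.compareTo(max) > 0) max = key;
    }
    return new String[] {min, max};
  }

  // Counts files whose [min, max] range contains the lookup key,
  // i.e. files that range pruning cannot rule out.
  static int candidateFiles(List<String[]> ranges, String lookupKey) {
    int count = 0;
    for (String[] range : ranges) {
      if (range[0].compareTo(lookupKey) <= 0 && lookupKey.compareTo(range[1]) <= 0) {
        count++;
      }
    }
    return count;
  }

  // Files filled with consecutive keys, so ranges do not overlap.
  static List<String[]> orderedFiles(int files, int keysPerFile) {
    List<String[]> ranges = new ArrayList<>();
    for (int f = 0; f < files; f++) {
      List<String> keys = new ArrayList<>();
      for (int k = 0; k < keysPerFile; k++) {
        keys.add(String.format("%016x", (long) f * keysPerFile + k));
      }
      ranges.add(keyRange(keys));
    }
    return ranges;
  }

  // Files filled with random 64-bit keys, standing in for random UUIDs.
  static List<String[]> randomFiles(int files, int keysPerFile, long seed) {
    Random random = new Random(seed);
    List<String[]> ranges = new ArrayList<>();
    for (int f = 0; f < files; f++) {
      List<String> keys = new ArrayList<>();
      for (int k = 0; k < keysPerFile; k++) {
        keys.add(String.format("%016x", random.nextLong()));
      }
      ranges.add(keyRange(keys));
    }
    return ranges;
  }

  public static void main(String[] args) {
    String orderedLookup = String.format("%016x", 4500L);
    System.out.println(candidateFiles(orderedFiles(10, 1000), orderedLookup));
    System.out.println(candidateFiles(randomFiles(10, 1000, 42L), "8000000000000000"));
  }
}
```

   With 10 files of 1000 keys each, the ordered layout leaves a single candidate file for the lookup, while the random layout leaves all 10, because every file's range spans nearly the whole key space. That is the situation in which a record-level index, which maps a key directly to its file, would avoid opening every candidate.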





[GitHub] [hudi] hudi-bot edited a comment on pull request #2487: [WIP HUDI-53] Adding Record Level Index based on hoodie backed table

Posted by GitBox <gi...@apache.org>.
hudi-bot edited a comment on pull request #2487:
URL: https://github.com/apache/hudi/pull/2487#issuecomment-867212403


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8b07157d222a415db4f0d12fabb720cb4a37e28c",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8b07157d222a415db4f0d12fabb720cb4a37e28c",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8b07157d222a415db4f0d12fabb720cb4a37e28c UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

