You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/18 13:14:01 UTC

[GitHub] [hudi] hj2016 opened a new pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

hj2016 opened a new pull request #1978:
URL: https://github.com/apache/hudi/pull/1978


   
   - *Fixed the support for changes to the hbase index partition path, and added the HBASE_INDEX_UPDATE_PARTITION_PATH parameter to turn on and off similar to GlobalBloom.*
   
   - *Added hbase index partition path change test case *
   
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691379071


   @n3nash The problem nsivabalan said, I have fixed it a few days ago. I don’t know if you are talking about the same problem. If not, can I elaborate on it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-688581129


   @n3nash @nsivabalan can we please fast track this if possible? :) 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-683900773


   @n3nash @nsivabalan can you please review this 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691379071






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r481087898



##########
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##########
@@ -213,36 +215,61 @@ private boolean checkIfValidCommit(HoodieTableMetaClient metaClient, String comm
           statements.add(generateStatement(rec.getRecordKey()));
           currentBatchOfRecords.add(rec);
           // iterator till we reach batch size
-          if (statements.size() >= multiGetBatchSize || !hoodieRecordIterator.hasNext()) {
-            // get results for batch from Hbase
-            Result[] results = doGet(hTable, statements);
-            // clear statements to be GC'd
-            statements.clear();
-            for (Result result : results) {
-              // first, attempt to grab location from HBase
-              HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
-              if (result.getRow() != null) {
-                String keyFromResult = Bytes.toString(result.getRow());
-                String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
-                String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
-                String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
-
-                if (checkIfValidCommit(metaClient, commitTs)) {
-                  currentRecord = new HoodieRecord(new HoodieKey(currentRecord.getRecordKey(), partitionPath),
+          if (hoodieRecordIterator.hasNext() && statements.size() < multiGetBatchSize) {
+            continue;
+          }
+
+          // get results for batch from Hbase
+          Result[] results = doGet(hTable, statements);
+          // clear statements to be GC'd
+          statements.clear();
+          for (Result result : results) {
+            // first, attempt to grab location from HBase
+            HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
+
+            if (result.getRow() == null) {
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+            String keyFromResult = Bytes.toString(result.getRow());
+            String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
+            String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
+            String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
+
+            if (!checkIfValidCommit(metaClient, commitTs)) {
+              // if commit is invalid, treat this as a new taggedRecord
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+
+            if (updatePartitionPath && !partitionPath.equals(currentRecord.getPartitionPath())) {

Review comment:
       When I modify this code, I feel that the readability of multi-level nested for if code is poor, so I reduce the number of nesting levels of if and for statements, and think it is clearer to read.
   If you don't think it's reasonable to modify this way, I can only change the judgment of the partition change part. What do you think? I don't know if the indentation problem you mentioned is the one I mentioned above?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r481536312



##########
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##########
@@ -213,36 +215,61 @@ private boolean checkIfValidCommit(HoodieTableMetaClient metaClient, String comm
           statements.add(generateStatement(rec.getRecordKey()));
           currentBatchOfRecords.add(rec);
           // iterator till we reach batch size
-          if (statements.size() >= multiGetBatchSize || !hoodieRecordIterator.hasNext()) {
-            // get results for batch from Hbase
-            Result[] results = doGet(hTable, statements);
-            // clear statements to be GC'd
-            statements.clear();
-            for (Result result : results) {
-              // first, attempt to grab location from HBase
-              HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
-              if (result.getRow() != null) {
-                String keyFromResult = Bytes.toString(result.getRow());
-                String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
-                String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
-                String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
-
-                if (checkIfValidCommit(metaClient, commitTs)) {
-                  currentRecord = new HoodieRecord(new HoodieKey(currentRecord.getRecordKey(), partitionPath),
+          if (hoodieRecordIterator.hasNext() && statements.size() < multiGetBatchSize) {
+            continue;
+          }
+
+          // get results for batch from Hbase
+          Result[] results = doGet(hTable, statements);
+          // clear statements to be GC'd
+          statements.clear();
+          for (Result result : results) {
+            // first, attempt to grab location from HBase
+            HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
+
+            if (result.getRow() == null) {
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+            String keyFromResult = Bytes.toString(result.getRow());
+            String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
+            String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
+            String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
+
+            if (!checkIfValidCommit(metaClient, commitTs)) {
+              // if commit is invalid, treat this as a new taggedRecord
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+
+            if (updatePartitionPath && !partitionPath.equals(currentRecord.getPartitionPath())) {

Review comment:
       I removed the extra line breaks. I wonder if it is the problem you said?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-io edited a comment on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706814593


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=h1) Report
   > Merging [#1978](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1978/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1978   +/-   ##
   =========================================
     Coverage     53.60%   53.60%           
   - Complexity     2846     2847    +1     
   =========================================
     Files           359      359           
     Lines         16548    16546    -2     
     Branches       1780     1780           
   =========================================
     Hits           8870     8870           
   + Misses         6920     6917    -3     
   - Partials        758      759    +1     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.71% <ø> (+0.01%)` | `1794.00 <ø> (+1.00)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=) | `66.66% <0.00%> (-3.61%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `76.19% <0.00%> (ø)` | `20.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | `88.00% <0.00%> (ø)` | `3.00% <0.00%> (ø%)` | |
   | [...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==) | | | |
   | [...e/hudi/exception/SchemaCompatibilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGliaWxpdHlFeGNlcHRpb24uamF2YQ==) | `33.33% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <0.00%> (+16.66%)` | `2.00% <0.00%> (ø%)` | |
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-692196027


   @n3nash : not sure if you had any issues w/ tests. my feeback is addressed and patch is good from my end. Please land it if you are fine with the patch. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-675926853


   @n3nash I modified it, you can confirm it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r480453812



##########
File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java
##########
@@ -213,36 +215,61 @@ private boolean checkIfValidCommit(HoodieTableMetaClient metaClient, String comm
           statements.add(generateStatement(rec.getRecordKey()));
           currentBatchOfRecords.add(rec);
           // iterator till we reach batch size
-          if (statements.size() >= multiGetBatchSize || !hoodieRecordIterator.hasNext()) {
-            // get results for batch from Hbase
-            Result[] results = doGet(hTable, statements);
-            // clear statements to be GC'd
-            statements.clear();
-            for (Result result : results) {
-              // first, attempt to grab location from HBase
-              HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
-              if (result.getRow() != null) {
-                String keyFromResult = Bytes.toString(result.getRow());
-                String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
-                String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
-                String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
-
-                if (checkIfValidCommit(metaClient, commitTs)) {
-                  currentRecord = new HoodieRecord(new HoodieKey(currentRecord.getRecordKey(), partitionPath),
+          if (hoodieRecordIterator.hasNext() && statements.size() < multiGetBatchSize) {
+            continue;
+          }
+
+          // get results for batch from Hbase
+          Result[] results = doGet(hTable, statements);
+          // clear statements to be GC'd
+          statements.clear();
+          for (Result result : results) {
+            // first, attempt to grab location from HBase
+            HoodieRecord currentRecord = currentBatchOfRecords.remove(0);
+
+            if (result.getRow() == null) {
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+            String keyFromResult = Bytes.toString(result.getRow());
+            String commitTs = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN));
+            String fileId = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN));
+            String partitionPath = Bytes.toString(result.getValue(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN));
+
+            if (!checkIfValidCommit(metaClient, commitTs)) {
+              // if commit is invalid, treat this as a new taggedRecord
+              taggedRecords.add(currentRecord);
+              continue;
+            }
+
+
+            if (updatePartitionPath && !partitionPath.equals(currentRecord.getPartitionPath())) {

Review comment:
       It seems like only the following lines have changed but the PR changes indentation for other lines making it hard to understand the exact changes, can you please rework this to change only what is intended ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r484628744



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,46 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {

Review comment:
       yes, I see we have a test for true, but can you also add a test for false as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691505393






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691202637






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-689043555


   @hj2016 : can you fix the build failure.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691505393






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-699029537


   @hj2016 Can you rebase and squash your commits please ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691379071


   @n3nash The problem nsivabalan said, I have fixed it a few days ago. I don’t know if you are talking about the same problem. If not, can I elaborate on it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-693735896


   @nsivabalan @hj2016 This PR is ready to be landed after the conflicts are addressed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-700783352


   > @n3nash I tried to rebase and squash the commits but there are still two commits. Can you help me?
   
   Since @hj2016 has some problem in rebasing. If you @n3nash have no other concern, you would use `squash and merge` button to merge the PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
bvaradar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-685837476


   Will let @nsivabalan  and @n3nash   review this. @hj2016 : Can you rebase this PR  when you get a chance. This PR is being referenced in https://github.com/apache/hudi/issues/1941#issuecomment-685786730 and the issue reported might like to try this patch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r484632011



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,46 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {

Review comment:
       Okay, I will refine this use case later。




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r484922986



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,53 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {
+    final String newCommitTime = "001";
+    final int numRecords = 10;
+    final String oldPartitionPath = "1970/01/01";
+
+    // init old index to hbase
+    List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, numRecords);
+    JavaRDD<HoodieRecord> writeRecords = jsc().parallelize(records, 1);
+    HoodieWriteConfig config = getConfig(true);
+    HBaseIndex index = new HBaseIndex(getConfig(true));
+
+    try (HoodieWriteClient writeClient = getHoodieWriteClient(config);) {
+      // allowed path change test
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+      HoodieTable hoodieTable = HoodieTable.create(metaClient, config, hadoopConf);
+
+      JavaRDD<HoodieRecord> records1 = index.tagLocation(writeRecords, jsc(), hoodieTable);
+      assertEquals(0, records1.filter(record -> record.isCurrentLocationKnown()).count());
+      writeClient.startCommitWithTime(newCommitTime);
+      JavaRDD<WriteStatus> writeStatues = writeClient.upsert(writeRecords, newCommitTime);
+      writeClient.commit(newCommitTime, writeStatues);
+      assertNoWriteErrors(writeStatues.collect());
+      // mock old partition data
+      Table table = utility.getConnection().getTable(TableName.valueOf(TABLE_NAME));

Review comment:
        I modified it, can you confirm it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] codecov-io commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706814593


   # [Codecov](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=h1) Report
   > Merging [#1978](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hudi/commit/585ce0094d6527bab988f7657b4e84d12274ee28?el=desc) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hudi/pull/1978/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1978   +/-   ##
   =========================================
     Coverage     53.60%   53.60%           
   - Complexity     2846     2847    +1     
   =========================================
     Files           359      359           
     Lines         16548    16546    -2     
     Branches       1780     1780           
   =========================================
     Hits           8870     8870           
   + Misses         6920     6917    -3     
   - Partials        758      759    +1     
   ```
   
   | Flag | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | #hudicli | `38.37% <ø> (ø)` | `193.00 <ø> (ø)` | |
   | #hudiclient | `100.00% <ø> (ø)` | `0.00 <ø> (ø)` | |
   | #hudicommon | `54.71% <ø> (+0.01%)` | `1794.00 <ø> (+1.00)` | |
   | #hudihadoopmr | `33.05% <ø> (ø)` | `181.00 <ø> (ø)` | |
   | #hudispark | `65.48% <ø> (ø)` | `304.00 <ø> (ø)` | |
   | #huditimelineservice | `62.29% <ø> (ø)` | `50.00 <ø> (ø)` | |
   | #hudiutilities | `69.98% <ø> (ø)` | `325.00 <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/hudi/pull/1978?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../main/java/org/apache/hudi/common/util/Option.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvT3B0aW9uLmphdmE=) | `66.66% <0.00%> (-3.61%)` | `23.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...in/scala/org/apache/hudi/IncrementalRelation.scala](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1zcGFyay9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2h1ZGkvSW5jcmVtZW50YWxSZWxhdGlvbi5zY2FsYQ==) | `76.19% <0.00%> (ø)` | `20.00% <0.00%> (ø%)` | |
   | [...rg/apache/hudi/common/util/SerializationUtils.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3V0aWwvU2VyaWFsaXphdGlvblV0aWxzLmphdmE=) | `88.00% <0.00%> (ø)` | `3.00% <0.00%> (ø%)` | |
   | [...e/hudi/exception/SchemaCompatabilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGFiaWxpdHlFeGNlcHRpb24uamF2YQ==) | | | |
   | [...e/hudi/exception/SchemaCompatibilityException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL1NjaGVtYUNvbXBhdGliaWxpdHlFeGNlcHRpb24uamF2YQ==) | `33.33% <0.00%> (ø)` | `1.00% <0.00%> (?%)` | |
   | [...ava/org/apache/hudi/exception/HoodieException.java](https://codecov.io/gh/apache/hudi/pull/1978/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvZXhjZXB0aW9uL0hvb2RpZUV4Y2VwdGlvbi5qYXZh) | `50.00% <0.00%> (+16.66%)` | `2.00% <0.00%> (ø%)` | |
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691505393


   @nsivabalan are you good with the test changes like @hj2016 is asking? please clarify 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691202637


   @hj2016 Please rebase and fix the tests, we can land after that. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691579116


   > @nsivabalan are you good with the test changes like @hj2016 is asking? please clarify
   
   I did review the commit that addressed my concern and only then approved the PR. @n3nash had some formatting concerns. So left it to him to merge once his feedback is addressed. 
   So, the patch is good from my standpoint. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 edited a comment on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 edited a comment on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-700475643


   @n3nash  I tried to  rebase and squash the commits but there are still two commits. Can you help me?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar merged pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1978:
URL: https://github.com/apache/hudi/pull/1978


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-700475643


   @n3nash  I tried to  rebase and squash the commits but there are still two commits. Can you help me,think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hj2016 commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
hj2016 commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r484009673



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,46 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {

Review comment:
       Do you mean to consider the HBASE_INDEX_UPDATE_PARTITION_PATH parameter true and false test cases when testing?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] vinothchandar commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-706806558






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-691202637


   @hj2016 Please rebase and fix the tests, we can land after that. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] n3nash commented on pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
n3nash commented on pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#issuecomment-684085510


   @hj2016 The identation is still off, could you please look into it ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r481292427



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,46 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {

Review comment:
       can we also have another test where, the config value is set to false and the record is tagged w/ old partition. since we have introduced a new config, would be good to have tests covering both branches. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on a change in pull request #1978: [HUDI-1184] Fix the support of hbase index partition path change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #1978:
URL: https://github.com/apache/hudi/pull/1978#discussion_r484820440



##########
File path: hudi-client/src/test/java/org/apache/hudi/index/hbase/TestHBaseIndex.java
##########
@@ -156,6 +160,53 @@ public void testSimpleTagLocationAndUpdate() throws Exception {
     }
   }
 
+  @Test
+  public void testTagLocationAndPartitionPathUpdate() throws Exception {
+    final String newCommitTime = "001";
+    final int numRecords = 10;
+    final String oldPartitionPath = "1970/01/01";
+
+    // init old index to hbase
+    List<HoodieRecord> records = dataGen.generateInserts(newCommitTime, numRecords);
+    JavaRDD<HoodieRecord> writeRecords = jsc().parallelize(records, 1);
+    HoodieWriteConfig config = getConfig(true);
+    HBaseIndex index = new HBaseIndex(getConfig(true));
+
+    try (HoodieWriteClient writeClient = getHoodieWriteClient(config);) {
+      // allowed path change test
+      metaClient = HoodieTableMetaClient.reload(metaClient);
+      HoodieTable hoodieTable = HoodieTable.create(metaClient, config, hadoopConf);
+
+      JavaRDD<HoodieRecord> records1 = index.tagLocation(writeRecords, jsc(), hoodieTable);
+      assertEquals(0, records1.filter(record -> record.isCurrentLocationKnown()).count());
+      writeClient.startCommitWithTime(newCommitTime);
+      JavaRDD<WriteStatus> writeStatues = writeClient.upsert(writeRecords, newCommitTime);
+      writeClient.commit(newCommitTime, writeStatues);
+      assertNoWriteErrors(writeStatues.collect());
+      // mock old partition data
+      Table table = utility.getConnection().getTable(TableName.valueOf(TABLE_NAME));

Review comment:
       sorry, I missed to take a closer look at the test last time. instead of fiddling w/ table directly, can we test using index apis only. 
   - generate records with pp_1 and insert
   - update records with pp_2 and call index.tagLocation.
   You should be able to test both paths (false and true for the interested config) with this.
   true path:
       Verify that index.tagLocation returns 2 * #records, and half of them are referring to delete records in pp_1 and remaining records are pointing to pp_2
   false path:
       Verify that index.tagLocation return 1 * #records, and all of them are pointing to pp_1. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org