You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/14 21:41:48 UTC

[GitHub] [incubator-hudi] afilipchik opened a new pull request #1515: [HUDI-795] Ignoring missing aux folde

afilipchik opened a new pull request #1515: [HUDI-795] Ignoring missing aux folde
URL: https://github.com/apache/incubator-hudi/pull/1515
 
 
   ## What is the purpose of the pull request
   
   Seeing breakages on GCS. When something removes the folder deleteAllInstantsOlderorEqualsInAuxMetaFolder breaks.
   
   ## Brief change log
   
     - Modifed HoodieCommitArchiveLog.deleteAllInstantsOlderorEqualsInAuxMetaFolder to ignore missing folder
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r409208032
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
 ##########
 @@ -219,14 +220,23 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {
 
 Review comment:
   not sure about it. Did see that folder disappears/reappears sometimes. Can be related to the way GCS connector treats a situation when all the files from the folder are removed. 
   
   On gcs folders are not real, they are just logical. So, file with name: gs://bucket/folder/file doesn't live in the folder "folder", it just has a long "/" separated name. I saw some logic in the GCS FS java implementation that creates fake file "/" when everything is removed from the logical folder. So, my speculation is that if this call fails or not enabled, that removing all the files from the folder will end up if folder being removed as well. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folde

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folde
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r408978903
 
 

 ##########
 File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
 ##########
 @@ -219,14 +220,23 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {
 
 Review comment:
   Can you please provide specific pointers on how the folder gets removed on GCS? Is it GCS specific? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r412571584



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##########
@@ -219,14 +220,29 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {
+      // On some FSs deletion of all files in the directory can auto remove the directory itself.
+      // GCS is one example, as it doesn't have real directories and subdirectories. When client
+      // removes all the files from a "folder" on GCS is has to create a special "/" to keep the folder
+      // around. If this doesn't happen (timeout, misconfigured client, ...) folder will be deleted and
+      // in this case we should not break when aux folder is not found.
+      // GCS information: (https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork)

Review comment:
       updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on issue #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#issuecomment-617739210


   LGTM.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
pratyakshsharma commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411592119



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##########
@@ -219,14 +220,29 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {
+      // On some FSs deletion of all files in the directory can auto remove the directory itself.
+      // GCS is one example, as it doesn't have real directories and subdirectories. When client
+      // removes all the files from a "folder" on GCS is has to create a special "/" to keep the folder
+      // around. If this doesn't happen (timeout, misconfigured client, ...) folder will be deleted and
+      // in this case we should not break when aux folder is not found.
+      // GCS information: (https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork)

Review comment:
       Guess it would be better to use multi line comments here like 
   /*
   *
   */




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] codecov-io commented on issue #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#issuecomment-617481634


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=h1) Report
   > Merging [#1515](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/644c1cc8bd1965271c4433edccb17aa8fda5f403&el=desc) will **decrease** coverage by `0.55%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1515/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1515      +/-   ##
   ============================================
   - Coverage     72.17%   71.62%   -0.56%     
   - Complexity      289      294       +5     
   ============================================
     Files           372      378       +6     
     Lines         16267    16552     +285     
     Branches       1637     1672      +35     
   ============================================
   + Hits          11741    11855     +114     
   - Misses         3791     3965     +174     
   + Partials        735      732       -3     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [.../org/apache/hudi/table/HoodieCommitArchiveLog.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdGFibGUvSG9vZGllQ29tbWl0QXJjaGl2ZUxvZy5qYXZh) | `76.43% <66.66%> (-1.06%)` | `0.00 <0.00> (ø)` | |
   | [...di/hadoop/realtime/HoodieRealtimeRecordReader.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1oYWRvb3AtbXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaGFkb29wL3JlYWx0aW1lL0hvb2RpZVJlYWx0aW1lUmVjb3JkUmVhZGVyLmphdmE=) | `70.00% <0.00%> (-14.22%)` | `0.00% <0.00%> (ø%)` | |
   | [...ain/java/org/apache/hudi/avro/HoodieAvroUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvYXZyby9Ib29kaWVBdnJvVXRpbHMuamF2YQ==) | `84.82% <0.00%> (-7.18%)` | `0.00% <0.00%> (ø%)` | |
   | [...lities/checkpointing/KafkaConnectHdfsProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2NoZWNrcG9pbnRpbmcvS2Fma2FDb25uZWN0SGRmc1Byb3ZpZGVyLmphdmE=) | `89.28% <0.00%> (-3.03%)` | `14.00% <0.00%> (+2.00%)` | :arrow_down: |
   | [...i/common/table/timeline/TimelineMetadataUtils.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL1RpbWVsaW5lTWV0YWRhdGFVdGlscy5qYXZh) | `93.02% <0.00%> (-2.22%)` | `0.00% <0.00%> (ø%)` | |
   | [...i/utilities/deltastreamer/HoodieDeltaStreamer.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvSG9vZGllRGVsdGFTdHJlYW1lci5qYXZh) | `78.13% <0.00%> (-1.01%)` | `11.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...in/java/org/apache/hudi/utilities/UtilHelpers.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL1V0aWxIZWxwZXJzLmphdmE=) | `64.70% <0.00%> (-0.71%)` | `22.00% <0.00%> (+1.00%)` | :arrow_down: |
   | [...ain/java/org/apache/hudi/io/HoodieWriteHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllV3JpdGVIYW5kbGUuamF2YQ==) | `74.46% <0.00%> (-0.54%)` | `0.00% <0.00%> (ø%)` | |
   | [...i/common/table/timeline/HoodieDefaultTimeline.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvY29tbW9uL3RhYmxlL3RpbWVsaW5lL0hvb2RpZURlZmF1bHRUaW1lbGluZS5qYXZh) | `92.30% <0.00%> (-0.12%)` | `0.00% <0.00%> (ø%)` | |
   | [...in/java/org/apache/hudi/io/HoodieAppendHandle.java](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvaW8vSG9vZGllQXBwZW5kSGFuZGxlLmphdmE=) | `84.17% <0.00%> (ø)` | `0.00% <0.00%> (ø%)` | |
   | ... and [26 more](https://codecov.io/gh/apache/incubator-hudi/pull/1515/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=footer). Last update [644c1cc...7500d06](https://codecov.io/gh/apache/incubator-hudi/pull/1515?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411536924



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##########
@@ -219,14 +220,23 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {

Review comment:
       Will add comment. On create folder -> no, as GCS will create "folder" automatically. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1515: [HUDI-795] Ignoring missing aux folder

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1515:
URL: https://github.com/apache/incubator-hudi/pull/1515#discussion_r411490140



##########
File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieCommitArchiveLog.java
##########
@@ -219,14 +220,23 @@ private boolean deleteArchivedInstants(List<HoodieInstant> archivedInstants) thr
    * @throws IOException in case of error
    */
   private boolean deleteAllInstantsOlderorEqualsInAuxMetaFolder(HoodieInstant thresholdInstant) throws IOException {
-    List<HoodieInstant> instants = metaClient.scanHoodieInstantsFromFileSystem(
-        new Path(metaClient.getMetaAuxiliaryPath()), HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE, false);
+    List<HoodieInstant> instants = null;
+    boolean success = true;
+    try {
+      instants =
+          metaClient.scanHoodieInstantsFromFileSystem(
+              new Path(metaClient.getMetaAuxiliaryPath()),
+              HoodieActiveTimeline.VALID_EXTENSIONS_IN_ACTIVE_TIMELINE,
+              false);
+    } catch (FileNotFoundException e) {

Review comment:
       @afilipchik : Can you add the above comment in the Exception code block. Easy to understand the reason for this code. Also, Does this mean, we need to create the "aux" folder next time we create a file under aux ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org