Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2022/02/05 02:25:12 UTC

[GitHub] [gobblin] phet commented on a change in pull request #3459: [GOBBLIN-1602] Change hive table location and partition check to validate using FS r…

phet commented on a change in pull request #3459:
URL: https://github.com/apache/gobblin/pull/3459#discussion_r799924710



##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/HiveCopyEntityHelper.java
##########
@@ -750,9 +751,14 @@ else if (desiredTargetExistingPaths.size() > 0) {
 
   private void checkPartitionedTableCompatibility(Table desiredTargetTable, Table existingTargetTable)
       throws IOException {
-    if (!desiredTargetTable.getDataLocation().equals(existingTargetTable.getDataLocation())) {
-      throw new HiveTableLocationNotMatchException(desiredTargetTable.getDataLocation(),
-          existingTargetTable.getDataLocation());
+    try {
+      if (!this.targetFs.resolvePath(desiredTargetTable.getDataLocation())
+          .equals(this.targetFs.resolvePath(existingTargetTable.getDataLocation()))) {
+        throw new HiveTableLocationNotMatchException(desiredTargetTable.getDataLocation(), existingTargetTable.getDataLocation());

Review comment:
       The exception args no longer reflect the equality operands. Could this message become confusing (e.g. if the two locations match on their own, but differ once mapped through `this.targetFs.resolvePath()`)?

##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/UnpartitionedTableFileSet.java
##########
@@ -64,11 +65,21 @@ public UnpartitionedTableFileSet(String name, HiveDataset dataset, HiveCopyEntit
 
     Optional<Table> existingTargetTable = this.helper.getExistingTargetTable();
     if (existingTargetTable.isPresent()) {
-      if (!this.helper.getTargetTable().getDataLocation().equals(existingTargetTable.get().getDataLocation())) {
+      boolean path_mismatch = false;
+      try {
+        if (!this.helper.getTargetFs().resolvePath(this.helper.getTargetTable().getDataLocation())
+            .equals(this.helper.getTargetFs().resolvePath(existingTargetTable.get().getDataLocation()))) {
+          path_mismatch = true;
+        }
+      } catch (FileNotFoundException e) {
+        // If desired path does not exist, then user is defining a different snapshot path so check policy
+        path_mismatch = true;
+      }
+      if (path_mismatch) {

Review comment:
       I was contemplating something similar (a boolean flag) above, when I saw two code paths throwing the same exception. Since Java has no widespread, canonical lib for converting exception control flow into values (like Scala's https://www.scala-lang.org/api/2.12.4/scala/util/control/Exception$.html ), there's no simple way to phrase this.
   
   As this pattern already occurs twice, I'd seek a utility abstraction. Maybe just a static method taking a `FileSystem` and two 'locations' (the values we get from `.getDataLocation()`). It would be an equality predicate (catching any exception within and converting it to `false`).
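
   A minimal sketch of what such a predicate might look like. The names `ResolvedLocations`, `Resolver`, and `resolveToSame` are hypothetical and not part of Gobblin. To keep the sketch compilable without Hadoop on the classpath, it takes a small functional interface as a stand-in for `FileSystem.resolvePath(Path)`. It also assumes that catching `IOException` (which covers `FileNotFoundException`) is enough, rather than literally any exception.

```java
import java.io.IOException;

/** Hypothetical helper, not part of Gobblin. */
final class ResolvedLocations {

  /**
   * Stand-in for Hadoop's FileSystem#resolvePath(Path), so that this sketch
   * compiles on its own. With Hadoop available, targetFs::resolvePath should fit.
   */
  @FunctionalInterface
  interface Resolver<P> {
    P resolve(P location) throws IOException;
  }

  private ResolvedLocations() {}

  /**
   * Equality predicate over resolved locations.
   * Returns true only if both locations resolve and the results are equal.
   * Any IOException (including FileNotFoundException) is converted to false.
   */
  static <P> boolean resolveToSame(Resolver<P> resolver, P desired, P existing) {
    try {
      return resolver.resolve(desired).equals(resolver.resolve(existing));
    } catch (IOException e) {
      // Assumption: an unresolvable location is treated as a mismatch.
      return false;
    }
  }
}
```

   Under these assumptions, a call site could read roughly `if (!ResolvedLocations.resolveToSame(targetFs::resolvePath, desired.getDataLocation(), existing.getDataLocation())) { ... }`, so each caller has a single path to the exception or policy check and no flag is needed.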

##########
File path: gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/hive/HivePartitionFileSet.java
##########
@@ -194,9 +194,10 @@ private Partition getTargetPartition(Partition originPartition, Path targetLocat
     }
   }
 
-  private static void checkPartitionCompatibility(Partition desiredTargetPartition, Partition existingTargetPartition)
+  private void checkPartitionCompatibility(Partition desiredTargetPartition, Partition existingTargetPartition)
       throws IOException {
-    if (!desiredTargetPartition.getDataLocation().equals(existingTargetPartition.getDataLocation())) {
+    if (!hiveCopyEntityHelper.getTargetFs().resolvePath(desiredTargetPartition.getDataLocation())
+        .equals(hiveCopyEntityHelper.getTargetFs().resolvePath(existingTargetPartition.getDataLocation()))) {

Review comment:
       Same question here about the exception args no longer paralleling the comparison.



