You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/01/21 09:10:07 UTC

[GitHub] [hive] aasha opened a new pull request #883: HIVE-22736 Support multiple encryption zones in replication

aasha opened a new pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370152190
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -213,9 +227,18 @@ public int recycle(Path path, RecycleType type, boolean ifPurge) throws IOExcept
         switch (type) {
         case MOVE: {
           LOG.info("Moving {} to {}", path.toString(), cmPath.toString());
-
           // Rename fails if the file with same name already exist.
-          success = fs.rename(path, cmPath);
+          Retry<Boolean> retriable = new Retry<Boolean>(IOException.class) {
+            @Override
+            public Boolean execute() throws IOException {
+              return fs.rename(path, cmPath);
+            }
+          };
+          try {
+            success = retriable.run();
+          } catch (Exception e) {
+            throw new MetaException(org.apache.hadoop.util.StringUtils.stringifyException(e));
 
 Review comment:
   Yes in this case it is just an IOException. But wanted to keep the Retry interface generic. So added Exception.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370046276
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -213,9 +227,18 @@ public int recycle(Path path, RecycleType type, boolean ifPurge) throws IOExcept
         switch (type) {
         case MOVE: {
           LOG.info("Moving {} to {}", path.toString(), cmPath.toString());
-
           // Rename fails if the file with same name already exist.
-          success = fs.rename(path, cmPath);
+          Retry<Boolean> retriable = new Retry<Boolean>(IOException.class) {
+            @Override
+            public Boolean execute() throws IOException {
+              return fs.rename(path, cmPath);
+            }
+          };
+          try {
+            success = retriable.run();
+          } catch (Exception e) {
+            throw new MetaException(org.apache.hadoop.util.StringUtils.stringifyException(e));
 
 Review comment:
   It can be a IOException

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371589038
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -150,16 +152,23 @@ private ReplChangeManager(Configuration conf) throws MetaException {
         if (MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED)) {
           ReplChangeManager.enabled = true;
           ReplChangeManager.conf = conf;
-
+          cmRootDir = MetastoreConf.getVar(conf, ConfVars.REPLCMDIR);
+          encryptedCmRootDir = MetastoreConf.getVar(conf, ConfVars.REPLCMENCRYPTEDDIR);
           //Create default cm root
-          Path cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
+          Path cmroot = new Path(cmRootDir);
+          HdfsEncryptionShim pathEncryptionShim = hadoopShims
+                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+          if (pathEncryptionShim.isPathEncrypted(cmroot)) {
+            LOG.warn(ConfVars.REPLCMDIR + " should not be encrypted. To pass cm dir for encrypted path use "
 
 Review comment:
   It will create issue if user has a single encryption zone setup and he has created a directory with permission to hive as cmroot. And user has not given the permission to create directory on encryption zone root to hive. In that case we will try to create .cmroot (the default value) and will fail. Should we consider this case ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370151392
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -138,15 +149,18 @@ private ReplChangeManager(Configuration conf) throws MetaException {
       if (!inited) {
         if (MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED)) {
           ReplChangeManager.enabled = true;
-          ReplChangeManager.cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
           ReplChangeManager.conf = conf;
 
+          //Create default cm root
+          Path cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
           FileSystem cmFs = cmroot.getFileSystem(conf);
           // Create cmroot with permission 700 if not exist
           if (!cmFs.exists(cmroot)) {
             cmFs.mkdirs(cmroot);
             cmFs.setPermission(cmroot, new FsPermission("700"));
           }
+          cmRootMapping.put(NO_ENCRYPTION, cmroot);
 
 Review comment:
   This is for REPLCMDIR. This will always be unencrypted. Encryption cm directory we have separated. If there is any unencrypted path for which recycle is called, we will use this. Else we will create a cm for that zone

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370051185
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -493,4 +524,29 @@ public static String joinWithSeparator(Iterable<?> strings) {
   public static String[] getListFromSeparatedString(String commaSeparatedString) {
     return commaSeparatedString.split("\\s*" + TXN_WRITE_EVENT_FILE_SEPARATOR + "\\s*");
   }
+
+  private static Path getCmRoot(Path path) throws IOException {
+    Path cmroot = null;
+    HdfsEncryptionShim pathEncryptionShim = hadoopShims.createHdfsEncryptionShim(path.getFileSystem(conf), conf);
+    if (!pathEncryptionShim.isPathEncrypted(path)) {
+      cmroot = cmRootMapping.get(NO_ENCRYPTION);
+    } else {
+      EncryptionZone encryptionZone = pathEncryptionShim.getEncryptionZoneForPath(path);
+      cmroot = cmRootMapping.get(encryptionZone.getPath());
+      if (cmroot == null) {
+        synchronized (instance) {
+          cmroot = new Path(path.getFileSystem(conf).getUri() + encryptionZone.getPath()
+                  + MetastoreConf.getVar(conf, ConfVars.REPLCMENCRYPTEDDIR));
+          FileSystem cmFs = cmroot.getFileSystem(conf);
+          // Create cmroot with permission 700 if not exist
+          if (!cmFs.exists(cmroot)) {
+            cmFs.mkdirs(cmroot);
+            cmFs.setPermission(cmroot, new FsPermission("700"));
+          }
+          cmRootMapping.put(encryptionZone.getPath(), cmroot);
 
 Review comment:
   Do we need a map..or a set is fine ..as the mapping from EZ root to CM root is constant ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371588458
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java
 ##########
 @@ -48,6 +48,8 @@ public void setUp() throws Exception {
     MetastoreConf.setBoolVar(conf, ConfVars.METASTORE_CACHE_CAN_USE_EVENT, true);
     MetastoreConf.setBoolVar(conf, ConfVars.HIVE_TXN_STATS_ENABLED, true);
     MetastoreConf.setBoolVar(conf, ConfVars.AGGREGATE_STATS_CACHE_ENABLED, false);
+    MetastoreConf.setBoolVar(conf, ConfVars.REPLCMENABLED, true);
+    MetastoreConf.setVar(conf, ConfVars.REPLCMDIR, "cmroot");
 
 Review comment:
   why we need to set this ? is it not working with default value ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373916040
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -572,4 +579,12 @@ private static void createCmRoot(Path cmroot) throws IOException {
       cmFs.setPermission(cmroot, new FsPermission("700"));
     }
   }
+
+  @VisibleForTesting
+  static void deinit() {
 
 Review comment:
   Name Should be deInit ..any other better name reset or something 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370047831
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -461,12 +485,19 @@ static void scheduleCMClearer(Configuration conf) {
           .namingPattern(CM_THREAD_NAME_PREFIX + "%d")
           .daemon(true)
           .build());
-      executor.scheduleAtFixedRate(new CMClearer(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR),
-          MetastoreConf.getTimeVar(conf, ConfVars.REPLCMRETIAN, TimeUnit.SECONDS), conf),
-          0, MetastoreConf.getTimeVar(conf, ConfVars.REPLCMINTERVAL, TimeUnit.SECONDS), TimeUnit.SECONDS);
+      for (Path cmroot : cmRootMapping.values()) {
 
 Review comment:
   The map is populated at run time..so at init time there may be no entry added to the map. I think we should send the map to the scheduler and let it iterate to cleanup the cm root for each EZ in the map.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370049810
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -493,4 +524,29 @@ public static String joinWithSeparator(Iterable<?> strings) {
   public static String[] getListFromSeparatedString(String commaSeparatedString) {
     return commaSeparatedString.split("\\s*" + TXN_WRITE_EVENT_FILE_SEPARATOR + "\\s*");
   }
+
+  private static Path getCmRoot(Path path) throws IOException {
+    Path cmroot = null;
+    HdfsEncryptionShim pathEncryptionShim = hadoopShims.createHdfsEncryptionShim(path.getFileSystem(conf), conf);
+    if (!pathEncryptionShim.isPathEncrypted(path)) {
+      cmroot = cmRootMapping.get(NO_ENCRYPTION);
+    } else {
+      EncryptionZone encryptionZone = pathEncryptionShim.getEncryptionZoneForPath(path);
+      cmroot = cmRootMapping.get(encryptionZone.getPath());
 
 Review comment:
   will it work fine if the path is not encrypted ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371620301
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -442,32 +451,39 @@ public static boolean isCMFileUri(Path fromPath) {
     public void run() {
       try {
         LOG.info("CMClearer started");
-
-        long now = System.currentTimeMillis();
-        FileSystem fs = cmroot.getFileSystem(conf);
-        FileStatus[] files = fs.listStatus(cmroot);
-
-        for (FileStatus file : files) {
-          long modifiedTime = file.getModificationTime();
-          if (now - modifiedTime > secRetain*1000) {
-            try {
-              if (fs.getXAttrs(file.getPath()).containsKey(REMAIN_IN_TRASH_TAG)) {
-                boolean succ = Trash.moveToAppropriateTrash(fs, file.getPath(), conf);
-                if (succ) {
-                  LOG.debug("Move " + file.toString() + " to trash");
-                } else {
-                  LOG.warn("Fail to move " + file.toString() + " to trash");
-                }
-              } else {
-                boolean succ = fs.delete(file.getPath(), false);
-                if (succ) {
-                  LOG.debug("Remove " + file.toString());
+        for (String encryptionZone : encryptionZones) {
+          Path cmroot;
 
 Review comment:
   Yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371604799
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java
 ##########
 @@ -48,6 +48,8 @@ public void setUp() throws Exception {
     MetastoreConf.setBoolVar(conf, ConfVars.METASTORE_CACHE_CAN_USE_EVENT, true);
     MetastoreConf.setBoolVar(conf, ConfVars.HIVE_TXN_STATS_ENABLED, true);
     MetastoreConf.setBoolVar(conf, ConfVars.AGGREGATE_STATS_CACHE_ENABLED, false);
+    MetastoreConf.setBoolVar(conf, ConfVars.REPLCMENABLED, true);
+    MetastoreConf.setVar(conf, ConfVars.REPLCMDIR, "cmroot");
 
 Review comment:
   REPLCMDIR default value is /user/${system:user.name}/repl/functions/
   UTs are not able to resolve ${system:user.name}

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371620277
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -150,16 +152,23 @@ private ReplChangeManager(Configuration conf) throws MetaException {
         if (MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED)) {
           ReplChangeManager.enabled = true;
           ReplChangeManager.conf = conf;
-
+          cmRootDir = MetastoreConf.getVar(conf, ConfVars.REPLCMDIR);
+          encryptedCmRootDir = MetastoreConf.getVar(conf, ConfVars.REPLCMENCRYPTEDDIR);
           //Create default cm root
-          Path cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
+          Path cmroot = new Path(cmRootDir);
+          HdfsEncryptionShim pathEncryptionShim = hadoopShims
+                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+          if (pathEncryptionShim.isPathEncrypted(cmroot)) {
+            LOG.warn(ConfVars.REPLCMDIR + " should not be encrypted. To pass cm dir for encrypted path use "
 
 Review comment:
   Yes this case needs to be handled

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371589321
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -442,32 +451,39 @@ public static boolean isCMFileUri(Path fromPath) {
     public void run() {
       try {
         LOG.info("CMClearer started");
-
-        long now = System.currentTimeMillis();
-        FileSystem fs = cmroot.getFileSystem(conf);
-        FileStatus[] files = fs.listStatus(cmroot);
-
-        for (FileStatus file : files) {
-          long modifiedTime = file.getModificationTime();
-          if (now - modifiedTime > secRetain*1000) {
-            try {
-              if (fs.getXAttrs(file.getPath()).containsKey(REMAIN_IN_TRASH_TAG)) {
-                boolean succ = Trash.moveToAppropriateTrash(fs, file.getPath(), conf);
-                if (succ) {
-                  LOG.debug("Move " + file.toString() + " to trash");
-                } else {
-                  LOG.warn("Fail to move " + file.toString() + " to trash");
-                }
-              } else {
-                boolean succ = fs.delete(file.getPath(), false);
-                if (succ) {
-                  LOG.debug("Remove " + file.toString());
+        for (String encryptionZone : encryptionZones) {
+          Path cmroot;
 
 Review comment:
   Do we have any existing test case for CMCleaner ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370040731
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -452,6 +452,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     REPLCMRETIAN("hive.repl.cm.retain","24h",
         new TimeValidator(TimeUnit.HOURS),
         "Time to retain removed files in cmrootdir."),
+    REPLCMENCRYPTEDDIR("hive.encrypted.repl.cmrootdir", "/cmroot/",
 
 Review comment:
   it can be .cmroot

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373925973
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -156,13 +157,21 @@ private ReplChangeManager(Configuration conf) throws MetaException {
           //Create default cm root
           Path cmroot = new Path(cmRootDir);
           createCmRoot(cmroot);
+          FileSystem cmRootFs = cmroot.getFileSystem(conf);
           HdfsEncryptionShim pathEncryptionShim = hadoopShims
-                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+                  .createHdfsEncryptionShim(cmRootFs, conf);
           if (pathEncryptionShim.isPathEncrypted(cmroot)) {
-            String encryptionZonePath = pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
-            encryptionZones.put(encryptionZonePath, cmroot);
+            //cm root cannot be encrypted. So we move the current cmroot data to a cmroot in a encrypted zone
+            String encryptionZonePath = cmRootFs.getUri()
+                    + pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
+            cmRootFs.rename(cmroot, new Path(encryptionZonePath + encryptedCmRootDir));
 
 Review comment:
   Yes we can do a copy but that will take a lot of time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370041263
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -452,6 +452,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
     REPLCMRETIAN("hive.repl.cm.retain","24h",
         new TimeValidator(TimeUnit.HOURS),
         "Time to retain removed files in cmrootdir."),
+    REPLCMENCRYPTEDDIR("hive.encrypted.repl.cmrootdir", "/cmroot/",
 
 Review comment:
   The name can hive.repl.cm.encryptionzone.rootdir

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373917644
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -531,30 +541,23 @@ private static Path getCmRoot(Path path) throws IOException {
     Path cmroot = null;
     if (enabled) {
       HdfsEncryptionShim pathEncryptionShim = hadoopShims.createHdfsEncryptionShim(path.getFileSystem(conf), conf);
-      if (!pathEncryptionShim.isPathEncrypted(path)) {
-        if (encryptionZones.containsKey(NO_ENCRYPTION)) {
-          cmroot = encryptionZones.get(NO_ENCRYPTION);
-        } else {
+      if (pathEncryptionShim.isPathEncrypted(path)) {
+        String encryptionZonePath = path.getFileSystem(conf).getUri()
+                + pathEncryptionShim.getEncryptionZoneForPath(path).getPath();
+        cmroot = new Path(encryptionZonePath + Path.SEPARATOR + encryptedCmRootDir);
+        if (!encryptionZones.contains(encryptionZonePath)) {
 
 Review comment:
   This check is done twice 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370045759
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -138,15 +149,18 @@ private ReplChangeManager(Configuration conf) throws MetaException {
       if (!inited) {
         if (MetastoreConf.getBoolVar(conf, ConfVars.REPLCMENABLED)) {
           ReplChangeManager.enabled = true;
-          ReplChangeManager.cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
           ReplChangeManager.conf = conf;
 
+          //Create default cm root
+          Path cmroot = new Path(MetastoreConf.getVar(conf, ConfVars.REPLCMDIR));
           FileSystem cmFs = cmroot.getFileSystem(conf);
           // Create cmroot with permission 700 if not exist
           if (!cmFs.exists(cmroot)) {
             cmFs.mkdirs(cmroot);
             cmFs.setPermission(cmroot, new FsPermission("700"));
           }
+          cmRootMapping.put(NO_ENCRYPTION, cmroot);
 
 Review comment:
   should we check if default path is not encrypted ..then only store it against NO_ENCRYPTION?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371589875
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##########
 @@ -2836,9 +2836,14 @@ private boolean checkTableDataShouldBeDeleted(Table tbl, boolean deleteData) {
      *                data from warehouse
      * @param shouldEnableCm If cm should be enabled
      */
-    private void deleteTableData(Path tablePath, boolean ifPurge, boolean shouldEnableCm) throws MetaException {
+    private void deleteTableData(Path tablePath, boolean ifPurge, boolean shouldEnableCm) {
       if (tablePath != null) {
-        wh.deleteDir(tablePath, true, ifPurge, shouldEnableCm);
+        try {
+          wh.deleteDir(tablePath, true, ifPurge, shouldEnableCm);
+        } catch (MetaException e) {
 
 Review comment:
   shall we change it to any exception ..including run time exception ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373918256
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -156,13 +157,21 @@ private ReplChangeManager(Configuration conf) throws MetaException {
           //Create default cm root
           Path cmroot = new Path(cmRootDir);
           createCmRoot(cmroot);
+          FileSystem cmRootFs = cmroot.getFileSystem(conf);
           HdfsEncryptionShim pathEncryptionShim = hadoopShims
-                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+                  .createHdfsEncryptionShim(cmRootFs, conf);
           if (pathEncryptionShim.isPathEncrypted(cmroot)) {
-            String encryptionZonePath = pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
-            encryptionZones.put(encryptionZonePath, cmroot);
+            //cm root cannot be encrypted. So we move the current cmroot data to a cmroot in a encrypted zone
+            String encryptionZonePath = cmRootFs.getUri()
+                    + pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
+            cmRootFs.rename(cmroot, new Path(encryptionZonePath + encryptedCmRootDir));
 
 Review comment:
   move will cause issue if some repl dumps are already there with old cm path 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370051549
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
 ##########
 @@ -915,6 +915,8 @@ public static ConfVars getMetaConf(String name) {
             "This class is used to store and retrieval of raw metadata objects such as table, database"),
     REPLCMDIR("metastore.repl.cmrootdir", "hive.repl.cmrootdir", "/user/${system:user.name}/cmroot/",
         "Root dir for ChangeManager, used for deleted files."),
+    REPLCMENCRYPTEDDIR("metastore.encrypted.repl.cmrootdir", "hive.encrypted.repl.cmrootdir", "/cmroot/",
+            "Root dir for ChangeManager if encryption zones are enabled, used for deleted files."),
 
 Review comment:
   Same as above ..the name should start with metastore.repl.cm.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
aasha commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373925908
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -156,13 +157,21 @@ private ReplChangeManager(Configuration conf) throws MetaException {
           //Create default cm root
           Path cmroot = new Path(cmRootDir);
           createCmRoot(cmroot);
+          FileSystem cmRootFs = cmroot.getFileSystem(conf);
           HdfsEncryptionShim pathEncryptionShim = hadoopShims
-                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+                  .createHdfsEncryptionShim(cmRootFs, conf);
           if (pathEncryptionShim.isPathEncrypted(cmroot)) {
 
 Review comment:
   At recycle we will throw an exception if ever a non encrypted path is passed. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370053902
 
 

 ##########
 File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 ##########
 @@ -2835,17 +2834,25 @@ private boolean checkTableDataShouldBeDeleted(Table tbl, boolean deleteData) {
      * @param tablePath
      * @param ifPurge completely purge the table (skipping trash) while removing
      *                data from warehouse
-     * @param db database the table belongs to
+     * @param shouldEnableCm If cm should be enabled
      */
-    private void deleteTableData(Path tablePath, boolean ifPurge, Database db) {
+    private void deleteTableData(Path tablePath, boolean ifPurge, boolean shouldEnableCm) throws MetaException {
+      if (tablePath != null) {
+        wh.deleteDir(tablePath, true, ifPurge, shouldEnableCm);
+      }
+    }
 
+    /**
+     * Deletes the data in a table's location, if it fails logs an error.
+     *
+     * @param tablePath
+     * @param ifPurge completely purge the table (skipping trash) while removing
+     *                data from warehouse
+     * @param db Database
+     */
+    private void deleteTableData(Path tablePath, boolean ifPurge, Database db) throws MetaException {
       if (tablePath != null) {
-        try {
-          wh.deleteDir(tablePath, true, ifPurge, db);
-        } catch (Exception e) {
-          LOG.error("Failed to delete table directory: " + tablePath +
-              " " + e.getMessage());
-        }
+        wh.deleteDir(tablePath, true, ifPurge, db);
 
 Review comment:
   for now ..just log the error ..ignore the exception while deleting.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373917503
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -156,13 +157,21 @@ private ReplChangeManager(Configuration conf) throws MetaException {
           //Create default cm root
           Path cmroot = new Path(cmRootDir);
           createCmRoot(cmroot);
+          FileSystem cmRootFs = cmroot.getFileSystem(conf);
           HdfsEncryptionShim pathEncryptionShim = hadoopShims
-                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+                  .createHdfsEncryptionShim(cmRootFs, conf);
           if (pathEncryptionShim.isPathEncrypted(cmroot)) {
 
 Review comment:
   if cmroot is encrypted ..it should throw exception ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r371588545
 
 

 ##########
 File path: itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniHS2.java
 ##########
 @@ -104,8 +104,11 @@
   public static void setupBeforeClass() throws Exception {
     MiniHS2.cleanupLocalDir();
     HiveConf conf = new HiveConf();
+   // conf.set(HiveConf.ConfVars.REPLCMDIR.varname, "hdfs://cmroot");
+   // conf.set(ConfVars.REPLCMENABLED.varname, "true");
 
 Review comment:
   remove the lines if not required.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r370057676
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -493,4 +524,29 @@ public static String joinWithSeparator(Iterable<?> strings) {
   public static String[] getListFromSeparatedString(String commaSeparatedString) {
     return commaSeparatedString.split("\\s*" + TXN_WRITE_EVENT_FILE_SEPARATOR + "\\s*");
   }
+
+  private static Path getCmRoot(Path path) throws IOException {
 
 Review comment:
   Need some synchronization to avoid multiple path being added for same EZ..does file system gives that gurantee ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication

Posted by GitBox <gi...@apache.org>.
maheshk114 commented on a change in pull request #883: HIVE-22736 Support multiple encryption zones in replication
URL: https://github.com/apache/hive/pull/883#discussion_r373916968
 
 

 ##########
 File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
 ##########
 @@ -156,13 +157,21 @@ private ReplChangeManager(Configuration conf) throws MetaException {
           //Create default cm root
           Path cmroot = new Path(cmRootDir);
           createCmRoot(cmroot);
+          FileSystem cmRootFs = cmroot.getFileSystem(conf);
           HdfsEncryptionShim pathEncryptionShim = hadoopShims
-                  .createHdfsEncryptionShim(cmroot.getFileSystem(conf), conf);
+                  .createHdfsEncryptionShim(cmRootFs, conf);
           if (pathEncryptionShim.isPathEncrypted(cmroot)) {
-            String encryptionZonePath = pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
-            encryptionZones.put(encryptionZonePath, cmroot);
+            //cm root cannot be encrypted. So we move the current cmroot data to a cmroot in a encrypted zone
+            String encryptionZonePath = cmRootFs.getUri()
+                    + pathEncryptionShim.getEncryptionZoneForPath(cmroot).getPath();
+            cmRootFs.rename(cmroot, new Path(encryptionZonePath + encryptedCmRootDir));
 
 Review comment:
   validation of encryptedCmRootDir should be done first for absolute path and then we should do the move 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org