Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/12 13:08:57 UTC

[GitHub] [hive] zeroflag opened a new pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

zeroflag opened a new pull request #1856:
URL: https://github.com/apache/hive/pull/1856


   When a table is created using CTAS with transactional=false, the data is loaded into the managed location, whereas the table metadata has its location pointing to the external location. As a result, querying the table returns 0 rows.
   
   cc: @rajkrrsingh, @nrg4878 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zeroflag closed pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
zeroflag closed pull request #1856:
URL: https://github.com/apache/hive/pull/1856


   




[GitHub] [hive] nrg4878 commented on a change in pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on a change in pull request #1856:
URL: https://github.com/apache/hive/pull/1856#discussion_r555819727



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13272,6 +13274,7 @@ private void updateDefaultTblProps(Map<String, String> source, Map<String, Strin
         retValue = convertToAcidByDefault(storageFormat, qualifiedTableName, sortCols, retValue);
       }
     }
+    retValue.put(TABLE_IS_CTAS, Boolean.toString(isCTAS));

Review comment:
       I think it is better to set this only when it is true, so we don't muddy the rest of the tables with non-essential properties.
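A minimal sketch of that suggestion, not the code that was merged: the property is written only when the statement is a CTAS, so ordinary tables never carry it. The class name, the helper `withCtasFlag`, and the string value of `TABLE_IS_CTAS` are assumptions made for illustration; the thread does not show the real constant's value.

```java
import java.util.HashMap;
import java.util.Map;

class CtasPropSketch {
  // Assumed key for illustration; the real constant's value is not shown in this thread.
  static final String TABLE_IS_CTAS = "created_with_ctas";

  // Hypothetical helper: copies the table properties and adds the CTAS marker only when needed.
  static Map<String, String> withCtasFlag(Map<String, String> props, boolean isCTAS) {
    Map<String, String> retValue = new HashMap<>(props);
    if (isCTAS) {
      // Only CTAS tables get the marker; all other tables stay untouched.
      retValue.put(TABLE_IS_CTAS, Boolean.toString(true));
    }
    return retValue;
  }
}
```

A reader that looks the property up with a default of "false" behaves the same whether the key is absent or explicitly false, which is why omitting it is safe.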

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java
##########
@@ -583,29 +584,34 @@ public Table transformCreateTable(Table table, List<String> processorCapabilitie
       throw new MetaException("Database " + dbName + " for table " + table.getTableName() + " could not be found");
     }
 
-      if (TableType.MANAGED_TABLE.name().equals(tableType)) {
+    if (TableType.MANAGED_TABLE.name().equals(tableType)) {
       LOG.debug("Table is a MANAGED_TABLE");
       txnal = params.get(TABLE_IS_TRANSACTIONAL);
       txn_properties = params.get(TABLE_TRANSACTIONAL_PROPERTIES);
+      boolean ctas = Boolean.valueOf(params.getOrDefault(TABLE_IS_CTAS, "false"));
       isInsertAcid = (txn_properties != null && txn_properties.equalsIgnoreCase("insert_only"));
       if ((txnal == null || txnal.equalsIgnoreCase("FALSE")) && !isInsertAcid) { // non-ACID MANAGED TABLE
-        LOG.info("Converting " + newTable.getTableName() + " to EXTERNAL tableType for " + processorId);
-        newTable.setTableType(TableType.EXTERNAL_TABLE.toString());
-        params.remove(TABLE_IS_TRANSACTIONAL);
-        params.remove(TABLE_TRANSACTIONAL_PROPERTIES);
-        params.put("EXTERNAL", "TRUE");
-        params.put(EXTERNAL_TABLE_PURGE, "TRUE");
-        params.put("TRANSLATED_TO_EXTERNAL", "TRUE");
-        newTable.setParameters(params);
-        LOG.info("Modified table params are:" + params.toString());
-
-        if (!table.isSetSd() || table.getSd().getLocation() == null) {
-          try {
-            Path newPath = hmsHandler.getWh().getDefaultTablePath(db, table.getTableName(), true);
-            newTable.getSd().setLocation(newPath.toString());
-            LOG.info("Modified location from null to " + newPath);
-          } catch (Exception e) {
-            LOG.warn("Exception determining external table location:" + e.getMessage());
+        if (ctas) {

Review comment:
       Shouldn't we remove the property "TABLE_IS_CTAS" from the table params here, if it is set? That way we add net zero properties to the metadata.






[GitHub] [hive] zeroflag commented on a change in pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
zeroflag commented on a change in pull request #1856:
URL: https://github.com/apache/hive/pull/1856#discussion_r571968371



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -13272,6 +13274,7 @@ private void updateDefaultTblProps(Map<String, String> source, Map<String, Strin
         retValue = convertToAcidByDefault(storageFormat, qualifiedTableName, sortCols, retValue);
       }
     }
+    retValue.put(TABLE_IS_CTAS, Boolean.toString(isCTAS));

Review comment:
       OK







[GitHub] [hive] nrg4878 commented on pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on pull request #1856:
URL: https://github.com/apache/hive/pull/1856#issuecomment-777185671


   Fix has been merged to master. 




[GitHub] [hive] mustafaiman commented on pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
mustafaiman commented on pull request #1856:
URL: https://github.com/apache/hive/pull/1856#issuecomment-775514851


   @zeroflag @nrg4878 is this ready to merge?




[GitHub] [hive] zeroflag commented on a change in pull request #1856: HIVE-24625 CTAS non transactional table loads data from incorrect path (amagyar, rajkrrsingh)

Posted by GitBox <gi...@apache.org>.
zeroflag commented on a change in pull request #1856:
URL: https://github.com/apache/hive/pull/1856#discussion_r571952378



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java
##########
@@ -583,29 +584,34 @@ public Table transformCreateTable(Table table, List<String> processorCapabilitie
       throw new MetaException("Database " + dbName + " for table " + table.getTableName() + " could not be found");
     }
 
-      if (TableType.MANAGED_TABLE.name().equals(tableType)) {
+    if (TableType.MANAGED_TABLE.name().equals(tableType)) {
       LOG.debug("Table is a MANAGED_TABLE");
       txnal = params.get(TABLE_IS_TRANSACTIONAL);
       txn_properties = params.get(TABLE_TRANSACTIONAL_PROPERTIES);
+      boolean ctas = Boolean.valueOf(params.getOrDefault(TABLE_IS_CTAS, "false"));
       isInsertAcid = (txn_properties != null && txn_properties.equalsIgnoreCase("insert_only"));
       if ((txnal == null || txnal.equalsIgnoreCase("FALSE")) && !isInsertAcid) { // non-ACID MANAGED TABLE
-        LOG.info("Converting " + newTable.getTableName() + " to EXTERNAL tableType for " + processorId);
-        newTable.setTableType(TableType.EXTERNAL_TABLE.toString());
-        params.remove(TABLE_IS_TRANSACTIONAL);
-        params.remove(TABLE_TRANSACTIONAL_PROPERTIES);
-        params.put("EXTERNAL", "TRUE");
-        params.put(EXTERNAL_TABLE_PURGE, "TRUE");
-        params.put("TRANSLATED_TO_EXTERNAL", "TRUE");
-        newTable.setParameters(params);
-        LOG.info("Modified table params are:" + params.toString());
-
-        if (!table.isSetSd() || table.getSd().getLocation() == null) {
-          try {
-            Path newPath = hmsHandler.getWh().getDefaultTablePath(db, table.getTableName(), true);
-            newTable.getSd().setLocation(newPath.toString());
-            LOG.info("Modified location from null to " + newPath);
-          } catch (Exception e) {
-            LOG.warn("Exception determining external table location:" + e.getMessage());
+        if (ctas) {

Review comment:
       We remove it outside the transformer, in HiveMetaStore>>create_table_core, because transformCreateTable is not always called (only when the transformer is not null).
   
   ```
   if (tbl.getParameters() != null) {
     tbl.getParameters().remove(TABLE_IS_CTAS);
   }
   ```
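To illustrate why the removal sits outside the transformer, here is a rough sketch using plain maps instead of the real metastore types. `createTableCore`, the `UnaryOperator` standing in for the transformer, and the value of `TABLE_IS_CTAS` are all hypothetical stand-ins, not Hive's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class CtasFlagLifecycleSketch {
  // Assumed key for illustration; the real constant's value is not shown in this thread.
  static final String TABLE_IS_CTAS = "created_with_ctas";

  // Hypothetical stand-in for create_table_core: the transformer is optional,
  // but the transient CTAS marker is always stripped before the table is stored.
  static Map<String, String> createTableCore(Map<String, String> params,
                                             UnaryOperator<Map<String, String>> transformer) {
    Map<String, String> result = params;
    if (transformer != null) {
      // The transformer may read the marker to decide how to handle the table.
      result = transformer.apply(result);
    }
    if (result != null) {
      // Runs whether or not a transformer was configured.
      result.remove(TABLE_IS_CTAS);
    }
    return result;
  }
}
```

If the removal lived inside the transformer instead, a deployment without a transformer would persist the marker in the table metadata.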



