Posted to commits@impala.apache.org by st...@apache.org on 2022/09/20 05:17:02 UTC

[impala] branch branch-4.1.1 updated (9cd363e06 -> 7dce6bc8f)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch branch-4.1.1
in repository https://gitbox.apache.org/repos/asf/impala.git


    from 9cd363e06 Pin to asf-impala-4.1 branch of native-toolchain
     new a8f953e7f IMPALA-11557: Fix memory leak in BlockingRowBatchQueue
     new 15c619734 IMPALA-11580: Fix memory leak in legacy catalog mode when applying incremental partition updates
     new 7dce6bc8f IMPALA-11342: Fix class loading in Hive UDFs' constructors

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/runtime/blocking-row-batch-queue.h            |  2 +-
 .../org/apache/impala/catalog/HdfsPartition.java     |  5 +++++
 .../org/apache/impala/catalog/ImpaladCatalog.java    |  8 ++++++--
 .../hive/executor/HiveJavaFunctionFactoryImpl.java   | 20 +++++++++++---------
 .../apache/impala/hive/executor/HiveUdfLoader.java   | 12 ++++--------
 .../org/apache/impala/hive/executor/UdfExecutor.java |  2 +-
 .../org/apache/impala/ImportsNearbyClassesUdf.java   |  5 +++++
 .../{UtilForUdf.java => UtilForUdfConstructor.java}  |  4 ++--
 .../{UtilForUdf.java => UtilForUdfInitialize.java}   |  4 ++--
 9 files changed, 37 insertions(+), 25 deletions(-)
 copy java/test-hive-udfs/src/main/java/org/apache/impala/{UtilForUdf.java => UtilForUdfConstructor.java} (89%)
 copy java/test-hive-udfs/src/main/java/org/apache/impala/{UtilForUdf.java => UtilForUdfInitialize.java} (89%)


[impala] 03/03: IMPALA-11342: Fix class loading in Hive UDFs' constructors

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch branch-4.1.1
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 7dce6bc8f2051aba533073f5beca99867e730a09
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Fri Jun 10 20:18:14 2022 +0200

    IMPALA-11342: Fix class loading in Hive UDFs' constructors
    
    Loading new classes from the same jar in the constructor of UDFs
    did not work in the catalog because the URLClassLoader was closed
    too early. This patch extends the lifecycle of the class loader so
    that the catalog can finish all of its initialisation before the
    loader is closed.
    
    Note that instantiating legacy Hive UDFs does not seem necessary in
    the catalog, since all relevant info can be obtained from the class.
    Generic UDFs do need to be instantiated to be able to call
    initialize().
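    
    The pattern behind the fix is to make the loader AutoCloseable and
    scope it with try-with-resources, so the jar stays open until the
    catalog is done with the class. A minimal sketch of that pattern
    (hypothetical names, not the actual Impala classes):
    
        import java.io.File;
        import java.net.URL;
        import java.net.URLClassLoader;
        
        // Hypothetical stand-in for a loader that owns a URLClassLoader.
        class JarClassLoader implements AutoCloseable {
          private final URLClassLoader loader;
        
          JarClassLoader(String localJarPath) throws Exception {
            loader = new URLClassLoader(
                new URL[] { new File(localJarPath).toURI().toURL() });
          }
        
          Class<?> load(String className) throws ClassNotFoundException {
            // Initializes the class, which may pull in more classes from the jar.
            return Class.forName(className, true, loader);
          }
        
          @Override
          public void close() throws Exception {
            // After this, loading further classes from the jar fails.
            loader.close();
          }
        }
        
        class CatalogSketch {
          static void inspectUdf(String jarPath, String className) throws Exception {
            // try-with-resources keeps the loader open for the whole inspection,
            // including classes the UDF constructor loads from the same jar.
            try (JarClassLoader loader = new JarClassLoader(jarPath)) {
              Class<?> udfClass = loader.load(className);
              udfClass.getDeclaredConstructor().newInstance();
            }
          }
        }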
    
    Testing:
    - added new classes to load in test UDFs and loaded these
      in constructor / initialize()
    - ran the Hive UDF ee tests
    
    Merge conflicts in branch-4.1:
    - HiveJavaFunctionFactoryImpl.java ignores the case for GenericUDF
    - Ignores changes in GenericImportsNearbyClassesUdf.java
    
    Change-Id: If16e38b8fc3b2577a5d32104ea9e6948b9562e24
    Reviewed-on: http://gerrit.cloudera.org:8080/18611
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-on: http://gerrit.cloudera.org:8080/19009
    Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
---
 .../hive/executor/HiveJavaFunctionFactoryImpl.java   | 20 +++++++++++---------
 .../apache/impala/hive/executor/HiveUdfLoader.java   | 12 ++++--------
 .../org/apache/impala/hive/executor/UdfExecutor.java |  2 +-
 .../org/apache/impala/ImportsNearbyClassesUdf.java   |  5 +++++
 ...rbyClassesUdf.java => UtilForUdfConstructor.java} | 12 ++++--------
 ...arbyClassesUdf.java => UtilForUdfInitialize.java} | 12 ++++--------
 6 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java b/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
index c90b09bbe..44131874f 100644
--- a/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
+++ b/fe/src/main/java/org/apache/impala/hive/executor/HiveJavaFunctionFactoryImpl.java
@@ -44,15 +44,17 @@ public class HiveJavaFunctionFactoryImpl implements HiveJavaFunctionFactory {
     checkValidFunction(hiveFn);
     String jarUri = hiveFn.getResourceUris().get(0).getUri();
     String fnName = hiveFn.getDbName() + "." + hiveFn.getFunctionName();
-    HiveUdfLoader javaClass = HiveUdfLoader.createWithLocalPath(localLibPath, hiveFn);
-    switch (javaClass.getUDFClassType()) {
-      case UDF:
-        return new HiveLegacyJavaFunction(javaClass.getUDFClass(), hiveFn, retType,
-            paramTypes);
-      default:
-        throw new CatalogException("Function " + fnName + ": The class "
-            + jarUri + " does not derive "
-            + "from a known supported Hive UDF class (UDF).");
+    try (HiveUdfLoader javaClass
+        = HiveUdfLoader.createWithLocalPath(localLibPath, hiveFn)) {
+      switch (javaClass.getUDFClassType()) {
+        case UDF:
+          return new HiveLegacyJavaFunction(javaClass.getUDFClass(), hiveFn, retType,
+              paramTypes);
+        default:
+          throw new CatalogException("Function " + fnName + ": The class "
+              + jarUri + " does not derive "
+              + "from a known supported Hive UDF class (UDF).");
+      }
     }
   }
 
diff --git a/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfLoader.java b/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfLoader.java
index 15f84f54d..9eaa47c61 100644
--- a/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfLoader.java
+++ b/fe/src/main/java/org/apache/impala/hive/executor/HiveUdfLoader.java
@@ -38,7 +38,7 @@ import org.apache.log4j.Logger;
  * Class responsible for the Java reflection needed to fetch the UDF
  * class and function.
  */
-public class HiveUdfLoader {
+public class HiveUdfLoader implements AutoCloseable {
   private static final Logger LOG = Logger.getLogger(HiveUdfLoader.class);
   private final Class<?> udfClass_;
 
@@ -58,24 +58,20 @@ public class HiveUdfLoader {
    *     more classes, so we allow this flexibility. In this case, the caller is
    *     responsible to call "close" on this object.
    */
-  public HiveUdfLoader(String localJarPath, String className,
-      boolean persistLoader) throws CatalogException {
+  public HiveUdfLoader(String localJarPath, String className) throws CatalogException {
     LOG.debug("Loading UDF '" + className + "' from " + localJarPath);
     // If the localJarPath is not set, we use the System ClassLoader which
     // does not need to be tracked for closing.
     classLoader_ = getClassLoader(localJarPath);
     udfClass_  = loadUDFClass(className, classLoader_);
     classType_ = FunctionUtils.getUDFClassType(udfClass_);
-
-    if (!persistLoader) {
-      close();
-    }
   }
 
   public Class<?> getUDFClass() {
     return udfClass_;
   }
 
+  @Override
   public void close() {
     // We only need to close URLClassLoaders. If no jar was present at instantiation,
     // it uses the SystemClassLoader (leaving this in for legacy purposes, but I'm not
@@ -152,7 +148,7 @@ public class HiveUdfLoader {
         }
         localJarPathString = localJarPath.toString();
       }
-      return new HiveUdfLoader(localJarPathString, fn.getClassName(), false);
+      return new HiveUdfLoader(localJarPathString, fn.getClassName());
     } catch (Exception e) {
       String errorMsg = "Could not load class " + fn.getClassName() + " from "
           + "jar " + uri + ": " + e.getMessage();
diff --git a/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java b/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
index 79f225e70..3a15ec35a 100644
--- a/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
+++ b/fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
@@ -64,7 +64,7 @@ public class UdfExecutor {
     }
     try {
       checkValidRequest(request);
-      udfLoader_ = new HiveUdfLoader(location, request.fn.scalar_fn.symbol, true);
+      udfLoader_ = new HiveUdfLoader(location, request.fn.scalar_fn.symbol);
       hiveUdfExecutor_ = createHiveUdfExecutor(request, udfLoader_);
       LOG.debug("Loaded UDF '" + request.fn.scalar_fn.symbol + "' from "
           + request.local_location);
diff --git a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java b/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
index 05fbdd7b6..3aa31e9b1 100644
--- a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
+++ b/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
@@ -27,6 +27,11 @@ import org.apache.hadoop.io.Text;
  * the class loader.
  */
 public class ImportsNearbyClassesUdf extends UDF {
+  public ImportsNearbyClassesUdf() {
+    // Ensure that new classes can be loaded in constructor.
+    UtilForUdfConstructor.getHello();
+  }
+
   public Text evaluate(Text para) throws ParseException {
     return new Text(UtilForUdf.getHello());
   }
diff --git a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java b/java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfConstructor.java
similarity index 75%
copy from java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
copy to java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfConstructor.java
index 05fbdd7b6..b0534bbdb 100644
--- a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
+++ b/java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfConstructor.java
@@ -21,13 +21,9 @@ import java.text.ParseException;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.Text;
 
-/**
- * Helper for a regression test for IMPALA-8016: this
- * uses another class (from the same jar) at evaluate() time, needing
- * the class loader.
- */
-public class ImportsNearbyClassesUdf extends UDF {
-  public Text evaluate(Text para) throws ParseException {
-    return new Text(UtilForUdf.getHello());
+/** IMPALA-11342 regression test: Helper for ImportNearbyClasses. */
+public class UtilForUdfConstructor extends UDF {
+  public static String getHello() {
+    return "Hello";
   }
 }
diff --git a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java b/java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfInitialize.java
similarity index 75%
copy from java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
copy to java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfInitialize.java
index 05fbdd7b6..3118a3eaa 100644
--- a/java/test-hive-udfs/src/main/java/org/apache/impala/ImportsNearbyClassesUdf.java
+++ b/java/test-hive-udfs/src/main/java/org/apache/impala/UtilForUdfInitialize.java
@@ -21,13 +21,9 @@ import java.text.ParseException;
 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.Text;
 
-/**
- * Helper for a regression test for IMPALA-8016: this
- * uses another class (from the same jar) at evaluate() time, needing
- * the class loader.
- */
-public class ImportsNearbyClassesUdf extends UDF {
-  public Text evaluate(Text para) throws ParseException {
-    return new Text(UtilForUdf.getHello());
+/** IMPALA-11342 regression test: Helper for ImportNearbyClasses. */
+public class UtilForUdfInitialize extends UDF {
+  public static String getHello() {
+    return "Hello";
   }
 }


[impala] 02/03: IMPALA-11580: Fix memory leak in legacy catalog mode when applying incremental partition updates

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch branch-4.1.1
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 15c61973480d28d3d544560a8d0a77f1ffb5d6a3
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Sep 13 16:23:40 2022 +0800

    IMPALA-11580: Fix memory leak in legacy catalog mode when applying incremental partition updates
    
    In the legacy catalog mode, catalogd propagates incremental metadata
    updates at the partition level. While applying the updates, impalad
    reuses the existing partition objects and moves them to a new HdfsTable
    object. However, the partition objects are immutable, so their
    reference to the old table object remains unchanged. JVM GC cannot
    collect the stale table objects because they still have active
    references from the partitions, which results in a memory leak.
    
    This patch fixes the issue by creating a new partition object from the
    existing one, with its table field pointing to the new table object.
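    
    A minimal sketch of the leak and the fix (hypothetical classes, not
    the actual catalog code): an immutable child holds a final reference
    to its parent, so moving the child under a new parent without
    recreating it keeps the old parent reachable.
    
        // Stand-in for a large HdfsTable object.
        final class Table {
          final byte[] metadata = new byte[64 * 1024 * 1024];
        }
        
        // Stand-in for an immutable HdfsPartition with a back-reference
        // to the table it was built under.
        final class Partition {
          final Table table;
          final String name;
          Partition(Table table, String name) {
            this.table = table;
            this.name = name;
          }
        }
        
        final class CatalogUpdateSketch {
          // Leaky update: the reused partition still points at its old
          // table, so GC cannot collect that table.
          static Partition reuse(Partition oldPart, Table newTable) {
            return oldPart;
          }
        
          // Fixed update: rebuild the partition so its back-reference
          // is the new table; the old table becomes collectable.
          static Partition rebuild(Partition oldPart, Table newTable) {
            return new Partition(newTable, oldPart.name);
          }
        }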
    
    Tests:
     - Verified locally that after applying the patch, the number of live
       HdfsTable objects no longer keeps growing.
    
    Change-Id: Ie04ff243c6b82c1a06c489da74353f2d8afe423a
    Reviewed-on: http://gerrit.cloudera.org:8080/18978
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
    Reviewed-on: http://gerrit.cloudera.org:8080/19008
    Tested-by: Quanlong Huang <hu...@gmail.com>
---
 fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java  | 5 +++++
 fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java | 8 ++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
index b29428999..5329e1b26 100644
--- a/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
+++ b/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
@@ -1248,6 +1248,10 @@ public class HdfsPartition extends CatalogObjectImpl
       this(partition.table_);
       oldInstance_ = partition;
       prevId_ = oldInstance_.id_;
+      copyFromPartition(partition);
+    }
+
+    public Builder copyFromPartition(HdfsPartition partition) {
       partitionKeyValues_ = partition.partitionKeyValues_;
       fileFormatDescriptor_ = partition.fileFormatDescriptor_;
       setFileDescriptors(partition);
@@ -1266,6 +1270,7 @@ public class HdfsPartition extends CatalogObjectImpl
       // Take over the in-flight events
       inFlightEvents_ = partition.inFlightEvents_;
       lastCompactionId_ = partition.lastCompactionId_;
+      return this;
     }
 
     public HdfsPartition build() {
diff --git a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
index a03393475..bed23effa 100644
--- a/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
@@ -494,8 +494,12 @@ public class ImpaladCatalog extends Catalog implements FeCatalog {
         for (PrunablePartition part : ((HdfsTable) existingTable).getPartitions()) {
           numExistingParts++;
           if (tHdfsTable.partitions.containsKey(part.getId())) {
-            Preconditions.checkState(
-                newHdfsTable.addPartitionNoThrow((HdfsPartition) part));
+            // Create a new partition instance under the new table object and copy all
+            // the fields of the existing partition.
+            HdfsPartition newPart = new HdfsPartition.Builder(newHdfsTable, part.getId())
+                .copyFromPartition((HdfsPartition) part)
+                .build();
+            Preconditions.checkState(newHdfsTable.addPartitionNoThrow(newPart));
           } else {
             numDeletedParts++;
           }


[impala] 01/03: IMPALA-11557: Fix memory leak in BlockingRowBatchQueue

Posted by st...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch branch-4.1.1
in repository https://gitbox.apache.org/repos/asf/impala.git

commit a8f953e7f2b73fdc58f7e9384a6b63199066e2cb
Author: hexianqing <he...@126.com>
AuthorDate: Fri Sep 9 17:43:10 2022 +0800

    IMPALA-11557: Fix memory leak in BlockingRowBatchQueue
    
    'batch_queue_' is a raw pointer to the queue that stores the
    RowBatches. It is initialized in the constructor but never deleted
    in the destructor, so the queue it points to is leaked.
    
    This patch fixes the leak by turning the member into a
    std::unique_ptr.
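    
    A minimal sketch of the pattern (hypothetical names, not the Impala
    classes): the raw owning pointer is never freed, while the
    std::unique_ptr member releases the queue in the implicitly
    generated destructor.
    
        #include <memory>
        
        struct RowBatchQueue {};  // stand-in for the real queue type
        
        // Before: raw owning pointer, never deleted -> leaks on destruction.
        class LeakyOwner {
         public:
          LeakyOwner() : batch_queue_(new RowBatchQueue()) {}
         private:
          RowBatchQueue* batch_queue_;
        };
        
        // After: unique_ptr owns the queue and frees it automatically.
        class FixedOwner {
         public:
          FixedOwner() : batch_queue_(new RowBatchQueue()) {}
         private:
          std::unique_ptr<RowBatchQueue> batch_queue_;
        };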
    
    Change-Id: I656316b6575ce74a03b83fcd45e772c763835d56
    Reviewed-on: http://gerrit.cloudera.org:8080/18960
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Quanlong Huang <hu...@gmail.com>
    Reviewed-on: http://gerrit.cloudera.org:8080/19007
    Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
---
 be/src/runtime/blocking-row-batch-queue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/runtime/blocking-row-batch-queue.h b/be/src/runtime/blocking-row-batch-queue.h
index f7fb63c4f..eb80db648 100644
--- a/be/src/runtime/blocking-row-batch-queue.h
+++ b/be/src/runtime/blocking-row-batch-queue.h
@@ -109,6 +109,6 @@ class BlockingRowBatchQueue {
   std::list<std::unique_ptr<RowBatch>> cleanup_queue_;
 
   /// BlockingQueue that stores the RowBatches
-  BlockingQueue<std::unique_ptr<RowBatch>, RowBatchBytesFn>* batch_queue_;
+  std::unique_ptr<BlockingQueue<std::unique_ptr<RowBatch>, RowBatchBytesFn>> batch_queue_;
 };
 }