Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/03 13:55:00 UTC

[jira] [Work logged] (HIVE-24169) HiveServer2 UDF cache

     [ https://issues.apache.org/jira/browse/HIVE-24169?focusedWorklogId=506864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506864 ]

ASF GitHub Bot logged work on HIVE-24169:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Nov/20 13:54
            Start Date: 03/Nov/20 13:54
    Worklog Time Spent: 10m 
      Work Description: vihangk1 commented on a change in pull request #1503:
URL: https://github.com/apache/hive/pull/1503#discussion_r516198595



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -448,8 +450,10 @@ public SessionState(HiveConf conf, String userName) {
         parentLoader, Collections.emptyList(), true);
     final ClassLoader currentLoader = AccessController.doPrivileged(addAction);
     this.sessionConf.setClassLoader(currentLoader);
+    Map<String, String> udfCacheMap = getUDFCacheMap();

Review comment:
       The map is already accessible to this class, so there is no need to use a getter here.
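
       For example (a minimal, untested sketch; the field name udfLocalResource is taken from the getUDFCacheMap() getter elsewhere in this patch):

           // use the static field directly instead of going through the getter
           Map<String, String> udfCacheMap = udfLocalResource;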

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/Registry.java
##########
@@ -583,7 +583,13 @@ public void unregisterFunction(String functionName) throws HiveException {
         }
         mFunctions.remove(functionName);
         fi.discarded();
+        FunctionResource[] resources = fi.getResources();
         if (fi.isPersistent()) {
+          Map<String, String> udfCacheMap = SessionState.getUDFCacheMap();
+          for(FunctionResource fr : resources){
+            //remove from udf cache if it's saved.
+            udfCacheMap.remove(fr.getResourceURI());

Review comment:
       Do we need to remove the downloaded file here too?
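
       For example, something along these lines (untested sketch; it assumes the cache map's value is the local path of the downloaded copy, which is not visible in this hunk):

           for (FunctionResource fr : resources) {
             // drop the cache entry; if the value is the local path of the
             // downloaded copy, delete that file as well
             String localPath = udfCacheMap.remove(fr.getResourceURI());
             if (localPath != null) {
               new java.io.File(localPath).delete();
             }
           }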

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/util/ResourceDownloader.java
##########
@@ -71,42 +87,61 @@ public static boolean isFileUri(String value) {
     }
   }
 
-  public List<URI> resolveAndDownload(String source, boolean convertToUnix)
+  public List<URI> resolveAndDownload(String source, boolean convertToUnix, boolean useCache)
       throws URISyntaxException, IOException {
-    return resolveAndDownloadInternal(createURI(source), null, convertToUnix, true);
+    return resolveAndDownloadInternal(createURI(source), null, convertToUnix, true, useCache);
   }
 
   public List<URI> downloadExternal(URI source, String subDir, boolean convertToUnix)
       throws URISyntaxException, IOException {
-    return resolveAndDownloadInternal(source, subDir, convertToUnix, false);
+    return resolveAndDownloadInternal(source, subDir, convertToUnix, false, false);
+  }
+  public List<URI> downloadExternal(URI source, String subDir, boolean convertToUnix, boolean useCache)
+      throws URISyntaxException, IOException {
+    return resolveAndDownloadInternal(source, subDir, convertToUnix, false, useCache);
   }
 
   private List<URI> resolveAndDownloadInternal(URI source, String subDir,
-      boolean convertToUnix, boolean isLocalAllowed) throws URISyntaxException, IOException {
+      boolean convertToUnix, boolean isLocalAllowed, boolean useCache) throws URISyntaxException, IOException {
     switch (getURLType(source)) {
     case FILE: return isLocalAllowed ? Collections.singletonList(source) : null;
     case IVY: return dependencyResolver.downloadDependencies(source);
     case HDFS:
     case OTHER:
-      return Collections.singletonList(createURI(downloadResource(source, subDir, convertToUnix)));
+      return Collections.singletonList(createURI(downloadResource(source, subDir, convertToUnix, useCache)));

Review comment:
       Looks like we are going to use the cache for HDFS source URLs too. Do we really need that for HDFS? Maybe use the cache only when the source is on a remote file system.
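
       For example (illustrative only; a scheme check like this is just one option, and the existing getURLType(source) result could be reused instead):

           // only consult the cache for remote, non-HDFS sources such as s3a://
           boolean cacheEligible = useCache && !"hdfs".equalsIgnoreCase(source.getScheme());
           return Collections.singletonList(
               createURI(downloadResource(source, subDir, convertToUnix, cacheEligible)));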

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
##########
@@ -491,6 +495,16 @@ public void resetThreadName() {
       Thread.currentThread().setName(names[names.length - 1].trim());
     }
   }
+  public static Map<String, String> getUDFCacheMap(){
+    return udfLocalResource;
+  }
+
+  public synchronized static File getUdfFileDir(){
+    if(udfFileDir == null){

Review comment:
       The formatting is off here.
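
       For reference, something like this would match the usual style in SessionState (space before braces, "static synchronized" keyword order; the body of getUdfFileDir is only sketched since it is cut off in this hunk):

           public static Map<String, String> getUDFCacheMap() {
             return udfLocalResource;
           }

           public static synchronized File getUdfFileDir() {
             if (udfFileDir == null) {
               // ... lazy initialization as in the patch ...
             }
             return udfFileDir;
           }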

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -3552,6 +3552,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery."),
     HIVE_SERVER2_ZOOKEEPER_PUBLISH_CONFIGS("hive.server2.zookeeper.publish.configs", true,
         "Whether we should publish HiveServer2's configs to ZooKeeper."),
+    HIVE_SERVER2_UDF_CACHE_ENABLED("hive.server2.udf.cache.enabled", false,
+        "Whether HiveServer2 UDF cache is enabled. Disabled by default."),

Review comment:
       I think it would be good to add more detailed information here. The point of having a description field is to provide more details than the name itself :)
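
       For example, the description could spell out what is cached and when it helps (the wording below is only a suggestion):

           HIVE_SERVER2_UDF_CACHE_ENABLED("hive.server2.udf.cache.enabled", false,
               "Whether HiveServer2 caches UDF resources it has already downloaded so that later\n" +
               "registrations of the same function can reuse the local copy instead of downloading\n" +
               "it again. Mainly useful when UDF jars live on remote storage such as S3.\n" +
               "Disabled by default."),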




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 506864)
    Time Spent: 40m  (was: 0.5h)

> HiveServer2 UDF cache
> ---------------------
>
>                 Key: HIVE-24169
>                 URL: https://issues.apache.org/jira/browse/HIVE-24169
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sam An
>            Assignee: Sam An
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> UDFs are cached per session. This optional feature can help speed up UDF access in S3 scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)