Posted to issues@hive.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2016/02/24 16:00:25 UTC

[jira] [Commented] (HIVE-13132) Hive should lazily load and cache metastore (permanent) functions

    [ https://issues.apache.org/jira/browse/HIVE-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163143#comment-15163143 ] 

Alan Gates commented on HIVE-13132:
-----------------------------------

Several comments:
# It would be good to test whether HIVE-2573 solves the issue, since there's no point in making further changes if it does.
# I see how this code prevents the system from repeatedly downloading the functions (since it tracks whether the metastore has been searched), but I don't see how it prevents pre-fetching all the functions at startup.
# I don't think using statics in the FunctionRegistry will work. This will cause HiveServer2 to share the function names across sessions, which we don't want because there won't be a way to force new functions to be downloaded. That is, HS2 will download the set of functions when it first starts, and not do so again because the static {{haveSearchedMetastore}} will be true.
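
Point 3 can be illustrated with a minimal sketch, assuming the cache is owned by a per-session object rather than a static field in FunctionRegistry. FunctionSource and SessionFunctionCache are hypothetical names used only for illustration, not Hive classes. Building "db.func" by string concatenation is a simplification of FunctionUtils.qualifyFunctionName. Each session loads lazily on its first lookup. The invalidate() method gives a way to force a re-fetch, which a static haveSearchedMetastore flag does not.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the metastore client; NOT Hive's actual API.
interface FunctionSource {
  List<String> getAllDatabases();
  List<String> getFunctions(String dbName);
}

// Hypothetical per-session cache of permanent function names.
// Each session owns its own instance, so nothing is shared through static state.
class SessionFunctionCache {
  private final FunctionSource source;
  private Set<String> cached;      // null until this session first searches the metastore
  private int metastoreLoads = 0;  // how many full metastore scans this session has done

  SessionFunctionCache(FunctionSource source) {
    this.source = source;
  }

  // Lazily scans every database on first use, then serves the cached names.
  synchronized Set<String> getFunctionNames() {
    if (cached == null) {
      Set<String> names = new HashSet<>();
      for (String db : source.getAllDatabases()) {
        for (String fn : source.getFunctions(db)) {
          names.add(db + "." + fn);  // simplified qualification, e.g. "default.myudf"
        }
      }
      cached = Collections.unmodifiableSet(names);
      metastoreLoads++;
    }
    return cached;
  }

  // Drops this session's cache so the next lookup re-reads the metastore
  // and picks up functions created since the last scan.
  synchronized void invalidate() {
    cached = null;
  }

  synchronized int metastoreLoads() {
    return metastoreLoads;
  }
}
```

With this scoping, a new HS2 session starts with an empty cache and sees functions created after earlier sessions loaded theirs. Deciding what should trigger invalidate() (for example a {{CREATE FUNCTION}} in the same session) is left open here.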

cc [~jdere] and [~sershe] since both of you have done work in this area recently.

> Hive should lazily load and cache metastore (permanent) functions
> -----------------------------------------------------------------
>
>                 Key: HIVE-13132
>                 URL: https://issues.apache.org/jira/browse/HIVE-13132
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>         Attachments: HIVE-13132.1.patch
>
>
> In Hive 0.13.1, we have noticed that as the number of databases increases, the start-up time of the Hive interactive shell increases. This is because during start-up, all databases are iterated over to fetch the permanent functions to display in the {{SHOW FUNCTIONS}} output.
> {noformat:title=FunctionRegistry.java}
>   private static Set<String> getFunctionNames(boolean searchMetastore) {
>     Set<String> functionNames = mFunctions.keySet();
>     if (searchMetastore) {
>       functionNames = new HashSet<String>(functionNames);
>       try {
>         Hive db = getHive();
>         List<String> dbNames = db.getAllDatabases();
>         for (String dbName : dbNames) {
>           List<String> funcNames = db.getFunctions(dbName, "*");
>           for (String funcName : funcNames) {
>             functionNames.add(FunctionUtils.qualifyFunctionName(funcName, dbName));
>           }
>         }
>       } catch (Exception e) {
>         LOG.error(e);
>         // Continue on, we can still return the functions we've gotten to this point.
>       }
>     }
>     return functionNames;
>   }
> {noformat}
> Instead of eagerly loading all metastore functions, we should only load them the first time {{SHOW FUNCTIONS}} is invoked. We should also cache the results.
> Note that this issue may have been fixed by HIVE-2573, though I haven't verified this.
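
The lazy-load-and-cache proposal in the description amounts to memoizing the metastore scan. The following is a minimal sketch, assuming a hypothetical Memoized helper (not part of Hive) wrapped around the per-database loop in getFunctionNames(true). The loop runs only when the first SHOW FUNCTIONS asks for the names, and later calls reuse the result.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of Hive): defers an expensive load until the
// first get() call and returns the cached result on every later call.
final class Memoized<T> implements Supplier<T> {
  private final Supplier<T> loader;
  private T value;
  private boolean loaded = false;

  Memoized(Supplier<T> loader) {
    this.loader = loader;
  }

  @Override
  public synchronized T get() {
    if (!loaded) {            // first request: run the expensive metastore scan
      value = loader.get();
      loaded = true;
    }
    return value;             // later requests: serve the cached names
  }
}
```

For example, inside FunctionRegistry, `new Memoized<>(() -> getFunctionNames(true))` would keep start-up from touching the metastore at all. Where such a wrapper is stored still matters. Held in a static field, it would be shared by every HiveServer2 session and never refreshed.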