Posted to issues@impala.apache.org by "Vuk Ercegovac (JIRA)" <ji...@apache.org> on 2018/03/15 00:00:01 UTC

[jira] [Created] (IMPALA-6670) Executor-only impalads do not refresh their lib-cache entries

Vuk Ercegovac created IMPALA-6670:
-------------------------------------

             Summary: Executor-only impalads do not refresh their lib-cache entries
                 Key: IMPALA-6670
                 URL: https://issues.apache.org/jira/browse/IMPALA-6670
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
    Affects Versions: Impala 2.9.0
            Reporter: Vuk Ercegovac


When impalads are only executors, there is no way for their lib-cache entries to be refreshed. As far as I can tell, the version of the cached file will remain the same until the impalad is restarted (and a query with a UDF/UDA that references that file is evaluated on that node).

In contrast, impalads that are both executors and coordinators receive metadata updates, which cause the cache entry to be refreshed. Even in this mode there is room for inconsistency (e.g., if the jar is updated between coordination and evaluation), but all impalads can be made to converge.

Basic steps to repro:
 * Make two jars (I used impala-hive-udfs.jar): part1.jar with TestUdf.class, and part2.jar with TestUdf.class + ReplaceStringUdf.class. Put both in /test-warehouse/scratch.db/.
 * Clear the state:

drop function if exists scratch.identity(boolean);
drop function if exists scratch.replace_string(string);

 * Copy part1.jar to tmp.jar:

hadoop fs -cp -f /test-warehouse/scratch.db/part1.jar /test-warehouse/scratch.db/tmp.jar

 * Create the identity function from tmp.jar:

create function scratch.identity(boolean) returns boolean
location '/test-warehouse/scratch.db/tmp.jar'
symbol='org.apache.impala.TestUdf';

 * Run a query that evaluates identity() on all nodes, so that every impalad caches the current tmp.jar:

select count(*) from functional.alltypes where scratch.identity(bool_col) = bool_col;

 * Copy part2.jar over tmp.jar:

hadoop fs -cp -f /test-warehouse/scratch.db/part2.jar /test-warehouse/scratch.db/tmp.jar

 * Create the replace_string function from the updated tmp.jar:

create function scratch.replace_string(string) returns string
location '/test-warehouse/scratch.db/tmp.jar'
symbol='org.apache.impala.ReplaceStringUdf';

 * Run a query that uses replace_string():

select count(*) from functional.alltypes where scratch.replace_string(string_col) = string_col;

When all impalads are both executors and coordinators, the second query works.

With a single coordinator, so that the remaining impalads are executor-only:

./bin/start-impala-cluster.py --num_coordinators=1

the second query always results in:

WARNINGS: ImpalaRuntimeException: Unable to find class.
 CAUSED BY: ClassNotFoundException: org.apache.impala.ReplaceStringUdf

(each backend still has the previous version of tmp.jar cached, which does not contain ReplaceStringUdf.class)

Currently, executors do not need any metadata beyond what coordinators supply in the plan. Libs are the exception to this scheme: each impalad tries to stay consistent with the lib files stored in the FS as of the time of function creation (it's a little more complicated than that ...).

One option is for plans to include lib version information, so that each impalad can tell when its cached copy is stale and needs a refresh.
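The proposal can be illustrated with a small model of a lib cache. This is a minimal sketch, not Impala's actual LibCache: the class and method names (LibCache, get_unversioned, get), the fetch callback that reads a file from the FS, and the use of an integer version such as the file's modification time are all hypothetical. get_unversioned models what an executor-only impalad does today (reuse the entry until restart); get models the proposal, where the version the coordinator saw travels in the plan and a mismatch forces a reload.

```python
"""Hypothetical model of a lib cache whose lookups carry a plan-supplied
version, so a stale local copy is reloaded instead of reused until restart."""

from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for reading a lib file from the FS:
# returns (version, contents), e.g. (mtime, jar bytes).
FetchFn = Callable[[str], Tuple[int, bytes]]


@dataclass
class CacheEntry:
    version: int     # version of the file when it was copied locally
    contents: bytes  # stand-in for the local copy / loaded classes


class LibCache:
    def __init__(self, fetch: FetchFn) -> None:
        self._fetch = fetch
        self._entries: Dict[str, CacheEntry] = {}
        self.fetch_count = 0  # number of FS reads, for illustration

    def _load(self, path: str) -> CacheEntry:
        # Read the file from the FS and replace any existing entry.
        self.fetch_count += 1
        version, contents = self._fetch(path)
        entry = CacheEntry(version, contents)
        self._entries[path] = entry
        return entry

    def get_unversioned(self, path: str) -> bytes:
        """Models the executor-only behavior described above: once an
        entry is cached, it is reused regardless of changes in the FS."""
        entry = self._entries.get(path)
        if entry is None:
            entry = self._load(path)
        return entry.contents

    def get(self, path: str, plan_version: int) -> bytes:
        """Models the proposal: the plan carries the version the
        coordinator saw, and any mismatch triggers a reload."""
        entry = self._entries.get(path)
        if entry is None or entry.version != plan_version:
            entry = self._load(path)
        return entry.contents


if __name__ == "__main__":
    path = "/test-warehouse/scratch.db/tmp.jar"
    fs = {path: (1, b"TestUdf")}  # simulated FS: path -> (version, contents)
    cache = LibCache(lambda p: fs[p])

    print(cache.get_unversioned(path))  # b'TestUdf'
    fs[path] = (2, b"TestUdf+ReplaceStringUdf")  # like the second hadoop fs -cp -f
    print(cache.get_unversioned(path))  # still b'TestUdf' (stale)
    print(cache.get(path, 2))           # b'TestUdf+ReplaceStringUdf'
```

Comparing versions for inequality rather than ordering keeps the check independent of how versions are assigned. Note that if the file changes again after planning, the reloaded copy still won't match the plan's version; a real implementation would have to decide whether to fail the query in that case, which is the coordination/evaluation race mentioned earlier.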


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)