Posted to issues@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/07/16 21:54:00 UTC

[jira] [Resolved] (IMPALA-8486) test_udf_update_via_drop and test_udf_update_via_create fail on local catalog

     [ https://issues.apache.org/jira/browse/IMPALA-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-8486.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.3.0

> test_udf_update_via_drop and test_udf_update_via_create fail on local catalog
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-8486
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8486
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: catalog-v2
>             Fix For: Impala 3.3.0
>
>
> {noformat}
>  TestUdfTargeted.test_udf_update_via_drop[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> tests/query_test/test_udfs.py:541: in test_udf_update_via_drop
>     self._run_query_all_impalads(exec_options, query_stmt, ["New UDF"])
> tests/query_test/test_udfs.py:52: in _run_query_all_impalads
>     assert result.data == expected
> E   assert ['Old UDF'] == ['New UDF']
> E     At index 0 diff: 'Old UDF' != 'New UDF'
> E     Full diff:
> E     - ['Old UDF']
> E     + ['New UDF']
> ----------------------------
> {noformat}
> The tests are checking that the local UDF caches on each impalad get invalidated by a drop/create of a function referencing the HDFS file containing the UDF. The test fails because the local catalog, unlike the regular catalog, doesn't invalidate LibCache entries upon receiving a catalog update.
> I looked at this for long enough to realise that the invalidation mechanism is fundamentally broken - it doesn't work with dedicated executors. It also creates a race between the statestore updates and queries referencing the UDFs - if the queries win the race, then they can incorrectly use the old version that should have been invalidated.
> I think this is potentially problematic because old JAR/SO versions could persist in the cache indefinitely if the files are overwritten in place.
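
The stale-cache behaviour described in the issue can be illustrated with a small, self-contained sketch. This is not Impala's LibCache code: `FakeStore`, `LibCacheSketch`, the `check_version` flag and the path `/udfs/lib.so` are all hypothetical names chosen for illustration. It assumes a cache keyed only by file path, and shows that a lookup arriving before the invalidation still returns the old contents, while a lookup that compares a version stamp against the store picks up the in-place overwrite without any external invalidation.

```python
class FakeStore:
    """Hypothetical stand-in for the file system: path -> (version, contents)."""

    def __init__(self):
        self.files = {}

    def write(self, path, contents):
        # Overwriting in place keeps the path and bumps a version stamp.
        version = self.files.get(path, (0, None))[0] + 1
        self.files[path] = (version, contents)

    def read(self, path):
        return self.files[path]


class LibCacheSketch:
    """Hypothetical path-keyed cache; not Impala's actual LibCache."""

    def __init__(self, store):
        self.store = store
        self.entries = {}  # path -> (version, contents)

    def invalidate(self, path):
        # Models an invalidation triggered by a catalog update.
        self.entries.pop(path, None)

    def get(self, path, check_version=False):
        entry = self.entries.get(path)
        if entry is not None:
            # Without a version check, any cached entry is trusted as-is.
            if not check_version or entry[0] == self.store.read(path)[0]:
                return entry[1]
        # Cache miss or stale entry: reload from the store.
        entry = self.store.read(path)
        self.entries[path] = entry
        return entry[1]


def demo():
    path = "/udfs/lib.so"  # hypothetical path
    store = FakeStore()
    cache = LibCacheSketch(store)

    store.write(path, "Old UDF")
    first = cache.get(path)            # populates the cache
    store.write(path, "New UDF")       # file overwritten in place
    raced = cache.get(path)            # query arrives before invalidation
    cache.invalidate(path)             # invalidation arrives late
    after = cache.get(path)            # reloads the new contents

    # Same scenario, but the lookup validates the version stamp itself.
    checked_cache = LibCacheSketch(store)
    store.write(path, "Old UDF")
    checked_cache.get(path)
    store.write(path, "New UDF")
    checked = checked_cache.get(path, check_version=True)

    return first, raced, after, checked


if __name__ == "__main__":
    print(demo())
```

In this sketch the second lookup returns the old contents, matching the `['Old UDF'] == ['New UDF']` assertion failure above, and if the invalidation never arrived the old entry would stay cached indefinitely. Validating a version stamp at lookup time is only one possible way to avoid depending on the timing of invalidations; the thread does not say which approach the actual fix took.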



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)