Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/07/01 17:36:00 UTC

[jira] [Commented] (IMPALA-5444) Asynchronous code generation

    [ https://issues.apache.org/jira/browse/IMPALA-5444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149590#comment-17149590 ] 

ASF subversion and git services commented on IMPALA-5444:
---------------------------------------------------------

Commit 6c8a3dfc339e43a8992af2ff3429ba5940a061ec in impala's branch refs/heads/master from Daniel Becker
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6c8a3df ]

IMPALA-5444: Asynchronous code generation

This commit introduces optional asynchronous code generation.

Asynchronous code generation means that instead of waiting for codegen
to finish, the query starts in interpreted mode while codegen is done on
another thread.

All the function pointers that point to codegen'd functions are changed
to be atomic, wrapped in a CodegenFnPtr. These are initialised to
nullptr and as long as they are nullptr, the corresponding interpreted
functions are used (as before). When code generation is ready, the
function pointers are set by the codegen thread. No synchronisation is
needed as the function pointers are atomic and it is not a problem if,
at a given moment, only a subset of the codegen'd function pointers are
set and the rest are interpreted.
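
The fallback scheme described above can be sketched roughly as follows.
This is a minimal illustration, not the code from the commit: the class
name CodegenFnPtrSketch, the EvalFn signature, the two AddOne functions
and the acquire/release memory ordering are all assumptions made for
the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical function signature, chosen only for this example.
using EvalFn = int64_t (*)(int64_t);

// Simplified stand-in for the CodegenFnPtr wrapper (assumed shape).
template <typename FnType>
class CodegenFnPtrSketch {
 public:
  // Returns nullptr until the codegen thread has published a function.
  FnType load() const { return ptr_.load(std::memory_order_acquire); }

  // Called by the codegen thread once compilation has finished.
  void store(FnType fn) { ptr_.store(fn, std::memory_order_release); }

 private:
  std::atomic<FnType> ptr_{nullptr};
};

// Interpreted path: always available.
inline int64_t InterpretedAddOne(int64_t x) { return x + 1; }

// Stands in for a JIT-compiled function that computes the same result.
inline int64_t CodegendAddOne(int64_t x) { return x + 1; }

// Execution thread: load the pointer once per call and fall back to the
// interpreted function while the pointer is still nullptr.
inline int64_t Eval(const CodegenFnPtrSketch<EvalFn>& fn_ptr, int64_t x,
                    bool* used_codegen) {
  EvalFn fn = fn_ptr.load();
  *used_codegen = (fn != nullptr);
  return fn != nullptr ? fn(x) : InterpretedAddOne(x);
}

// Usage: interpreted before the pointer is set, codegen'd afterwards.
inline void Demo() {
  CodegenFnPtrSketch<EvalFn> fn_ptr;
  bool used_codegen = false;
  assert(Eval(fn_ptr, 41, &used_codegen) == 42 && !used_codegen);
  fn_ptr.store(&CodegendAddOne);  // what the codegen thread would do
  assert(Eval(fn_ptr, 41, &used_codegen) == 42 && used_codegen);
}
```

Because each call loads its own pointer, two call sites can briefly
disagree on whether codegen is ready; as noted above, that is harmless
as long as both paths compute the same result.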

Asynchronous code generation can be turned on using the ASYNC_CODEGEN
boolean query option.

Testing:
 - In exhaustive mode, a limited number of end-to-end tests are run in
   async mode and with debug actions randomly delaying the codegen
   thread and the main thread after starting codegen to test various
   scenarios of relative timing. The number of such tests is kept
   small to avoid increasing the running time of the tests by too much.
 - Added a new end-to-end test, tests/query_test/test_async_codegen.py,
   which tests three relative timings:

    1. Async codegen finishes before query execution starts (only
       codegen'd code runs).
    2. Query execution finishes before async codegen finishes (only
       interpreted code runs).
    3. Async codegen finishes during query execution (both interpreted
       and codegen'd code runs, switching from interpreted mode to
       codegen).
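
The three timings can be reproduced in a small standalone C++ sketch.
This is only an illustration of the scenarios, not the Python test
itself: the real test delays threads with debug actions, whereas this
sketch assumes an explicit handshake (a promise) to make the ordering
deterministic. All names here (RunScenario, Timing, RunStats, and so
on) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

using RowFn = int (*)(int);
inline int Interpreted(int x) { return x * 2; }
inline int Codegend(int x) { return x * 2; }  // stands in for JIT output

struct RunStats {
  int interpreted_rows = 0;
  int codegend_rows = 0;
  long checksum = 0;  // same in every scenario if both paths agree
};

enum class Timing { kCodegenFirst, kQueryFirst, kCodegenDuringQuery };

inline RunStats RunScenario(Timing timing, int num_rows) {
  std::atomic<RowFn> fn{nullptr};
  std::promise<void> start_codegen;
  std::future<void> start_signal = start_codegen.get_future();

  // "Codegen thread": waits for the signal, then publishes the pointer.
  std::thread codegen_thread([&fn, &start_signal] {
    start_signal.wait();
    fn.store(&Codegend, std::memory_order_release);
  });

  bool released = false;
  auto finish_codegen = [&] {
    start_codegen.set_value();
    codegen_thread.join();  // the pointer is set once join() returns
    released = true;
  };

  RunStats stats;
  if (timing == Timing::kCodegenFirst) finish_codegen();
  for (int row = 0; row < num_rows; ++row) {
    if (timing == Timing::kCodegenDuringQuery && row == num_rows / 2) {
      finish_codegen();
    }
    RowFn f = fn.load(std::memory_order_acquire);
    if (f != nullptr) {
      stats.checksum += f(row);
      ++stats.codegend_rows;
    } else {
      stats.checksum += Interpreted(row);
      ++stats.interpreted_rows;
    }
  }
  // kQueryFirst: codegen only completes after the query has finished.
  if (!released) finish_codegen();
  return stats;
}
```

Calling RunScenario once per Timing value and comparing the row counts
shows which path ran in each case; the checksum should match across
all three scenarios.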

Change-Id: Ia7cbfa7c6734dcf03641629429057d6a4194aa6b
Reviewed-on: http://gerrit.cloudera.org:8080/15105
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Asynchronous code generation
> ----------------------------
>
>                 Key: IMPALA-5444
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5444
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Michael Ho
>            Assignee: Daniel Becker
>            Priority: Minor
>              Labels: codegen
>
> Currently, codegen happens during the preparation phase of a query fragment. In other words, the query fragment cannot start running until code generation is complete. For some queries, code generation takes a huge amount of time. While we should disable codegen in some exec nodes if the planner can accurately estimate that running without codegen will be faster (e.g. the number of rows to process is relatively small), we will still pay the price if, say, the stats are stale or the estimate is off.
> With async codegen, the idea is to run code generation in a separate thread so that codegen is not on the critical path of query execution. Once codegen completes for a fragment, we can atomically swap the function pointers of compiled functions embedded in the exec nodes. The exec nodes all currently support falling back to interpretation if the codegen'd functions don't exist (i.e. the pointer to the compiled function is NULL). In some cases, the query may run to completion before codegen completes. Once IMPALA-3259 is fixed (if feasible), we should be able to cancel the codegen execution.
> Another thing to note is that we should be able to bound the codegen work to a set of threads in a thread pool so as to control the CPU and memory resources consumed by codegen.
> Another potential extension of this decoupling is IMPALA-9660.


