Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/03/07 13:50:00 UTC

[jira] [Commented] (IMPALA-11223) ASM files from different fragments conflict when using asm_module_dir

    [ https://issues.apache.org/jira/browse/IMPALA-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697440#comment-17697440 ] 

ASF subversion and git services commented on IMPALA-11223:
----------------------------------------------------------

Commit d98ab986a6a2218523ec147f70110300f019150b in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d98ab986a ]

IMPALA-11223: Use unique id to create codegen instances

When the startup flag asm_module_dir is set, impalad dumps the codegen
disassembly to files under that directory. Each file is named "id.asm",
where "id" is the codegen instance id. Before IMPALA-4080 (f2837e9), we
used the fragment instance id as the codegen id. Since then, codegen has
been done at the fragment level (shared by fragment instances), so we
switched to the query id. Different fragments of the same query
therefore share one id, and their asm files overwrite one another.
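
The overwrite can be reproduced in isolation. The following is a minimal
Python sketch, not Impala's code: dump_asm and simulate_query_id_naming
are hypothetical helpers, and the fragment names (F00, F01) and the
disassembly strings are made up for illustration.

```python
import os
import tempfile


def dump_asm(asm_module_dir, codegen_id, disassembly):
    """Write disassembly to <asm_module_dir>/<codegen_id>.asm.

    Opening with mode "w" truncates an existing file, so two dumps with
    the same codegen_id leave only the second one on disk.
    """
    path = os.path.join(asm_module_dir, codegen_id + ".asm")
    with open(path, "w") as f:
        f.write(disassembly)
    return path


def simulate_query_id_naming(query_id, fragments):
    """Dump every fragment using only the query id as the codegen id.

    fragments maps a (hypothetical) fragment name to its disassembly text.
    Returns a dict of file name -> file content found after all dumps.
    """
    with tempfile.TemporaryDirectory() as asm_module_dir:
        for _fragment_name, disassembly in fragments.items():
            # The fragment name is deliberately ignored here, mirroring
            # the naming scheme that caused the conflict.
            dump_asm(asm_module_dir, query_id, disassembly)
        contents = {}
        for file_name in sorted(os.listdir(asm_module_dir)):
            with open(os.path.join(asm_module_dir, file_name)) as f:
                contents[file_name] = f.read()
    return contents
```

Calling simulate_query_id_naming with two or more fragments returns a
single entry, holding only the disassembly of the last fragment dumped.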

The same conflict occurs when dumping IR modules (that is, when
unopt_module_dir or opt_module_dir is set).

This change sets the codegen instance id to "QueryID_FragmentName_PID".
The PID suffix is needed because we usually have several impalads
running together on our dev box.
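
As a rough illustration of the "QueryID_FragmentName_PID" scheme, here is
a minimal Python sketch. It is not the code from the commit (the Impala
backend is C++): codegen_instance_id and asm_file_path are hypothetical
helper names, and fragment names such as F00 are assumed for the example.

```python
import os
from typing import Optional


def codegen_instance_id(query_id: str, fragment_name: str,
                        pid: Optional[int] = None) -> str:
    """Build an id of the form QueryID_FragmentName_PID.

    The fragment name separates fragments of one query; the PID separates
    several daemons that share a host and a dump directory.
    """
    if pid is None:
        pid = os.getpid()  # default to the current process
    return f"{query_id}_{fragment_name}_{pid}"


def asm_file_path(asm_module_dir: str, codegen_id: str) -> str:
    """Return the dump path <asm_module_dir>/<codegen_id>.asm."""
    return os.path.join(asm_module_dir, codegen_id + ".asm")
```

With this naming, fragments F00 and F01 of the same query map to
different files, and so do two processes running the same fragment,
because either the fragment name or the PID differs.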

Also adds log messages when IR or disassembly is dumped to files, which
helps identify which instance performed the codegen.

Tests:
 - Manually verified that the asm file names are as expected.

Change-Id: I7672906365c916bbe750eeb9906cab38573e6c31
Reviewed-on: http://gerrit.cloudera.org:8080/19505
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> ASM files from different fragments conflict when using asm_module_dir
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-11223
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11223
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.1.0
>            Reporter: Joe McDonnell
>            Assignee: Quanlong Huang
>            Priority: Major
>
> For debugging codegen, it is useful to be able to inspect the generated assembly. The asm_module_dir startup parameter directs Impala to dump the codegen assembly to files in a directory. It currently dumps the assembly for a query into '${query_id}.asm', but there are actually multiple codegen operations going on in a single query and the output from one will overwrite the output from previous ones. I added a debug statement to the dumping code, and it gets called multiple times:
> {noformat}
> I0404 12:25:34.453413  6527 codegen-symbol-emitter.cc:58] 574df02d4b904ee9:9fa6181b00000001] Writing disassembly to: /data/Impala/logs/asm_module_dir/574df02d4b904ee9:9fa6181b00000000.asm
> I0404 12:25:34.463084  6528 codegen-symbol-emitter.cc:58] 574df02d4b904ee9:9fa6181b00000000] Writing disassembly to: /data/Impala/logs/asm_module_dir/574df02d4b904ee9:9fa6181b00000000.asm
> ...
> I0404 12:25:34.491320  6529 codegen-symbol-emitter.cc:58] 574df02d4b904ee9:9fa6181b00000004] Writing disassembly to: /data/Impala/logs/asm_module_dir/574df02d4b904ee9:9fa6181b00000000.asm
> {noformat}
> We should fix the filenames so that these collisions do not occur. One option would be to use the fragment id rather than the query id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
