Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/06/09 07:42:00 UTC

[jira] [Commented] (IMPALA-5845) Impala should de-duplicate row parsing error

    [ https://issues.apache.org/jira/browse/IMPALA-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552017#comment-17552017 ] 

ASF subversion and git services commented on IMPALA-5845:
---------------------------------------------------------

Commit 7273cfdfb901b9ef564c2737cf00c7a8abb57f07 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7273cfdfb ]

IMPALA-5845: Limit the number of non-fatal errors logging to INFO

RuntimeState::LogError() both aggregates errors for the coordinator and
writes them to the log file, depending on the vlog_level. This can flood
the INFO log when the specified vlog_level is 1, making it difficult to
analyze other, more significant log lines. This patch limits the number
of errors logged to INFO based on the max_error_logs_per_instance flag
(default is 2000). Once this number is exceeded, errors logged at
vlog_level=1 are downgraded to vlog_level=2.

To allow easy debugging in the future, this flag is ignored if the user
sets the query option max_errors < 0, in which case all errors targeting
vlog_level 1 are logged.
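
The throttling rule described above can be sketched roughly as follows.
This is an illustrative sketch only, not the code from the patch: the
ErrorLogThrottle struct and its EffectiveVlogLevel() method are
hypothetical names, and the exact boundary at which downgrading starts is
an assumption. Only max_error_logs_per_instance and max_errors come from
the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-fragment-instance state. This is NOT
// Impala's RuntimeState; it only illustrates the rule in the commit message.
struct ErrorLogThrottle {
  int64_t max_error_logs_per_instance;  // mirrors the flag (default 2000)
  int64_t max_errors;                   // mirrors the query option
  int64_t logged_to_info;               // errors already logged at level 1

  ErrorLogThrottle(int64_t max_logs, int64_t max_errs)
    : max_error_logs_per_instance(max_logs),
      max_errors(max_errs),
      logged_to_info(0) {}

  // Returns the vlog level at which an error requested at 'vlog_level'
  // would actually be logged.
  int EffectiveVlogLevel(int vlog_level) {
    assert(vlog_level >= 1);
    // Only errors targeting level 1 (the INFO log) are throttled.
    if (vlog_level != 1) return vlog_level;
    // A negative max_errors disables the limit entirely.
    if (max_errors < 0) return 1;
    // Assumed boundary: once the limit has been reached, downgrade to 2.
    if (logged_to_info >= max_error_logs_per_instance) return 2;
    ++logged_to_info;
    return 1;
  }
};
```

With a limit of 2, the first two level-1 errors stay at level 1 and the
third is downgraded to level 2; with max_errors < 0 nothing is downgraded.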

This patch also fixes a bug where the error count is not increased for a
non-general error code that is already in the 'error_log_' map.
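
As a rough illustration of the counting behavior this fix is aiming for,
the sketch below keeps one entry per error code and increments its count
on every occurrence, including repeats of a code that is already present.
The types and the AppendError() function are hypothetical and do not
reflect Impala's actual error-log structures; in particular, keeping every
message for general errors but only the first message for other codes is
an assumption made for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical error codes, for illustration only.
enum class ErrorCode { GENERAL, ROW_PARSE };

// Hypothetical per-code entry: how often the code occurred, plus messages.
struct ErrorLogEntry {
  int count = 0;
  std::vector<std::string> messages;
};

using ErrorLog = std::map<ErrorCode, ErrorLogEntry>;

// Records one error. Every call increments the count, whether or not the
// code already has an entry in the map.
void AppendError(ErrorLog* log, ErrorCode code, const std::string& msg) {
  ErrorLogEntry& entry = (*log)[code];  // creates the entry on first use
  ++entry.count;
  // Assumption: general errors keep every message; other codes keep only
  // the first message as a representative sample.
  if (code == ErrorCode::GENERAL || entry.messages.empty()) {
    entry.messages.push_back(msg);
  }
}
```

Appending two ROW_PARSE errors leaves a single stored message with a count
of 2, which is the de-duplicated shape the issue asks for.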

Testing:
- Add test_logging.py::TestLoggingCore

Change-Id: I924768ec461735c172fbf75d6415033bbdb77f9b
Reviewed-on: http://gerrit.cloudera.org:8080/18565
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Impala should de-duplicate row parsing error
> --------------------------------------------
>
>                 Key: IMPALA-5845
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5845
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Juan Yu
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: ramp-up, supportability
>             Fix For: Impala 4.2.0
>
>
> The Impala log file grew very quickly with lots of errors like:
>  I0824 10:44:46.527885  8679 runtime-state.cc:217] Error from query 804d64b80df65fda:a5349b0700000000: Error parsing row: file: hdfs://nameservice1/user/hive/tpcds.db/store_sales/00005.parq, before offset: 120795952
> There are 622000 errors for only 141 unique files.
> Impala already de-duplicates similar errors in many scenarios. Could the row parsing error be de-duplicated as well, to reduce log size and make troubleshooting easier?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
