You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2018/06/22 13:46:26 UTC

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10793


Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................

WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Some data errors (for example out-of-range parquet timestamps)
can dominate logs if a table contains a large number of rows with
invalid data. If an error has its own error code, then these errors
are already aggregated for the user in RuntimeState, but the logs
will contain a new line for every occurrence, which is rarely useful.

There is always a compromise to make between providing as much
information as possible and reducing log size - the one chosen
here is logging the first occurrence of the error when it happens and
the number of similar errors at the end (per query+table+rowgroup+column).

Utility class RuntimeState::LogCollector is added to collect
logs "locally", so not in RuntimeState, as RuntimeState is shared
between different scanner threads/fragments of a query and would
make some kind of locking necessary.

This change should improve the performance of scans hitting many
data errors even if logging is turned off, because the counting
of already occured errors no longer needs locking and the
substitution of paramaters in the error string is also avoided.

TODOs:
- add some kind of testing
- change other data errors (like out of range Kudu timestamp) to
  use a similar logic
- I am not completely satisfied with the interface, so I am open
  to suggestions

Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
4 files changed, 76 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/10793/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has abandoned this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Abandoned

Abandoning since https://gerrit.cloudera.org/#/c/18565/ was merged and it helps with the issue
-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Lars Volker <lv...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> RuntimeState is per fragment instance state. It's not shared between fragme
The join build thread is the other case to keep in mind.



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 22:24:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> My problem with changing RuntimeState::LogError() to log only the first err
So it sounds like the motivation is to be able to define a "scope" for what is consider to be redundant errors? I agree doing that in LogError() would not be straightforward.

My main concern with building more logic on top of LogError() is that the error log stuff seems a bit broken and so adding more complication on top is hard to reason about. Specifically, the use of max_errors query option seems inconsistent. So wondering if we need to step back and rethink how the warning collection works generally rather than adding another layer on top.

Anyway, I'm okay with adding this scoped warning collector thingy if you feel it's needed but it needs more explanation in the class comment.


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@234
PS1, Line 234:   /// when a new error arrives or when errors need to be flushed.
this comment needs more explanation. With the current comment, it's not easy for someone to determine whether they should use this class or just call LogError() directly. You should explain how this differs from just calling LogError().


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@241
PS1, Line 241: if there is no error 
isn't it: return true if there is already an error with this error code?



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Jun 2018 16:46:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

Thanks for the comments! I will not update the code today but I would like to respond to comments about the general approach:

"Currently we only store the first concrete error message per error code, but we might need all unique error messages."

In case of this error, every message would be the same for a given file+column, and column readers are not reused between columns/files. Things would be different if the values were logged, but data itself is considered sensitive information.

A new function like AddIfMessageIsUnique() could be added to LogCollector that would create a map from the messages and check uniqueness, but I would only add it if it was actually used somewhere. Such a function would also raise some questions: can the map grow without limit? If no, what to do if the limit was reached?


-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 18:17:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(4 comments)

I like this proposal, I'm only afraid that the aggregation level might be too coarse.

Currently we only store the first concrete error message per error code, but we might need all unique error messages.

Maybe it would be better to aggregate at the error message level, i.e. to store all unique error messages and count them.
This way we can't save constructing the error messages, but on the plus side, the interface could be simpler.

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.h@236
PS1, Line 236: runtime_
nit: I think it's confusing to call it 'runtime' since it usually has a different meaning.
I did a quick sampling over the Impala code base and found that we usually call these variables as 'state_'.


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc
File be/src/runtime/runtime-state.cc:

http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@216
PS1, Line 216:  
Maybe you could add a DCHECK that checks errors_.find(message.error()) == errors_.end()


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@221
PS1, Line 221:   lock_guard<SpinLock> l(runtime_->error_log_lock_);
             :   for (auto e: errors_) {
             :     // It is assumed that the first occurrence was already reported to RuntimeState.
             :     int new_error_count =  e.second.count - 1;
             :     runtime_->error_log_[e.first].count += new_error_count;
To me it seems only this part needs locking.
The second half of the for loop could be done in an other for loop without locking the spinlock.


http://gerrit.cloudera.org:8080/#/c/10793/1/be/src/runtime/runtime-state.cc@226
PS1, Line 226: PrintId(runtime_->query_id())
There's a chance that the compiler does it for us, but it might be a good idea to store the result of PrintId() since it is a quite costly function.



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 11:57:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

Csaba, any updates here?


-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 06 Sep 2018 20:21:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> The join build thread is the other case to keep in mind.
Right, but I don't think we do much LogError() from that (other than from inside the scan node), so it doesn't seem like it really contributes to lock contention.



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 22:32:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
            : will contain a new line for every occurrence
> seems like we could change that generally.
I was hesitant to go that way, because messages with the same error code but different parameters can contain useful information (for example different column names).


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> Right, but I don't think we do much LogError() from that (other than from i
I agree that  this logic would be mainly useful for scanners.  KuduSink has a per row error though (TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION).

I thought about adding these functions to scanner code, but I did not find a suitable class - HdfsScanner would be a good candidate, but it does not have common ancestors with KuduScanner, and both classes should be able to use this logic. Do you have a good place in mind?

The primary motivation is to avoid massive logging with this specific message. Avoiding locking came up as a secondary benefit, as it seemed trivial to implement it with the new interface.



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Jun 2018 14:45:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
            : will contain a new line for every occurrence
> But isn't that the point of the change (as you state below).  I guess it wo
The log lines to eliminate are the ones that report the exact same information. These would be merged to max two lines: 

1. one at the first occurrence of the error
2. one when the error collector is flushed - this would contain the number of errors since 1. 

This logic is probably more complicated then it should be, but it ensures that minimal information is lost.

It would be simpler to avoid the flushing logic by logging only the first errors, but as RuntimeState reports the number of occurrences per error code, some way is needed to increment the counters.


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> As the class is written today, it's not really tied to RuntimeState. You co
My problem with changing RuntimeState::LogError() to log only the first error for a code is that it would completely hide if an error occurs in more than one files/columns. Having one LogCollector per ColumnReader means that the error in one column cannot hide the one in another.

An alternate approach would be to keep a LogCollector per HdfsScanner, and keep the last message, and log only if the message was changed. This would simplify the interface and work well with column readers.

Another possibility would be to create a class to collect specially data errors. Its LogError() function could expect table/file/column parameters, and use them as the key of a map and count errors separately.



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Jun 2018 14:14:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
            : will contain a new line for every occurrence
seems like we could change that generally.


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
RuntimeState is per fragment instance state. It's not shared between fragments or even instances. In that regard, it's already the per-thread state for most of execution. The scanners are currently (with mt_dop=0) the important exception.

So then, what is the primary reason to add the new class? Is it to avoid the locking or avoid the log spew. For the former, perhaps this new class would be better to live somewhere specific to the scanners (otherwise, I think it's unclear whether code should use this class or call directly to the LogError() interface). For the latter, we can additionally consider fixing LogError() to avoid the redundant logs (note that most callers of LogError() are the scanners especially in cases that will produce many messages).



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Jun 2018 22:16:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@12
PS1, Line 12: the logs
            : will contain a new line for every occurrence
> I was hesitant to go that way, because messages with the same error code bu
But isn't that the point of the change (as you state below).  I guess it would help if you could summarize which cases you feel the logging is useful to preserve (even when the errors are aggregated) and which you feel the logging should be eliminated.


http://gerrit.cloudera.org:8080/#/c/10793/1//COMMIT_MSG@22
PS1, Line 22: fragments
> I agree that  this logic would be mainly useful for scanners.  KuduSink has
As the class is written today, it's not really tied to RuntimeState. You could put it anywhere, right?

But, it's still not clear to me what the rule of thumb is for when developers should use this new class vs. just calling SetError() directly going forward (if, as you say, sometimes the individual logging is useful).

If the primary motivation is to deal with the logging, why not just do something simpler, rather than introduce a whole class, like pass a boolean to SetError() (or add a parallel SetError() interface) that only logs the first time that error code is encountered (but doesn't introduce extra state)?



-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Jun 2018 15:32:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-7178: Add the possibility to reduce logging for common data errors

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/10793 )

Change subject: WIP IMPALA-7178: Add the possibility to reduce logging for common data errors
......................................................................


Patch Set 1:

Any updates? Should we abandon this?


-- 
To view, visit http://gerrit.cloudera.org:8080/10793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie3b7c1fd020a7ba5e0d9c619e1b67236dce198aa
Gerrit-Change-Number: 10793
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@gmail.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Dec 2018 21:54:46 +0000
Gerrit-HasComments: No