Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2018/08/27 18:24:08 UTC

[GitHub] ilooner commented on issue #1445: DRILL-6706: fixed null pointer exception in HashJoin

URL: https://github.com/apache/drill/pull/1445#issuecomment-416320678

@sachouche @vvysotskyi I don't agree that this should be handled by the column sizes map. The issue is that operators expect a column named MYCOLUMN (the name provided by the planner), but the input column is instead named `MYCOLUMN`, with the backtick quotes included as part of the name. This mismatch can cause errors at many points in an operator's execution, not just within the RecordBatchSizer's columnSizes map. For example, in HashJoin the HashTable uses the unquoted column names provided by the planner to retrieve the key column from the incoming record batch (see ChainedHashTable.createAndSetupHashTable). So while this fix resolves a fatal exception in the batch sizer, it does not address functional correctness in other parts of the code, such as the HashTable, which may be silently generating incorrect results.
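
A minimal Java sketch of the failure mode described above, assuming columns are looked up by name in a plain HashMap. `ColumnNameMismatch`, `columnWidthsFromBatch`, `strictWidth` and `lenientWidth` are hypothetical names for illustration only, not Drill's RecordBatchSizer or ChainedHashTable code. The strict lookup fails loudly with a NullPointerException; the lenient one avoids the exception but silently returns a wrong value, which is the kind of hidden correctness problem this comment warns about.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: NOT Drill's RecordBatchSizer or ChainedHashTable.
// It only models looking up an incoming column by the planner's name.
public class ColumnNameMismatch {

    // Index the incoming batch's columns by the names the reader gave them.
    static Map<String, Integer> columnWidthsFromBatch(String... incomingNames) {
        Map<String, Integer> widths = new HashMap<>();
        for (String name : incomingNames) {
            widths.put(name, 8); // arbitrary width; only the key matters here
        }
        return widths;
    }

    // Strict lookup: auto-unboxing a missing (null) entry throws NullPointerException.
    static int strictWidth(Map<String, Integer> widths, String plannerName) {
        return widths.get(plannerName);
    }

    // Lenient lookup: avoids the exception but silently returns a wrong value.
    static int lenientWidth(Map<String, Integer> widths, String plannerName) {
        return widths.getOrDefault(plannerName, 0);
    }

    public static void main(String[] args) {
        // The reader kept the backtick quotes in the column name...
        Map<String, Integer> widths = columnWidthsFromBatch("`MYCOLUMN`");
        // ...while the planner refers to the column without them.
        String plannerName = "MYCOLUMN";

        try {
            strictWidth(widths, plannerName);
        } catch (NullPointerException e) {
            System.out.println("strict lookup: NullPointerException");
        }
        // No exception here, but 0 is not the column's real width.
        System.out.println("lenient lookup: " + lenientWidth(widths, plannerName));
    }
}
```

Making the lookup tolerant only moves the failure from a crash to a wrong answer; the names still disagree, which is why the fix arguably belongs where the quoted name is produced.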

If we close this issue now with a temporary fix, some poor soul may spend weeks debugging strange and unexpected data correctness issues down the line. To avoid that scenario and to increase the urgency of fixing the root cause, I am actually thinking that we should leave the bug unfixed until we have a permanent fix for the Parquet reader. What do you guys think?