Posted to github@arrow.apache.org by "JayjeetAtGithub (via GitHub)" <gi...@apache.org> on 2023/09/11 15:47:49 UTC

[GitHub] [arrow-datafusion] JayjeetAtGithub commented on pull request #7401: Implement `CardinalityAwareRowConverter` while doing streaming merge

JayjeetAtGithub commented on PR #7401:
URL: https://github.com/apache/arrow-datafusion/pull/7401#issuecomment-1714151452

   @alamb Thanks a lot for taking a look. Some thoughts/observations regarding the failing tests:
   
   Before exec: `dict<int, utf8>, int` -> `CardinalityAwareRowConverter::convert_columns` (preserve_dict=false) -> `Row`
   After exec: `Row` -> `CardinalityAwareRowConverter::convert_rows` -> `utf8, int`
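
   To make the round trip concrete, here is a rough sketch using the plain arrow-rs `RowConverter`. It is illustrative only: the column values are made up, and it assumes the arrow-rs API of this era, where `SortField::preserve_dictionaries(false)` is the knob behind the `preserve_dict=false` step above:

   ```rust
   use std::sync::Arc;

   use arrow::array::{ArrayRef, DictionaryArray, Int32Array};
   use arrow::datatypes::{DataType, Int32Type};
   use arrow::row::{RowConverter, SortField};

   fn round_trip() -> Result<(), arrow::error::ArrowError> {
       // A `dict<int, utf8>` column and an `int` column, as in the example above.
       let dict: DictionaryArray<Int32Type> = vec!["a", "b", "a"].into_iter().collect();
       let ints = Int32Array::from(vec![1, 2, 3]);
       let cols: Vec<ArrayRef> = vec![Arc::new(dict), Arc::new(ints)];

       // With dictionary preservation off, rows store the dictionary *values*
       // instead of interned keys (assumed arrow-rs option of this era).
       let converter = RowConverter::new(vec![
           SortField::new(cols[0].data_type().clone()).preserve_dictionaries(false),
           SortField::new(cols[1].data_type().clone()),
       ])?;

       let rows = converter.convert_columns(&cols)?;

       // Converting back yields `utf8, int`, not `dict<int, utf8>, int`:
       // the type change that trips up downstream exec nodes.
       let back = converter.convert_rows(rows.iter())?;
       assert_eq!(back[0].data_type(), &DataType::Utf8);
       Ok(())
   }
   ```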
   
   So the exec node returns `utf8` instead of `dict`, which is what we want, but it causes a type mismatch for the downstream exec nodes in the query plan, which is what we see in the test failures. Since I was working on fixing `GROUP BY`, I added code to convert the `utf8` arrays back into `dict` arrays in the `GroupedHashAggregateStream` (sketched below). I think if we do something similar for the other exec nodes that deal with the cardinality-aware row converter, we should be good.
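
   For the conversion back, here is a minimal sketch of the idea using arrow's `cast` kernel (the actual `GroupedHashAggregateStream` code differs; `redictionarize` is a made-up helper name, and the target type would come from the input schema recorded before row conversion):

   ```rust
   use std::sync::Arc;

   use arrow::array::{ArrayRef, StringArray};
   use arrow::compute::cast;
   use arrow::datatypes::DataType;

   /// Re-encode a `utf8` column produced by `convert_rows` back into the
   /// dictionary type the rest of the plan expects (hypothetical helper).
   fn redictionarize(
       utf8_col: &ArrayRef,
       dict_type: &DataType,
   ) -> Result<ArrayRef, arrow::error::ArrowError> {
       cast(utf8_col, dict_type)
   }

   fn main() -> Result<(), arrow::error::ArrowError> {
       let col: ArrayRef = Arc::new(StringArray::from(vec!["a", "b", "a"]));
       // The original type, e.g. `dict<int32, utf8>`, kept from the input schema.
       let dict_type = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
       let dict = redictionarize(&col, &dict_type)?;
       assert_eq!(dict.data_type(), &dict_type);
       Ok(())
   }
   ```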
   
   Unfortunately, none of this fixes the `GROUP BY` OOM issue yet. I am focusing on reproducing and fixing that end-to-end, and on verifying that the conversion back to `dict` does not cause any issues.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org