Posted to issues@orc.apache.org by "Alexander Petrossian (PAF) (Jira)" <ji...@apache.org> on 2024/01/11 12:05:00 UTC
[jira] [Commented] (ORC-1554) Filtering by columns, nested in LISTs
[ https://issues.apache.org/jira/browse/ORC-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805543#comment-17805543 ]
Alexander Petrossian (PAF) commented on ORC-1554:
-------------------------------------------------
I've prepared a fix (about 20 lines of code) and will post a PR later.
> Filtering by columns, nested in LISTs
> -------------------------------------
>
> Key: ORC-1554
> URL: https://issues.apache.org/jira/browse/ORC-1554
> Project: ORC
> Issue Type: Improvement
> Affects Versions: 1.9.2
> Reporter: Alexander Petrossian (PAF)
> Priority: Major
>
> Currently searchArgument supports fields inside arrays, and that works.
> We even use deeply nested columns, and it works fine; row groups get properly included:
> {noformat}
> data.request.eventItem._elem.UsageEventItem.usage.CustomerFacingServiceUsage.relatedParty._elem.resource._elem.value
> {noformat}
> Alas, the [allowSARGToFilter mechanism|ORC-743] does not handle values inside arrays.
> There are two show-stoppers here.
> The small one:
> https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/OrcFilterContext.java#L80:
> {code:java}
> static boolean isNull(ColumnVector[] vectorBranch, int idx) throws IllegalArgumentException {
>   for (ColumnVector v : vectorBranch) {
>     if (v instanceof ListColumnVector || v instanceof MapColumnVector) {
>       throw new IllegalArgumentException(String.format(
>           "Found vector: %s in branch. List and Map vectors are not supported in isNull "
>           + "determination", v));
>     }
> {code}
> The big one:
> https://github.com/apache/orc/blob/v1.9.2/java/core/src/java/org/apache/orc/impl/filter/LeafFilter.java#L70
> {code:java}
> ColumnVector[] branch = fc.findColumnVector(colName);
> ColumnVector v = branch[branch.length - 1];
> ...
> if (!OrcFilterContext.isNull(branch, rowIdx) &&
>     allowWithNegation(v, rowIdx)) {
> {code}
> Here the code indexes *v* with *rowIdx*, which is wrong if v is nested inside a LIST (or MAP).
> rowIdx iterates over the records (rows) of the batch.
> But v holds the values of the nested column, and there may be fewer or more of those than there are records.
> The two are indexed differently: a row index is not a valid index into v.
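
To illustrate the indexing mismatch described above, here is a minimal, self-contained sketch. It is not the ORC code and not the proposed fix. Plain arrays stand in for a LIST column (ListColumnVector keeps per-row offsets and lengths plus a flattened child vector in a similar way). The class and method names are hypothetical, and the rule "a row passes if any of its list elements matches" is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for filtering on a value nested in a LIST column.
class NestedFilterSketch {

  // Wrong: treats the row index as an index into the flattened child values.
  static int[] naiveSelectRows(int rowCount, String[] child, String wanted) {
    List<Integer> selected = new ArrayList<>();
    for (int row = 0; row < rowCount; row++) {
      if (row < child.length && wanted.equals(child[row])) {
        selected.add(row);
      }
    }
    return toArray(selected);
  }

  // Translates each row into its own range of child values first.
  // Assumption: a row is selected if any element of its list matches.
  static int[] selectRows(long[] offsets, long[] lengths, boolean[] listIsNull,
                          String[] child, String wanted) {
    List<Integer> selected = new ArrayList<>();
    for (int row = 0; row < offsets.length; row++) {
      if (listIsNull[row]) {
        continue; // a null list has no elements to test
      }
      int start = (int) offsets[row];
      int end = start + (int) lengths[row];
      for (int i = start; i < end; i++) {
        if (wanted.equals(child[i])) { // a null element never matches
          selected.add(row);
          break;
        }
      }
    }
    return toArray(selected);
  }

  private static int[] toArray(List<Integer> values) {
    return values.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    // Three rows: ["a", "b"], [], ["c", "b", "d"] give five child values.
    long[] offsets = {0, 2, 2};
    long[] lengths = {2, 0, 3};
    boolean[] listIsNull = {false, false, false};
    String[] child = {"a", "b", "c", "b", "d"};

    System.out.println(Arrays.toString(naiveSelectRows(3, child, "b")));
    System.out.println(Arrays.toString(
        selectRows(offsets, lengths, listIsNull, child, "b")));
  }
}
```

With this sample data the naive version selects only row 1, whose list is empty, while the offset-based version selects rows 0 and 2, which are the rows that actually contain "b".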
--
This message was sent by Atlassian Jira
(v8.20.10#820010)