Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2023/03/06 09:23:00 UTC
[jira] [Updated] (KUDU-3455) Improve space complexity about prune hash partitions for in-list predicate
[ https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shenxingwuying updated KUDU-3455:
---------------------------------
Description:
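Improve the space complexity of pruning hash partitions for in-list predicates.
When the Java client prunes hash partitions for in-list predicates, the
current logic has high space complexity and may cause the client to run
out of memory. The code builds a PartialRow for every combination of
predicate values before it computes any hash bucket, so the number of
live rows grows with the product of the in-list sizes.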
{code:java}
// java
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each of the encoded string, replicate it by the number of values in
  // equality and in-list predicate.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}
{code}
This patch fixes the problem with a recursive algorithm that works like a
depth-first search: it enumerates the combinations one at a time and
releases PartialRow objects as soon as possible.
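The depth-first idea can be illustrated with a small standalone sketch. This is not the code from the patch: it uses plain int values in place of PartialRow, and the class HashBucketDfsSketch, its methods, and the Arrays.hashCode-based bucketOf are hypothetical stand-ins (bucketOf stands in for KeyEncoder.getHashBucket). The point it shows is that one reusable working array is enough to visit every combination, so extra memory grows with the number of hash columns rather than with the product of the in-list sizes.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class HashBucketDfsSketch {

  // Hypothetical stand-in for KeyEncoder.getHashBucket: maps one
  // combination of column values to a bucket in [0, numBuckets).
  static int bucketOf(int[] combination, int numBuckets) {
    return Math.floorMod(Arrays.hashCode(combination), numBuckets);
  }

  // Returns the set of buckets hit by any combination that takes one
  // value from each column's candidate list.
  static BitSet pruneBuckets(List<int[]> valuesPerColumn, int numBuckets) {
    BitSet buckets = new BitSet(numBuckets);
    // Single working array, overwritten in place during the search.
    int[] current = new int[valuesPerColumn.size()];
    visit(valuesPerColumn, 0, current, numBuckets, buckets);
    return buckets;
  }

  private static void visit(List<int[]> valuesPerColumn, int depth,
                            int[] current, int numBuckets, BitSet buckets) {
    if (depth == valuesPerColumn.size()) {
      // A full combination is ready: hash it now and keep nothing else.
      buckets.set(bucketOf(current, numBuckets));
      return;
    }
    for (int value : valuesPerColumn.get(depth)) {
      current[depth] = value;  // choose a value for this column
      visit(valuesPerColumn, depth + 1, current, numBuckets, buckets);
    }
  }

  public static void main(String[] args) {
    // Three hash columns: an in-list of 3 values, an equality, an in-list of 2.
    List<int[]> columns = Arrays.asList(
        new int[] {1, 2, 3}, new int[] {7}, new int[] {4, 5});
    System.out.println(pruneBuckets(columns, 8));
  }
}
```

In this sketch an equality predicate is simply a candidate list of length one, which mirrors how the original loop treats EQUALITY and in-list predicates uniformly.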
was:
[java] Improve the space complexity of pruning hash partitions for in-list predicates.
When the Java client prunes hash partitions for in-list predicates, the
current logic has high space complexity and may cause the client to run
out of memory.
This patch fixes the problem with a recursive algorithm that works like a
depth-first search: it enumerates the combinations one at a time and
releases PartialRow objects as soon as possible.
> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
> Key: KUDU-3455
> URL: https://issues.apache.org/jira/browse/KUDU-3455
> Project: Kudu
> Issue Type: Task
> Reporter: shenxingwuying
> Assignee: shenxingwuying
> Priority: Major