Posted to issues@kudu.apache.org by "shenxingwuying (Jira)" <ji...@apache.org> on 2023/03/06 09:23:00 UTC

[jira] [Updated] (KUDU-3455) Improve space complexity about prune hash partitions for in-list predicate

     [ https://issues.apache.org/jira/browse/KUDU-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shenxingwuying updated KUDU-3455:
---------------------------------
    Description: 
Improve space complexity about prune hash partitions for in-list predicate

    When pruning hash partitions for an in-list predicate, the Java
    client's logic has high space complexity and may cause the client
    to run out of memory.

{code:java}
List<PartialRow> rows = Arrays.asList(schema.newPartialRow());
for (int idx : columnIdxs) {
  List<PartialRow> newRows = new ArrayList<>();
  ColumnSchema column = schema.getColumnByIndex(idx);
  KuduPredicate predicate = predicates.get(column.getName());
  List<byte[]> predicateValues;
  if (predicate.getType() == KuduPredicate.PredicateType.EQUALITY) {
    predicateValues = Collections.singletonList(predicate.getLower());
  } else {
    predicateValues = Arrays.asList(predicate.getInListValues());
  }
  // For each row built so far, replicate it once per value in the
  // equality or in-list predicate.
  for (PartialRow row : rows) {
    for (byte[] predicateValue : predicateValues) {
      PartialRow newRow = new PartialRow(row);
      newRow.setRaw(idx, predicateValue);
      newRows.add(newRow);
    }
  }
  rows = newRows;
}
for (PartialRow row : rows) {
  int hash = KeyEncoder.getHashBucket(row, hashSchema);
  hashBuckets.set(hash);
}
{code}
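To make the blow-up concrete: the cross-product loop above materializes one PartialRow per combination of predicate values, so the peak row count is the product of the in-list sizes across all hash columns. A hypothetical sketch (the three 1000-value in-lists are an assumption for illustration, not from the report):

```java
public class RowCountEstimate {
    public static void main(String[] args) {
        // Hypothetical workload: three hash columns, each constrained
        // by a 1000-value IN-list predicate.
        long[] inListSizes = {1000, 1000, 1000};
        // One PartialRow is built per combination before any of them
        // is hashed, so all of these are live at once.
        long rows = 1;
        for (long n : inListSizes) {
            rows *= n;
        }
        System.out.println(rows + " PartialRow objects at peak");
        // prints "1000000000 PartialRow objects at peak"
    }
}
```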
    This patch fixes the problem by providing a recursive algorithm
    that uses a depth-first-search style traversal to visit all
    combinations and release PartialRow objects as soon as possible.
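The recursive approach can be sketched as a depth-first traversal that keeps only one chosen value per column on the stack. The sketch below is self-contained and hedged: `hashBucket` is a hypothetical stand-in for Kudu's `KeyEncoder.getHashBucket`, and raw `byte[]` values stand in for encoded PartialRow columns; it is not the patch's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class HashBucketDfs {
    // Hypothetical stand-in for KeyEncoder.getHashBucket: folds the
    // chosen values into a bucket index. The real client hashes the
    // encoded row; the combining scheme here is illustrative only.
    static int hashBucket(List<byte[]> chosen, int numBuckets) {
        int h = 0;
        for (byte[] v : chosen) {
            h = h * 31 + Arrays.hashCode(v);
        }
        return Math.floorMod(h, numBuckets);
    }

    // Depth-first enumeration: 'chosen' holds exactly one value per
    // visited column, so live memory is O(number of hash columns)
    // rather than O(product of the in-list sizes).
    static void dfs(List<List<byte[]>> valuesPerColumn, int col,
                    List<byte[]> chosen, BitSet hashBuckets, int numBuckets) {
        if (col == valuesPerColumn.size()) {
            hashBuckets.set(hashBucket(chosen, numBuckets));
            return;
        }
        for (byte[] v : valuesPerColumn.get(col)) {
            chosen.add(v);
            dfs(valuesPerColumn, col + 1, chosen, hashBuckets, numBuckets);
            chosen.remove(chosen.size() - 1);  // release before next branch
        }
    }

    static BitSet computeBuckets(List<List<byte[]>> valuesPerColumn,
                                 int numBuckets) {
        BitSet buckets = new BitSet(numBuckets);
        dfs(valuesPerColumn, 0, new ArrayList<>(), buckets, numBuckets);
        return buckets;
    }

    public static void main(String[] args) {
        // Two hash columns: a 2-value and a 3-value in-list.
        List<List<byte[]>> values = Arrays.asList(
            Arrays.asList(new byte[]{1}, new byte[]{2}),
            Arrays.asList(new byte[]{3}, new byte[]{4}, new byte[]{5}));
        BitSet buckets = computeBuckets(values, 8);
        System.out.println("buckets hit: " + buckets.cardinality());
    }
}
```

Only the 6 combinations are visited, never stored together, which is the "release PartialRow objects ASAP" property the patch description aims for.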

  was:
    [java] Improve space complexity about prune hash partitions for in-list predicate

    When pruning hash partitions for an in-list predicate, the Java
    client's logic has high space complexity and may cause the client
    to run out of memory.

    This patch fixes the problem by providing a recursive algorithm
    that uses a depth-first-search style traversal to visit all
    combinations and release PartialRow objects as soon as possible.


> Improve space complexity about prune hash partitions for in-list predicate
> --------------------------------------------------------------------------
>
>                 Key: KUDU-3455
>                 URL: https://issues.apache.org/jira/browse/KUDU-3455
>             Project: Kudu
>          Issue Type: Task
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)