Posted to issues@phoenix.apache.org by "Chen Feng (Jira)" <ji...@apache.org> on 2020/07/03 07:29:00 UTC

[jira] [Comment Edited] (PHOENIX-5987) PointLookup may cost too much memory

    [ https://issues.apache.org/jira/browse/PHOENIX-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150789#comment-17150789 ] 

Chen Feng edited comment on PHOENIX-5987 at 7/3/20, 7:28 AM:
-------------------------------------------------------------

A simple hot fix is to avoid using point lookup when it would generate too many scans. We added the following code to org.apache.phoenix.compile.ScanRanges#isPointLookup().

 
{code:java}
isPointLookup() {
  ... 
  return true;
}

// changed as follows

isPointLookup() {
  ... 

  // newly added: start
  long numOfScans = 1;
  for (List<KeyRange> orRange : ranges) {
    numOfScans *= orRange.size();
  }
  if (numOfScans > THRESHOLD) {
    // too many single-key scans: do not treat this as a point lookup
    return false;
  }
  // newly added: end

  return true;
}
{code}
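The check above can be sketched outside Phoenix as a self-contained class. This is only an illustration under assumptions: the class name PointLookupGuard, the method names, and the value of DEFAULT_THRESHOLD are hypothetical and do not exist in Phoenix, and plain lists stand in for List<KeyRange>. Passing the threshold as an argument means the caller can read it from whatever configuration it has access to, and Math.multiplyExact guards the product against long overflow.

```java
import java.util.List;

final class PointLookupGuard {
    // Hypothetical default; the comment above gives no value for THRESHOLD.
    static final long DEFAULT_THRESHOLD = 100_000L;

    // Product of the per-slot list sizes, i.e. the number of single-key scans.
    // Saturates at Long.MAX_VALUE instead of overflowing.
    static long countPointLookups(List<? extends List<?>> ranges) {
        for (List<?> orRange : ranges) {
            if (orRange.isEmpty()) {
                return 0; // an empty slot matches nothing
            }
        }
        long numOfScans = 1;
        for (List<?> orRange : ranges) {
            try {
                numOfScans = Math.multiplyExact(numOfScans, (long) orRange.size());
            } catch (ArithmeticException overflow) {
                return Long.MAX_VALUE;
            }
        }
        return numOfScans;
    }

    // True when the expansion is small enough to keep using point lookup.
    static boolean allowPointLookup(List<? extends List<?>> ranges, long threshold) {
        return countPointLookups(ranges) <= threshold;
    }
}
```

With slot sizes of 1, 1, 600, 800 and 1000, countPointLookups returns 480,000,000, so allowPointLookup is false for the hypothetical default threshold.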
We have the following questions:

1. isPointLookup is a static method. How can THRESHOLD be set through configuration? Is there an example?

2. Should we apply a finer-grained optimization that uses as many small scans as possible? E.g. for k1 in (1,2) and k2 in (10,20,30) and k3 in (100, 200, 300), we would issue six scans: Scan1=[(1,10,100), (1,10,300)], Scan2=[(1,20,100), (1,20,300)], ..., Scan6=[(2,30,100), (2,30,300)].
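The splitting in question 2 could look roughly like the sketch below. It is not Phoenix code: PrefixRangeScans and ScanBounds are hypothetical names, and key values are plain Long tuples rather than encoded row keys. It enumerates every combination of the leading key columns and, for each one, emits a single scan that spans the minimum to the maximum value of the last column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PrefixRangeScans {
    // One scan: inclusive start and stop keys, each a tuple of key column values.
    static final class ScanBounds {
        final List<Long> start;
        final List<Long> stop;

        ScanBounds(List<Long> start, List<Long> stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    static List<ScanBounds> split(List<List<Long>> inLists) {
        List<ScanBounds> scans = new ArrayList<>();
        if (inLists.isEmpty()) {
            return scans;
        }
        for (List<Long> values : inLists) {
            if (values.isEmpty()) {
                return scans; // an empty IN list matches nothing
            }
        }
        List<Long> last = inLists.get(inLists.size() - 1);
        Long lo = Collections.min(last);
        Long hi = Collections.max(last);

        // Cartesian product of all leading columns (every column except the last).
        List<List<Long>> prefixes = new ArrayList<>();
        prefixes.add(new ArrayList<>());
        for (List<Long> values : inLists.subList(0, inLists.size() - 1)) {
            List<List<Long>> next = new ArrayList<>();
            for (List<Long> prefix : prefixes) {
                for (Long v : values) {
                    List<Long> extended = new ArrayList<>(prefix);
                    extended.add(v);
                    next.add(extended);
                }
            }
            prefixes = next;
        }

        // One range scan per prefix, covering [min, max] of the last column.
        for (List<Long> prefix : prefixes) {
            List<Long> start = new ArrayList<>(prefix);
            start.add(lo);
            List<Long> stop = new ArrayList<>(prefix);
            stop.add(hi);
            scans.add(new ScanBounds(start, stop));
        }
        return scans;
    }
}
```

For the example above this yields 2*3=6 scans instead of 2*3*3=18 point lookups. Each range scan can also return rows whose last key column is not in the IN list (e.g. 150), so those rows would still need to be filtered out.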


> PointLookup may cost too much memory
> ------------------------------------
>
>                 Key: PHOENIX-5987
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5987
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Chen Feng
>            Priority: Major
>
> When all row key columns are covered by the WHERE conditions, Phoenix uses point lookup to turn one huge range scan into many single-key scans.
> However, when the number of single-key scans is too large, it quickly exhausts memory.
> We hit this condition in our production environment as follows:
> We have a table with five primary key columns, k1, k2, ..., k5, all of type UNSIGNED_LONG.
> The query looks like " ... where k1 = 1 and k2 = 1 and k3 in (1,2,3,...,l) and k4 in (1,2,3,...,m) and k5 in (1,2,3,...,n)"
> We have l=600, m=800 and n=1000, so the possible number of lookup scans is 1*1*600*800*1000=480,000,000.
> Each scan row key costs 5*8=40 bytes, so the row keys alone cost 480,000,000 * 40 bytes = 19.2GB.
> That exceeds the configured JVM heap and causes an OOM exception.
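The arithmetic in the description can be reproduced with a few lines. This is a rough sketch with hypothetical names (PointLookupMemoryEstimate, scanCount, rowKeyBytes). It counts only the raw row key bytes, so it is a lower bound: any per-scan object overhead comes on top of it.

```java
final class PointLookupMemoryEstimate {
    // Number of single-key scans: the product of the IN-list sizes per key column.
    static long scanCount(long... inListSizes) {
        long count = 1;
        for (long size : inListSizes) {
            count = Math.multiplyExact(count, size);
        }
        return count;
    }

    // Raw row key bytes for all scans, ignoring any per-object overhead.
    static long rowKeyBytes(long scans, int keyColumns, int bytesPerColumn) {
        return Math.multiplyExact(scans, (long) keyColumns * bytesPerColumn);
    }

    public static void main(String[] args) {
        long scans = scanCount(1, 1, 600, 800, 1000);
        // Five UNSIGNED_LONG key columns of 8 bytes each.
        long bytes = rowKeyBytes(scans, 5, 8);
        System.out.println(scans + " scans, " + bytes + " bytes of row keys");
    }
}
```

With the values from the description this gives 480,000,000 scans and 19,200,000,000 bytes (about 19.2 GB) at 40 bytes per row key. At 50 bytes per key the same product is 24,000,000,000 bytes.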


