Posted to user@phoenix.apache.org by David Wang <da...@gmail.com> on 2014/03/01 05:56:36 UTC

Cannot run query containing inner join on Phoenix 3.0.0 when data size increased from 10 MB to 1 GB

Hi,


I successfully ran a query containing an inner join in Phoenix 3.0 on a 10
MB data set.

But when I increased the data size from 10 MB to 1 GB and tried to run the
same query, I got an error.  I have summarized each step of what I did and
the error I encountered.  I would appreciate any help or advice.


1). Below is the original TPC-H Query 5, before I translated it to
Phoenix-style syntax:

select
   n_name,
   sum(l_extendedprice * (1 - l_discount)) as revenue
from
   customer,
   orders,
   lineitem,
   supplier,
   nation,
   region
where
   c_custkey = o_custkey
   and l_orderkey = o_orderkey
   and l_suppkey = s_suppkey
   and c_nationkey = s_nationkey
   and s_nationkey = n_nationkey
   and n_regionkey = r_regionkey
   and r_name = '[REGION]'
   and o_orderdate >= date '[DATE]'
   and o_orderdate < date '[DATE]' + interval '1' year
group by
   n_name
order by
   revenue desc;


2). The size of each table in my query is as follows:

lineitem - 725 MB
orders - 164 MB
customer - 24 MB
supplier - 1.4 MB
nation - 2.2 KB
region - 400 B
The heap size of my region servers is 4 GB.
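
As a rough cross-check of the sizes above, the short Python sketch below
simply adds them up. It assumes decimal units (1 MB = 1000 KB = 1,000,000 B)
and uses only the figures listed in this message; the variable names are
illustrative. On-disk table size is not necessarily the memory a join needs,
so this is arithmetic only.

```python
# Table sizes as listed in this message, converted to MB (decimal units assumed).
table_sizes_mb = {
    "lineitem": 725.0,
    "orders": 164.0,
    "customer": 24.0,
    "supplier": 1.4,
    "nation": 2.2 / 1000,        # 2.2 KB
    "region": 400 / 1_000_000,   # 400 B
}

# Total size of all six tables.
total_mb = sum(table_sizes_mb.values())

# Combined size of every table except the largest one (lineitem).
largest = max(table_sizes_mb, key=table_sizes_mb.get)
rest_mb = total_mb - table_sizes_mb[largest]

print(largest)            # lineitem
print(round(total_mb, 4)) # 914.4026
print(round(rest_mb, 4))  # 189.4026
```

The total comes to about 914 MB, which is consistent with the roughly 1 GB
data set described here.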

3). I modified this statement to the following, according to Maryann's
suggestion (which was to place the largest table first):

select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem
   inner join orders on l_orderkey = o_orderkey
   inner join supplier on l_suppkey = s_suppkey
   inner join customer on c_nationkey = s_nationkey and c_custkey = o_custkey
   inner join nation on s_nationkey = n_nationkey
   inner join region on n_regionkey = r_regionkey
where r_name = 'AMERICA'
   and o_orderdate >= '1993-01-01'
   and o_orderdate < '1994-01-01'
group by n_name
order by revenue desc
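
One way to double-check the translation is to compare the join predicates of
the two queries. The Python sketch below is only an illustration: the helper
`predicates` is a hypothetical name, and the predicate lists are copied by
hand from the two statements above. It checks that the six equality
conditions in the original WHERE clause all reappear in the ON clauses of the
rewritten query.

```python
def predicates(text):
    """Turn lines of the form 'a = b' into a set of unordered column pairs."""
    pairs = set()
    for line in text.strip().splitlines():
        left, right = (side.strip() for side in line.split("="))
        pairs.add(frozenset((left, right)))
    return pairs

# Equality conditions from the WHERE clause of the original TPC-H Query 5.
original = """
c_custkey = o_custkey
l_orderkey = o_orderkey
l_suppkey = s_suppkey
c_nationkey = s_nationkey
s_nationkey = n_nationkey
n_regionkey = r_regionkey
"""

# Equality conditions from the ON clauses of the rewritten query.
rewritten = """
l_orderkey = o_orderkey
l_suppkey = s_suppkey
c_nationkey = s_nationkey
c_custkey = o_custkey
s_nationkey = n_nationkey
n_regionkey = r_regionkey
"""

missing = predicates(original) - predicates(rewritten)
extra = predicates(rewritten) - predicates(original)

print(len(missing), len(extra))  # 0 0
```

Both differences are empty, so the rewrite changes the join order but not the
join conditions.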

4). When I execute the query for the very first time, I get the following
error:

java.lang.RuntimeException: com.salesforce.phoenix.exception.PhoenixIOException: com.salesforce.phoenix.exception.PhoenixIOException: Failed after attempts=14, exceptions:
Mon Feb 24 19:36:50 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:36:51 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:36:52 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:36:54 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:36:56 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:37:00 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:37:04 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:37:12 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:37:28 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId:  @2[]
Mon Feb 24 19:38:00 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId: i? 0
Mon Feb 24 19:39:05 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId: i? 0
Mon Feb 24 19:40:09 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId: i? 0
Mon Feb 24 19:41:13 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId: i? 0
Mon Feb 24 19:42:18 EST 2014, org.apache.hadoop.hbase.client.ScannerCallable@53f5fcb6, java.io.IOException: java.io.IOException: Could not find hash cache for joinId: i? 0

        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
        at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074)
        at sqlline.SqlLine.print(SqlLine.java:1735)
        at sqlline.SqlLine$Commands.execute(SqlLine.java:3683)
        at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)
        at sqlline.SqlLine.dispatch(SqlLine.java:821)
        at sqlline.SqlLine.begin(SqlLine.java:699)
        at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
        at sqlline.SqlLine.main(SqlLine.java:424)
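
The timestamps in the error show how the 14 attempts were spaced. The Python
sketch below copies the times by hand from the trace above (all on the same
day) and computes the gap between consecutive attempts. The steadily growing
gaps look like a client-side retry backoff, but that is an inference from the
timestamps and not something the error message itself states.

```python
from datetime import datetime

# Times of the 14 attempts, copied from the error above (Feb 24, 2014, EST).
attempt_times = [
    "19:36:50", "19:36:51", "19:36:52", "19:36:54", "19:36:56",
    "19:37:00", "19:37:04", "19:37:12", "19:37:28", "19:38:00",
    "19:39:05", "19:40:09", "19:41:13", "19:42:18",
]

parsed = [datetime.strptime(t, "%H:%M:%S") for t in attempt_times]

# Seconds between each attempt and the next one.
gaps = [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

# Seconds from the first attempt to the last one.
total = int((parsed[-1] - parsed[0]).total_seconds())

print(len(attempt_times))  # 14
print(gaps)                # [1, 1, 2, 2, 4, 4, 8, 16, 32, 65, 64, 64, 65]
print(total)               # 328
```

So the query kept retrying for about five and a half minutes before the
exception was reported. The reported joinId also changes from the 19:38:00
attempt onward.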

5). When I re-executed the query a second time, I got a result that
matches the expected solution.

My cluster setup is one master and three slaves. Each machine has 8 cores
and 8 GB of RAM. The 1 GB of data was distributed across the three slaves,
and the query ran on all three machines (I monitored this with the top
command on each machine).

Thank you so much,

David

Re: Cannot run query containing inner join on Phoenix 3.0.0 when data size increased from 10 MB to 1 GB

Posted by Maryann Xue <ma...@gmail.com>.
Hi David,

What do you mean by "first time"? Could you please share your server logs?

Could you also please try the latest master branch, and check whether
there are any warnings in your CLIENT logs?


Thanks,
Maryann


