Posted to user@phoenix.apache.org by Sasikumar Natarajan <sa...@gmail.com> on 2016/10/03 10:47:57 UTC

Re: Phoenix ResultSet.next() takes a long time for first row

Thanks Ankit for the response.

We tried adding salt buckets but didn't see much improvement, so we are now
trying to pre-split the regions. We would pre-split on col1, and we have all
100,000 of its values in hand. I have the following questions:
1) How do we write a CREATE TABLE script with 100,000 split points? Is there
an easy way to do this?
2) What if we have to add more split points later, as new col1 values are
added?
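For question 1, one option is to generate the `SPLIT ON (...)` clause, which Phoenix's CREATE TABLE accepts after the table options, instead of typing it by hand. This is a minimal sketch that assumes an unsalted table whose row key starts with col1 (in a salted table the salt byte comes before col1, so col1 split points would not line up with the data) and ASCII col1 values. `SplitOnBuilder`, `buildSplitOn` and its `step` parameter are hypothetical names. One region per value would mean about 100,000 regions on four region servers, so `step` keeps only every Nth value.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.StringJoiner;
import java.util.TreeSet;

public class SplitOnBuilder {

    // Builds a Phoenix "SPLIT ON (...)" clause from known col1 values.
    // step = 1 splits before every distinct value except the smallest;
    // step = N keeps only every Nth value, giving roughly size/N regions.
    static String buildSplitOn(Collection<String> col1Values, int step) {
        if (step < 1) {
            throw new IllegalArgumentException("step must be >= 1");
        }
        // De-duplicate and sort; String order matches HBase byte order for ASCII values.
        List<String> sorted = new ArrayList<>(new TreeSet<>(col1Values));
        StringJoiner points = new StringJoiner(",", "SPLIT ON (", ")");
        points.setEmptyValue(""); // no split points -> no clause at all
        // Start at index `step`: rows for values before it stay in the first region.
        for (int i = step; i < sorted.size(); i += step) {
            // Double any single quotes so the value is a valid SQL string literal.
            points.add("'" + sorted.get(i).replace("'", "''") + "'");
        }
        return points.toString();
    }

    public static void main(String[] args) {
        List<String> values = List.of("MK00300", "MK00100", "MK00200");
        System.out.println(buildSplitOn(values, 1)); // SPLIT ON ('MK00200','MK00300')
    }
}
```

With 100,000 values, `buildSplitOn(values, 1000)` would produce 99 split points, i.e. 100 regions, and the result can be appended after `COMPRESSION='SNAPPY'` in the DDL. For question 2, regions can also be split after the table exists, for example from the HBase shell with `split 'SPL_FINAL', 'MK05000'` (the split key here is a made-up value); HBase also splits regions on its own once they grow past the configured maximum size.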

Thanks,
Sasikumar Natarajan.

On Wed, Sep 28, 2016 at 3:51 PM, Ankit Singhal <an...@gmail.com>
wrote:

> Sorry Sasi, I missed your last mails.
>
> It seems that your table has only one region, or the query touches only one
> region, because of the monotonically increasing key ['MK00100','YOU',4].
> The varying performance is probably because your filters are aggressive and
> skip lots of rows in between (*0* (7965 ms), *2041* (7155 ms),
> *4126* (1630 ms)), and that's why the server is taking time.
>
> Can you try salting the table?
> https://phoenix.apache.org/salted.html
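Why salting helps with a monotonically increasing key can be shown with a toy model. This is a sketch, not Phoenix's implementation: Phoenix prefixes each row key with a single salt byte derived from a hash of the key, whereas the hypothetical `saltBucket` below simply uses `String.hashCode()` modulo the bucket count, and row keys are modeled as plain `|`-separated strings rather than Phoenix's byte encoding. The keys all share the prefix scanned by the query in this thread and differ only in the trailing key columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of row-key salting. The bucket function is a stand-in
// (String.hashCode modulo the bucket count), NOT Phoenix's real salt hash.
public class SaltingModel {

    // Hypothetical salt: the bucket a full row key would be prefixed with.
    static int saltBucket(String rowKey, int buckets) {
        return Math.floorMod(rowKey.hashCode(), buckets);
    }

    // Counts how many row keys land in each bucket.
    static Map<Integer, Integer> bucketCounts(List<String> rowKeys, int buckets) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String key : rowKeys) {
            counts.merge(saltBucket(key, buckets), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Rows sharing the (col1, col2, col3) prefix, differing in later key columns.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("MK00100|YOU|4|" + i);
        }
        // Unsalted, these keys are contiguous and live in one region;
        // with a salt prefix they are spread over all buckets.
        System.out.println(bucketCounts(keys, 4));
    }
}
```

In a real table the equivalent is declaring `SALT_BUCKETS=N` in CREATE TABLE; Phoenix should then scan the buckets in parallel for a prefix query like this one, which is the effect the model hints at.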
>
>
>
>
> On Wed, Sep 28, 2016 at 10:47 AM, Sasikumar Natarajan <sa...@gmail.com>
> wrote:
>
>> Does anyone have suggestions for the performance issue discussed in this
>> thread? Your suggestions would help me resolve it.
>>
>> Infrastructure details:
>>
>> Azure HDInsight HBase
>>
>> Type        Node Size   Cores   Nodes
>> Head        D3 V2       8       2
>> Region      D3 V2       16      4
>> ZooKeeper   D3 V2       12      3
>>
>> Thanks,
>> Sasikumar Natarajan.
>>
>>
>> On Fri, Sep 23, 2016 at 7:57 AM, Sasikumar Natarajan <sa...@gmail.com>
>> wrote:
>>
>>> Also, it's not only the first call to ResultSet.next() that takes time.
>>>
>>> When we iterate over the ResultSet, it takes a long time initially and then
>>> iterates faster. After a few more iterations it slows down again, and this
>>> pattern repeats.
>>>
>>>
>>>
>>> Sample observation:
>>>
>>>
>>>
>>> Total rows in the ResultSet: 5130
>>>
>>> Statement.executeQuery() took: 702 ms
>>>
>>> ResultSet indices at which next() took a long time: *0* (7965 ms),
>>> *2041* (7155 ms), *4126* (1630 ms)
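To see exactly which next() calls stall, each call can be timed individually. A minimal sketch, assuming the JDBC ResultSet is passed in as the method reference `rs::next`; `NextTimer`, `RowCursor`, `SlowCall` and the threshold are hypothetical names for this example, not Phoenix APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class NextTimer {

    // Anything with a JDBC-style next(); a java.sql.ResultSet fits as rs::next.
    @FunctionalInterface
    interface RowCursor {
        boolean next() throws Exception;
    }

    // A row index whose next() call was slow, with its duration.
    static final class SlowCall {
        final int rowIndex;
        final long millis;

        SlowCall(int rowIndex, long millis) {
            this.rowIndex = rowIndex;
            this.millis = millis;
        }

        @Override
        public String toString() {
            return rowIndex + " (" + millis + " ms)";
        }
    }

    // Calls next() until it returns false, timing each call, and returns
    // the rows whose next() took at least thresholdMillis.
    static List<SlowCall> findSlowNextCalls(RowCursor cursor, long thresholdMillis) throws Exception {
        List<SlowCall> slow = new ArrayList<>();
        int rowIndex = 0;
        while (true) {
            long start = System.nanoTime();
            boolean hasRow = cursor.next();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (!hasRow) {
                break; // the final call that returns false is not counted as a row
            }
            if (elapsedMillis >= thresholdMillis) {
                slow.add(new SlowCall(rowIndex, elapsedMillis));
            }
            rowIndex++;
        }
        return slow;
    }
}
```

For example, `NextTimer.findSlowNextCalls(rs::next, 1000)` would list every row whose next() call took a second or more, which shows whether the pauses recur at the same indices across runs. The timing is wall-clock time on the client, so it includes both server work and network time.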
>>>
>>> On Fri, Sep 23, 2016 at 7:52 AM, Sasikumar Natarajan <sa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ankit,
>>>> Where does the server processing happen: on the HBase cluster, or on the
>>>> server where the Phoenix core library runs?
>>>>
>>>> Please find below the details you asked for:
>>>>
>>>> Query:
>>>>
>>>> SELECT col1, col2, col5, col7, col11, col12 FROM SPL_FINAL
>>>> WHERE col1='MK00100' AND col2='YOU' AND col3=4 AND col5 IN (?,?,?,?,?)
>>>> AND ((col7 BETWEEN to_date('2016-08-01 00:00:00.000') AND to_date('2016-08-05 23:59:59.000'))
>>>>   OR (col8 BETWEEN to_date('2016-08-01 00:00:00.000') AND to_date('2016-08-05 23:59:59.000')))
>>>>
>>>>
>>>> Explain plan:
>>>>
>>>> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER SPL_FINAL ['MK00100','YOU',4]
>>>>     SERVER FILTER BY (COL5 IN ('100','101','105','234','653') AND
>>>>         ((COL7 >= TIMESTAMP '2016-08-01 00:00:00.000' AND COL7 <= TIMESTAMP '2016-08-05 23:59:59.000')
>>>>         OR (COL8 >= TIMESTAMP '2016-08-01 00:00:00.000' AND COL8 <= TIMESTAMP '2016-08-05 23:59:59.000')))
>>>>
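The plan shows a RANGE SCAN over the leading key columns ['MK00100','YOU',4], with everything else applied as a SERVER FILTER. A toy model, with a sorted map standing in for a region's sorted row keys, illustrates the consequence: every row under that key prefix is read, even when the filter rejects most of them. This is an illustration only; it ignores Phoenix's byte-level key encoding and its skip-scan optimizations, and `RangeScanModel`, `rangeScan` and `ScanStats` are hypothetical names.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.Predicate;

public class RangeScanModel {

    // Rows read from the key range vs. rows that passed the server-side filter.
    static final class ScanStats {
        final int scanned;
        final int returned;

        ScanStats(int scanned, int returned) {
            this.scanned = scanned;
            this.returned = returned;
        }
    }

    // Simulates "RANGE SCAN OVER <prefix>" followed by "SERVER FILTER BY <filter>".
    // Keys are modeled as "col1|col2|col3|..." strings; the map's sort order stands
    // in for the sorted row keys of a region. Values stand in for filtered columns.
    static ScanStats rangeScan(NavigableMap<String, String> table, String keyPrefix,
                               Predicate<String> serverFilter) {
        int scanned = 0;
        int returned = 0;
        // Keys sharing a prefix are contiguous and sort at or after the prefix itself.
        for (Map.Entry<String, String> row : table.tailMap(keyPrefix, true).entrySet()) {
            if (!row.getKey().startsWith(keyPrefix)) {
                break; // past the end of the key range
            }
            scanned++; // every row in the range is read...
            if (serverFilter.test(row.getValue())) {
                returned++; // ...but only matching rows are sent back to the client
            }
        }
        return new ScanStats(scanned, returned);
    }
}
```

When `scanned` is much larger than `returned`, the server spends most of its time reading rows it then discards, which is consistent with Ankit's explanation about aggressive filters.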
>>>> DDL:
>>>>
>>>> CREATE TABLE IF NOT EXISTS SPL_FINAL
>>>> (col1 VARCHAR NOT NULL,
>>>> col2 VARCHAR NOT NULL,
>>>> col3 INTEGER NOT NULL,
>>>> col4 INTEGER NOT NULL,
>>>> col5 VARCHAR NOT NULL,
>>>> col6 VARCHAR NOT NULL,
>>>> col7 TIMESTAMP NOT NULL,
>>>> col8 TIMESTAMP NOT NULL,
>>>> ext.col9 VARCHAR,
>>>> ext.col10 VARCHAR,
>>>> pri.col11 VARCHAR[],
>>>> pri.col12 VARCHAR,
>>>> ext.col13 BOOLEAN
>>>> CONSTRAINT SPL_FINAL_PK PRIMARY KEY (col1, col2, col3, col4, col5,
>>>> col6, col7, col8)) COMPRESSION='SNAPPY';
>>>>
>>>> (col11 contains 3600 items in every row.)
>>>>
>>>> Thanks,
>>>> Sasikumar Natarajan.
>>>>
>>>> On Thu, Sep 22, 2016 at 12:36 PM, Ankit Singhal <
>>>> ankitsinghal59@gmail.com> wrote:
>>>>
>>>>> Please share some more details about the query, DDL and explain plan. In
>>>>> Phoenix, there are cases where some server-side processing happens when
>>>>> rs.next() is called for the first time, but subsequent next() calls should
>>>>> be faster.
>>>>>
>>>>> On Thu, Sep 22, 2016 at 9:52 AM, Sasikumar Natarajan <
>>>>> sasincj@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm using the Apache Phoenix core 4.4.0-HBase-1.1 library to query
>>>>>> data stored in Phoenix.
>>>>>>
>>>>>> preparedStatement.executeQuery() seems to return quickly, but entering
>>>>>> the *while (rs.next()) {}* loop takes a long time. I would like to know
>>>>>> what is causing the delay before the ResultSet is ready. Please share
>>>>>> your thoughts on this.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sasikumar Natarajan
>>>>>>