Posted to dev@phoenix.apache.org by "Loknath Priyatham Teja Singamsetty (JIRA)" <ji...@apache.org> on 2017/06/02 08:48:04 UTC

[jira] [Commented] (PHOENIX-3773) Implement FIRST_VALUES aggregate function

    [ https://issues.apache.org/jira/browse/PHOENIX-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034333#comment-16034333 ] 

Loknath Priyatham Teja Singamsetty  commented on PHOENIX-3773:
--------------------------------------------------------------

[~jamestaylor]  

bq. Then, once you have all the values, combine them together using the PArrayDataType.appendItemToArray() method

I tried this last week as well, but for some reason the output was not as expected. Yesterday I made changes to use PArrayDataType.appendItemToArray() and, upon debugging, found two things:

a) For fixed-length data types, appendItemToArray() is actually prepending the arrayBytes, reversing the array construction. For the time being I used the prependItemToArray() method instead, which fixed the issue. The following lines in appendItemToArray() seem to be the cause; they copy the existing arrayBytes to the front of the new array and the element bytes to the end:

{code}
newArray = new byte[length + elementLength];

System.arraycopy(arrayBytes, offset, newArray, 0, length);
System.arraycopy(elementBytes, elementOffset, newArray, length, elementLength);
{code}
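For clarity, the two copy orders under discussion can be sketched in isolation (a standalone illustration of System.arraycopy ordering, not the actual Phoenix code):

```java
import java.util.Arrays;

public class ArrayAppendDemo {
    // "Append" order: existing array bytes are copied to the front of the
    // new buffer, the new element's bytes to the end.
    static byte[] append(byte[] arrayBytes, byte[] elementBytes) {
        byte[] newArray = new byte[arrayBytes.length + elementBytes.length];
        System.arraycopy(arrayBytes, 0, newArray, 0, arrayBytes.length);
        System.arraycopy(elementBytes, 0, newArray, arrayBytes.length, elementBytes.length);
        return newArray;
    }

    // "Prepend" order: the new element's bytes go first, the existing
    // array bytes after them.
    static byte[] prepend(byte[] arrayBytes, byte[] elementBytes) {
        byte[] newArray = new byte[arrayBytes.length + elementBytes.length];
        System.arraycopy(elementBytes, 0, newArray, 0, elementBytes.length);
        System.arraycopy(arrayBytes, 0, newArray, elementBytes.length, arrayBytes.length);
        return newArray;
    }

    public static void main(String[] args) {
        byte[] existing = {1, 2};
        byte[] element = {3};
        System.out.println(Arrays.toString(append(existing, element)));  // [1, 2, 3]
        System.out.println(Arrays.toString(prepend(existing, element))); // [3, 1, 2]
    }
}
```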
b) For variable-length data types, the array construction results in an ArrayIndexOutOfBoundsException. Here is the stack trace:

{code}
java.lang.ArrayIndexOutOfBoundsException: 32767
	at org.apache.phoenix.schema.types.PArrayDataType.prependItemToArray(PArrayDataType.java:545)
	at org.apache.phoenix.expression.aggregator.FirstLastValueBaseClientAggregator.evaluate(FirstLastValueBaseClientAggregator.java:117)
	at org.apache.phoenix.schema.KeyValueSchema.toBytes(KeyValueSchema.java:112)
	at org.apache.phoenix.schema.KeyValueSchema.toBytes(KeyValueSchema.java:93)
	at org.apache.phoenix.expression.aggregator.Aggregators.toBytes(Aggregators.java:112)
	at org.apache.phoenix.iterate.BaseGroupedAggregatingResultIterator.next(BaseGroupedAggregatingResultIterator.java:82)
	at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:778)
	at org.apache.phoenix.end2end.FirstValuesFunctionIT.varcharDatatypeSimpleTest(FirstValuesFunctionIT.java:100)
{code}

I'm debugging this further. 
 

bq.Probably a good idea to have a test that asks for the top 3 values when there are only 2 values to make sure that case works too (if you don't have that already).

That test case is already included.



> Implement FIRST_VALUES aggregate function
> -----------------------------------------
>
>                 Key: PHOENIX-3773
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3773
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: Loknath Priyatham Teja Singamsetty 
>              Labels: SFDC
>             Fix For: 4.11.0
>
>         Attachments: PHOENIX-3773_4.x-HBase-0.98.patch, PHOENIX-3773_master.patch, PHOENIX-3773.patch, PHOENIX-3773.v2.patch, PHOENIX-3773.v3.patch
>
>
> Similar to FIRST_VALUE, but would allow the user to specify how many values to keep. This could use a MinMaxPriorityQueue under the covers and be much more efficient than using multiple NTH_VALUE calls to do the same like this:
> {code}
> SELECT entity_id,
>        NTH_VALUE(user_id,1) WITHIN GROUP (ORDER BY last_read_date DESC) as nth1_user_id,
>        NTH_VALUE(user_id,2) WITHIN GROUP (ORDER BY last_read_date DESC) as nth2_user_id,
>        NTH_VALUE(user_id,3) WITHIN GROUP (ORDER BY last_read_date DESC) as nth3_user_id,
>        count(*)
> FROM  MY_TABLE 
> WHERE tenant_id='00Dx0000000XXXX'
> AND entity_id in ('0D5x000000ABCD','0D5x000000ABCE')
> GROUP BY entity_id;
> {code}
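The bounded-queue approach described in the issue (keep only the top N values per group, ordered by the WITHIN GROUP sort key) can be sketched with a plain bounded heap. This is a standalone illustration using the JDK's java.util.PriorityQueue rather than Guava's MinMaxPriorityQueue, and the method names are hypothetical, not Phoenix API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNValues {
    // Keep only the N largest keys seen so far, analogous to a priority
    // queue bounded to maximumSize(N) inside the aggregator.
    static List<Integer> topN(int[] values, int n) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap
        for (int v : values) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest once the bound is exceeded
            }
        }
        List<Integer> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder()); // largest first, like ORDER BY ... DESC
        return out;
    }

    public static void main(String[] args) {
        // Asking for the top 3 when only 2 values exist returns just those 2.
        System.out.println(topN(new int[]{5, 9}, 3));       // [9, 5]
        System.out.println(topN(new int[]{5, 9, 1, 7}, 3)); // [9, 7, 5]
    }
}
```

This touches each row once and holds at most N elements per group, which is why it should beat issuing multiple NTH_VALUE calls over the same data.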



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)