You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Swapna Swapna <ta...@gmail.com> on 2016/05/03 20:14:43 UTC

Question on aggregate function

Hi,

I'm trying to implement aggregate function on multiple columns (as an
arguments) like:

myaggFunc(col1,col2)

And I would want to return the results by each column after applying
aggregate operation.

The output would be something like:

col1, count ( aggregate of all records for col1)
col2, count

Inorder to return the results in the above format, what is the return data
type (of the method) should I have to choose?

Thanks

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
That helps. Thank You James. I'm in the right direction not trying further
to achieve non-standard multi arg version of an
aggregate function :):)

And as a next step,  I will push my existing (single col) aggregate
function and will keep you posted.

Regards
Swapna

On Tue, May 10, 2016 at 10:57 AM, James Taylor <ja...@apache.org>
wrote:

> Yes, the way to optimize it is to not represent data in column qualifiers,
> but as the value of a column instead (perhaps in the primary key
> constraint) and to do the group by query I mentioned before.
>
> Otherwise, you can do separate aggregations as you've shown as it'd perform
> the same as trying to support a non standard multi arg version of an
> aggregate function.
>
> Thanks,
> James
>
> On Tuesday, May 10, 2016, Swapna Swapna <ta...@gmail.com> wrote:
>
> > Hi James,
> >
> > thanks for your response. In the below example, us & uk are column
> > qualifiers.
> >
> > * rowkey                c:us        c:uk*
> >  20161001             3             4
> >  20161002             1             2
> >
> >
> > This is how my query looks like:
> > select sum1(us) as US, sum1(uk) as UK from table;
> >
> > which returns the below output: (as expected)
> > *US  UK*
> > 4     6
> >
> > is there any better way to achieve/optimize this. This seems to be not an
> > ideal solution when we have large number of columns.
> >
> > Thanks
> > Swapna
> >
> >
> > On Tue, May 10, 2016 at 12:04 AM, James Taylor <jamestaylor@apache.org
> > <javascript:;>>
> > wrote:
> >
> > > We don't have aggregate functions with multiple arguments, so I can't
> > > provide any pointers. It's unclear what semantics you're trying to
> > achieve
> > > with the multiple arguments. Can you give a concrete example? Based on
> > your
> > > other example, you'd want to do a GROUP BY, like this:
> > >
> > > select sum(col) from table group by country;
> > >
> > > On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <talktoswapna@gmail.com
> > <javascript:;>>
> > > wrote:
> > >
> > > > Hi James/Team,
> > > >
> > > >
> > > > myaggFunc(col1,col2)
> > > >
> > > > I tried implementing this new aggregate function with multiple (2, to
> > > start
> > > > with)  columns , expressions as arguments.
> > > >
> > > > And its giving me this error:
> > > >
> > > > index (1) must be less than size (1)
> > > >
> > > >
> > > > My function definition:
> > > >
> > > > @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
> > > > MyAggParseNode.class, args = {
> > > >         @FunctionParseNode.Argument(),
> > > >         @FunctionParseNode.Argument()})
> > > >
> > > > is there any example that I can refer, which accepts multiple fields
> > > > as arguments to function.
> > > >
> > > > Any pointers would really help.
> > > >
> > > > Thank you,
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <
> talktoswapna@gmail.com
> > <javascript:;>>
> > > > wrote:
> > > >
> > > > > Hi James,
> > > > >
> > > > > the new ones are in similar lines to existing aggregate functions:
> > > > >
> > > > > I misinterpreted this definition, thanks for clarifying :
> > > > > *A reference to a column is also an expression*
> > > > >
> > > > > Regards
> > > > > Swapna
> > > > >
> > > > > On Tue, May 3, 2016 at 11:39 PM, James Taylor <
> > jamestaylor@apache.org <javascript:;>>
> > > > > wrote:
> > > > >
> > > > >> Hi Swapna,
> > > > >> All our aggregate functions allow expressions as arguments and it
> > > > wouldn't
> > > > >> make sense to have these new ones be different. A reference to a
> > > column
> > > > is
> > > > >> also an expression. It doesn't change the HBase data model being
> > > sparse.
> > > > >>
> > > > >> I think the next step should be for you to submit a patch that the
> > > > >> community can take a look at, as it's too difficult to discuss
> this
> > > > >> without
> > > > >> that.
> > > > >>
> > > > >> Thanks,
> > > > >> James
> > > > >>
> > > > >> On Tuesday, May 3, 2016, Swapna Swapna <talktoswapna@gmail.com
> > <javascript:;>>
> > > wrote:
> > > > >>
> > > > >> > Hi James,
> > > > >> >
> > > > >> > Thanks for your swift response.
> > > > >> >
> > > > >> > I wouldn't be able to use the expression in the below query
> > rather I
> > > > >> would
> > > > >> > have to provide the columns (as arguments) which I'm interested
> in
> > > to
> > > > >> > perform the aggregation on respective provided columns.
> > > > >> >
> > > > >> > myaggFunc(col1,col2)
> > > > >> >
> > > > >> > the reason being, the hbase data is sparsed and I would not know
> > the
> > > > >> column
> > > > >> > values. Data fetch is based on a row key.
> > > > >> >
> > > > >> > expression example:
> > > > >> >
> > > > >> > ID=1 OR NAME='Hi'
> > > > >> >
> > > > >> > Regards
> > > > >> >
> > > > >> > Swapna
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <
> > > jamestaylor@apache.org <javascript:;>
> > > > >> > <javascript:;>> wrote:
> > > > >> >
> > > > >> > > Hi Swapna,
> > > > >> > > The return type is typically derived from looking at the
> return
> > > > types
> > > > >> of
> > > > >> > > each of the input arguments and choosing what'll work without
> > > losing
> > > > >> > > precision. For example, take a look at this loop in
> > > > ExpressionCompiler
> > > > >> > that
> > > > >> > > determines this for expressions that are added together:
> > > > >> > >
> > > > >> > >         new ArithmeticExpressionFactory() {
> > > > >> > >             @Override
> > > > >> > >             public Expression create(ArithmeticParseNode node,
> > > > >> > > List<Expression> children) throws SQLException {
> > > > >> > >                 boolean foundDate = false;
> > > > >> > >                 Determinism determinism = Determinism.ALWAYS;
> > > > >> > >                 PDataType theType = null;
> > > > >> > >                 for(int i = 0; i < children.size(); i++) {
> > > > >> > >
> > > > >> > > Your probably already doing this, but make sure you don't
> assume
> > > the
> > > > >> > > arguments are column references, but allow them to be any
> > > > expression.
> > > > >> > >
> > > > >> > > Also, it'd be great to see what you've got so far without
> > handling
> > > > >> > multiple
> > > > >> > > arguments to your function (in the form of a pull request) so
> > > folks
> > > > >> can
> > > > >> > get
> > > > >> > > you feedback on your work so far.
> > > > >> > >
> > > > >> > > Thanks, and we appreciate the contributions!
> > > > >> > >
> > > > >> > > James
> > > > >> > >
> > > > >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
> > > > >> talktoswapna@gmail.com <javascript:;>
> > > > >> > <javascript:;>>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Sure,
> > > > >> > > >
> > > > >> > > > Hbase data that I have is:
> > > > >> > > >
> > > > >> > > > rowkey                us         uk
> > > > >> > > > 20161001           3            4
> > > > >> > > > 20161002           1            2
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > select myaggFunc(us) from table :    // this is returning
> > output
> > > > as
> > > > >> :
> > > > >> > > > 4
> > > > >> > > > select myaggFunc(uk) from table :    // this is returning
> > output
> > > > as
> > > > >> :
> > > > >> > > > 6
> > > > >> > > >
> > > > >> > > > In similar to that, i'm visualizing the query like: select
> > > > >> > > > myaggFunc1(us,uk)
> > > > >> > > > from table;  //with multiple columns
> > > > >> > > >
> > > > >> > > > to return output:   (based on the aggregation logic, below
> > > results
> > > > >> are
> > > > >> > > for
> > > > >> > > > sum aggregation)
> > > > >> > > > us   4
> > > > >> > > > uk   6
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> > > > >> jamestaylor@apache.org <javascript:;>
> > > > >> > <javascript:;>>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Removing user list (please don't cross post)
> > > > >> > > > >
> > > > >> > > > > Can you give us a full example of the query you have in
> > mind?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > James
> > > > >> > > > >
> > > > >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > > > >> > talktoswapna@gmail.com <javascript:;> <javascript:;>
> > > > >> > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi,
> > > > >> > > > > >
> > > > >> > > > > > I'm trying to implement aggregate function on multiple
> > > columns
> > > > >> (as
> > > > >> > an
> > > > >> > > > > > arguments) like:
> > > > >> > > > > >
> > > > >> > > > > > myaggFunc(col1,col2)
> > > > >> > > > > >
> > > > >> > > > > > And I would want to return the results by each column
> > after
> > > > >> > applying
> > > > >> > > > > > aggregate operation.
> > > > >> > > > > >
> > > > >> > > > > > The output would be something like:
> > > > >> > > > > >
> > > > >> > > > > > col1, count ( aggregate of all records for col1)
> > > > >> > > > > > col2, count
> > > > >> > > > > >
> > > > >> > > > > > Inorder to return the results in the above format, what
> is
> > > the
> > > > >> > return
> > > > >> > > > > data
> > > > >> > > > > > type (of the method) should I have to choose?
> > > > >> > > > > >
> > > > >> > > > > > Thanks
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by James Taylor <ja...@apache.org>.
Yes, the way to optimize it is to not represent data in column qualifiers,
but as the value of a column instead (perhaps in the primary key
constraint) and to do the group by query I mentioned before.

Otherwise, you can do separate aggregations as you've shown as it'd perform
the same as trying to support a non standard multi arg version of an
aggregate function.

Thanks,
James

On Tuesday, May 10, 2016, Swapna Swapna <ta...@gmail.com> wrote:

> Hi James,
>
> thanks for your response. In the below example, us & uk are column
> qualifiers.
>
> * rowkey                c:us        c:uk*
>  20161001             3             4
>  20161002             1             2
>
>
> This is how my query looks like:
> select sum1(us) as US, sum1(uk) as UK from table;
>
> which returns the below output: (as expected)
> *US  UK*
> 4     6
>
> is there any better way to achieve/optimize this. This seems to be not an
> ideal solution when we have large number of columns.
>
> Thanks
> Swapna
>
>
> On Tue, May 10, 2016 at 12:04 AM, James Taylor <jamestaylor@apache.org
> <javascript:;>>
> wrote:
>
> > We don't have aggregate functions with multiple arguments, so I can't
> > provide any pointers. It's unclear what semantics you're trying to
> achieve
> > with the multiple arguments. Can you give a concrete example? Based on
> your
> > other example, you'd want to do a GROUP BY, like this:
> >
> > select sum(col) from table group by country;
> >
> > On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <talktoswapna@gmail.com
> <javascript:;>>
> > wrote:
> >
> > > Hi James/Team,
> > >
> > >
> > > myaggFunc(col1,col2)
> > >
> > > I tried implementing this new aggregate function with multiple (2, to
> > start
> > > with)  columns , expressions as arguments.
> > >
> > > And its giving me this error:
> > >
> > > index (1) must be less than size (1)
> > >
> > >
> > > My function definition:
> > >
> > > @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
> > > MyAggParseNode.class, args = {
> > >         @FunctionParseNode.Argument(),
> > >         @FunctionParseNode.Argument()})
> > >
> > > is there any example that I can refer, which accepts multiple fields
> > > as arguments to function.
> > >
> > > Any pointers would really help.
> > >
> > > Thank you,
> > >
> > >
> > >
> > >
> > >
> > > On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <talktoswapna@gmail.com
> <javascript:;>>
> > > wrote:
> > >
> > > > Hi James,
> > > >
> > > > the new ones are in similar lines to existing aggregate functions:
> > > >
> > > > I misinterpreted this definition, thanks for clarifying :
> > > > *A reference to a column is also an expression*
> > > >
> > > > Regards
> > > > Swapna
> > > >
> > > > On Tue, May 3, 2016 at 11:39 PM, James Taylor <
> jamestaylor@apache.org <javascript:;>>
> > > > wrote:
> > > >
> > > >> Hi Swapna,
> > > >> All our aggregate functions allow expressions as arguments and it
> > > wouldn't
> > > >> make sense to have these new ones be different. A reference to a
> > column
> > > is
> > > >> also an expression. It doesn't change the HBase data model being
> > sparse.
> > > >>
> > > >> I think the next step should be for you to submit a patch that the
> > > >> community can take a look at, as it's too difficult to discuss this
> > > >> without
> > > >> that.
> > > >>
> > > >> Thanks,
> > > >> James
> > > >>
> > > >> On Tuesday, May 3, 2016, Swapna Swapna <talktoswapna@gmail.com
> <javascript:;>>
> > wrote:
> > > >>
> > > >> > Hi James,
> > > >> >
> > > >> > Thanks for your swift response.
> > > >> >
> > > >> > I wouldn't be able to use the expression in the below query
> rather I
> > > >> would
> > > >> > have to provide the columns (as arguments) which I'm interested in
> > to
> > > >> > perform the aggregation on respective provided columns.
> > > >> >
> > > >> > myaggFunc(col1,col2)
> > > >> >
> > > >> > the reason being, the hbase data is sparsed and I would not know
> the
> > > >> column
> > > >> > values. Data fetch is based on a row key.
> > > >> >
> > > >> > expression example:
> > > >> >
> > > >> > ID=1 OR NAME='Hi'
> > > >> >
> > > >> > Regards
> > > >> >
> > > >> > Swapna
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <
> > jamestaylor@apache.org <javascript:;>
> > > >> > <javascript:;>> wrote:
> > > >> >
> > > >> > > Hi Swapna,
> > > >> > > The return type is typically derived from looking at the return
> > > types
> > > >> of
> > > >> > > each of the input arguments and choosing what'll work without
> > losing
> > > >> > > precision. For example, take a look at this loop in
> > > ExpressionCompiler
> > > >> > that
> > > >> > > determines this for expressions that are added together:
> > > >> > >
> > > >> > >         new ArithmeticExpressionFactory() {
> > > >> > >             @Override
> > > >> > >             public Expression create(ArithmeticParseNode node,
> > > >> > > List<Expression> children) throws SQLException {
> > > >> > >                 boolean foundDate = false;
> > > >> > >                 Determinism determinism = Determinism.ALWAYS;
> > > >> > >                 PDataType theType = null;
> > > >> > >                 for(int i = 0; i < children.size(); i++) {
> > > >> > >
> > > >> > > Your probably already doing this, but make sure you don't assume
> > the
> > > >> > > arguments are column references, but allow them to be any
> > > expression.
> > > >> > >
> > > >> > > Also, it'd be great to see what you've got so far without
> handling
> > > >> > multiple
> > > >> > > arguments to your function (in the form of a pull request) so
> > folks
> > > >> can
> > > >> > get
> > > >> > > you feedback on your work so far.
> > > >> > >
> > > >> > > Thanks, and we appreciate the contributions!
> > > >> > >
> > > >> > > James
> > > >> > >
> > > >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
> > > >> talktoswapna@gmail.com <javascript:;>
> > > >> > <javascript:;>>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Sure,
> > > >> > > >
> > > >> > > > Hbase data that I have is:
> > > >> > > >
> > > >> > > > rowkey                us         uk
> > > >> > > > 20161001           3            4
> > > >> > > > 20161002           1            2
> > > >> > > >
> > > >> > > >
> > > >> > > > select myaggFunc(us) from table :    // this is returning
> output
> > > as
> > > >> :
> > > >> > > > 4
> > > >> > > > select myaggFunc(uk) from table :    // this is returning
> output
> > > as
> > > >> :
> > > >> > > > 6
> > > >> > > >
> > > >> > > > In similar to that, i'm visualizing the query like: select
> > > >> > > > myaggFunc1(us,uk)
> > > >> > > > from table;  //with multiple columns
> > > >> > > >
> > > >> > > > to return output:   (based on the aggregation logic, below
> > results
> > > >> are
> > > >> > > for
> > > >> > > > sum aggregation)
> > > >> > > > us   4
> > > >> > > > uk   6
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> > > >> jamestaylor@apache.org <javascript:;>
> > > >> > <javascript:;>>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Removing user list (please don't cross post)
> > > >> > > > >
> > > >> > > > > Can you give us a full example of the query you have in
> mind?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > > James
> > > >> > > > >
> > > >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > > >> > talktoswapna@gmail.com <javascript:;> <javascript:;>
> > > >> > > >
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > > > Hi,
> > > >> > > > > >
> > > >> > > > > > I'm trying to implement aggregate function on multiple
> > columns
> > > >> (as
> > > >> > an
> > > >> > > > > > arguments) like:
> > > >> > > > > >
> > > >> > > > > > myaggFunc(col1,col2)
> > > >> > > > > >
> > > >> > > > > > And I would want to return the results by each column
> after
> > > >> > applying
> > > >> > > > > > aggregate operation.
> > > >> > > > > >
> > > >> > > > > > The output would be something like:
> > > >> > > > > >
> > > >> > > > > > col1, count ( aggregate of all records for col1)
> > > >> > > > > > col2, count
> > > >> > > > > >
> > > >> > > > > > Inorder to return the results in the above format, what is
> > the
> > > >> > return
> > > >> > > > > data
> > > >> > > > > > type (of the method) should I have to choose?
> > > >> > > > > >
> > > >> > > > > > Thanks
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Hi James,

thanks for your response. In the below example, us & uk are column
qualifiers.

* rowkey                c:us        c:uk*
 20161001             3             4
 20161002             1             2


This is how my query looks like:
select sum1(us) as US, sum1(uk) as UK from table;

which returns the below output: (as expected)
*US  UK*
4     6

is there any better way to achieve/optimize this. This seems to be not an
ideal solution when we have large number of columns.

Thanks
Swapna


On Tue, May 10, 2016 at 12:04 AM, James Taylor <ja...@apache.org>
wrote:

> We don't have aggregate functions with multiple arguments, so I can't
> provide any pointers. It's unclear what semantics you're trying to achieve
> with the multiple arguments. Can you give a concrete example? Based on your
> other example, you'd want to do a GROUP BY, like this:
>
> select sum(col) from table group by country;
>
> On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <ta...@gmail.com>
> wrote:
>
> > Hi James/Team,
> >
> >
> > myaggFunc(col1,col2)
> >
> > I tried implementing this new aggregate function with multiple (2, to
> start
> > with)  columns , expressions as arguments.
> >
> > And its giving me this error:
> >
> > index (1) must be less than size (1)
> >
> >
> > My function definition:
> >
> > @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
> > MyAggParseNode.class, args = {
> >         @FunctionParseNode.Argument(),
> >         @FunctionParseNode.Argument()})
> >
> > is there any example that I can refer, which accepts multiple fields
> > as arguments to function.
> >
> > Any pointers would really help.
> >
> > Thank you,
> >
> >
> >
> >
> >
> > On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <ta...@gmail.com>
> > wrote:
> >
> > > Hi James,
> > >
> > > the new ones are in similar lines to existing aggregate functions:
> > >
> > > I misinterpreted this definition, thanks for clarifying :
> > > *A reference to a column is also an expression*
> > >
> > > Regards
> > > Swapna
> > >
> > > On Tue, May 3, 2016 at 11:39 PM, James Taylor <ja...@apache.org>
> > > wrote:
> > >
> > >> Hi Swapna,
> > >> All our aggregate functions allow expressions as arguments and it
> > wouldn't
> > >> make sense to have these new ones be different. A reference to a
> column
> > is
> > >> also an expression. It doesn't change the HBase data model being
> sparse.
> > >>
> > >> I think the next step should be for you to submit a patch that the
> > >> community can take a look at, as it's too difficult to discuss this
> > >> without
> > >> that.
> > >>
> > >> Thanks,
> > >> James
> > >>
> > >> On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com>
> wrote:
> > >>
> > >> > Hi James,
> > >> >
> > >> > Thanks for your swift response.
> > >> >
> > >> > I wouldn't be able to use the expression in the below query rather I
> > >> would
> > >> > have to provide the columns (as arguments) which I'm interested in
> to
> > >> > perform the aggregation on respective provided columns.
> > >> >
> > >> > myaggFunc(col1,col2)
> > >> >
> > >> > the reason being, the hbase data is sparsed and I would not know the
> > >> column
> > >> > values. Data fetch is based on a row key.
> > >> >
> > >> > expression example:
> > >> >
> > >> > ID=1 OR NAME='Hi'
> > >> >
> > >> > Regards
> > >> >
> > >> > Swapna
> > >> >
> > >> >
> > >> >
> > >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <
> jamestaylor@apache.org
> > >> > <javascript:;>> wrote:
> > >> >
> > >> > > Hi Swapna,
> > >> > > The return type is typically derived from looking at the return
> > types
> > >> of
> > >> > > each of the input arguments and choosing what'll work without
> losing
> > >> > > precision. For example, take a look at this loop in
> > ExpressionCompiler
> > >> > that
> > >> > > determines this for expressions that are added together:
> > >> > >
> > >> > >         new ArithmeticExpressionFactory() {
> > >> > >             @Override
> > >> > >             public Expression create(ArithmeticParseNode node,
> > >> > > List<Expression> children) throws SQLException {
> > >> > >                 boolean foundDate = false;
> > >> > >                 Determinism determinism = Determinism.ALWAYS;
> > >> > >                 PDataType theType = null;
> > >> > >                 for(int i = 0; i < children.size(); i++) {
> > >> > >
> > >> > > Your probably already doing this, but make sure you don't assume
> the
> > >> > > arguments are column references, but allow them to be any
> > expression.
> > >> > >
> > >> > > Also, it'd be great to see what you've got so far without handling
> > >> > multiple
> > >> > > arguments to your function (in the form of a pull request) so
> folks
> > >> can
> > >> > get
> > >> > > you feedback on your work so far.
> > >> > >
> > >> > > Thanks, and we appreciate the contributions!
> > >> > >
> > >> > > James
> > >> > >
> > >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
> > >> talktoswapna@gmail.com
> > >> > <javascript:;>>
> > >> > > wrote:
> > >> > >
> > >> > > > Sure,
> > >> > > >
> > >> > > > Hbase data that I have is:
> > >> > > >
> > >> > > > rowkey                us         uk
> > >> > > > 20161001           3            4
> > >> > > > 20161002           1            2
> > >> > > >
> > >> > > >
> > >> > > > select myaggFunc(us) from table :    // this is returning output
> > as
> > >> :
> > >> > > > 4
> > >> > > > select myaggFunc(uk) from table :    // this is returning output
> > as
> > >> :
> > >> > > > 6
> > >> > > >
> > >> > > > In similar to that, i'm visualizing the query like: select
> > >> > > > myaggFunc1(us,uk)
> > >> > > > from table;  //with multiple columns
> > >> > > >
> > >> > > > to return output:   (based on the aggregation logic, below
> results
> > >> are
> > >> > > for
> > >> > > > sum aggregation)
> > >> > > > us   4
> > >> > > > uk   6
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> > >> jamestaylor@apache.org
> > >> > <javascript:;>>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Removing user list (please don't cross post)
> > >> > > > >
> > >> > > > > Can you give us a full example of the query you have in mind?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > James
> > >> > > > >
> > >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > >> > talktoswapna@gmail.com <javascript:;>
> > >> > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi,
> > >> > > > > >
> > >> > > > > > I'm trying to implement aggregate function on multiple
> columns
> > >> (as
> > >> > an
> > >> > > > > > arguments) like:
> > >> > > > > >
> > >> > > > > > myaggFunc(col1,col2)
> > >> > > > > >
> > >> > > > > > And I would want to return the results by each column after
> > >> > applying
> > >> > > > > > aggregate operation.
> > >> > > > > >
> > >> > > > > > The output would be something like:
> > >> > > > > >
> > >> > > > > > col1, count ( aggregate of all records for col1)
> > >> > > > > > col2, count
> > >> > > > > >
> > >> > > > > > Inorder to return the results in the above format, what is
> the
> > >> > return
> > >> > > > > data
> > >> > > > > > type (of the method) should I have to choose?
> > >> > > > > >
> > >> > > > > > Thanks
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Question on aggregate function

Posted by James Taylor <ja...@apache.org>.
We don't have aggregate functions with multiple arguments, so I can't
provide any pointers. It's unclear what semantics you're trying to achieve
with the multiple arguments. Can you give a concrete example? Based on your
other example, you'd want to do a GROUP BY, like this:

select sum(col) from table group by country;

On Mon, May 9, 2016 at 4:57 PM, Swapna Swapna <ta...@gmail.com>
wrote:

> Hi James/Team,
>
>
> myaggFunc(col1,col2)
>
> I tried implementing this new aggregate function with multiple (2, to start
> with)  columns , expressions as arguments.
>
> And its giving me this error:
>
> index (1) must be less than size (1)
>
>
> My function definition:
>
> @FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
> MyAggParseNode.class, args = {
>         @FunctionParseNode.Argument(),
>         @FunctionParseNode.Argument()})
>
> is there any example that I can refer, which accepts multiple fields
> as arguments to function.
>
> Any pointers would really help.
>
> Thank you,
>
>
>
>
>
> On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <ta...@gmail.com>
> wrote:
>
> > Hi James,
> >
> > the new ones are in similar lines to existing aggregate functions:
> >
> > I misinterpreted this definition, thanks for clarifying :
> > *A reference to a column is also an expression*
> >
> > Regards
> > Swapna
> >
> > On Tue, May 3, 2016 at 11:39 PM, James Taylor <ja...@apache.org>
> > wrote:
> >
> >> Hi Swapna,
> >> All our aggregate functions allow expressions as arguments and it
> wouldn't
> >> make sense to have these new ones be different. A reference to a column
> is
> >> also an expression. It doesn't change the HBase data model being sparse.
> >>
> >> I think the next step should be for you to submit a patch that the
> >> community can take a look at, as it's too difficult to discuss this
> >> without
> >> that.
> >>
> >> Thanks,
> >> James
> >>
> >> On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com> wrote:
> >>
> >> > Hi James,
> >> >
> >> > Thanks for your swift response.
> >> >
> >> > I wouldn't be able to use the expression in the below query rather I
> >> would
> >> > have to provide the columns (as arguments) which I'm interested in to
> >> > perform the aggregation on respective provided columns.
> >> >
> >> > myaggFunc(col1,col2)
> >> >
> >> > the reason being, the hbase data is sparsed and I would not know the
> >> column
> >> > values. Data fetch is based on a row key.
> >> >
> >> > expression example:
> >> >
> >> > ID=1 OR NAME='Hi'
> >> >
> >> > Regards
> >> >
> >> > Swapna
> >> >
> >> >
> >> >
> >> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <jamestaylor@apache.org
> >> > <javascript:;>> wrote:
> >> >
> >> > > Hi Swapna,
> >> > > The return type is typically derived from looking at the return
> types
> >> of
> >> > > each of the input arguments and choosing what'll work without losing
> >> > > precision. For example, take a look at this loop in
> ExpressionCompiler
> >> > that
> >> > > determines this for expressions that are added together:
> >> > >
> >> > >         new ArithmeticExpressionFactory() {
> >> > >             @Override
> >> > >             public Expression create(ArithmeticParseNode node,
> >> > > List<Expression> children) throws SQLException {
> >> > >                 boolean foundDate = false;
> >> > >                 Determinism determinism = Determinism.ALWAYS;
> >> > >                 PDataType theType = null;
> >> > >                 for(int i = 0; i < children.size(); i++) {
> >> > >
> >> > > Your probably already doing this, but make sure you don't assume the
> >> > > arguments are column references, but allow them to be any
> expression.
> >> > >
> >> > > Also, it'd be great to see what you've got so far without handling
> >> > multiple
> >> > > arguments to your function (in the form of a pull request) so folks
> >> can
> >> > get
> >> > > you feedback on your work so far.
> >> > >
> >> > > Thanks, and we appreciate the contributions!
> >> > >
> >> > > James
> >> > >
> >> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
> >> talktoswapna@gmail.com
> >> > <javascript:;>>
> >> > > wrote:
> >> > >
> >> > > > Sure,
> >> > > >
> >> > > > Hbase data that I have is:
> >> > > >
> >> > > > rowkey                us         uk
> >> > > > 20161001           3            4
> >> > > > 20161002           1            2
> >> > > >
> >> > > >
> >> > > > select myaggFunc(us) from table :    // this is returning output
> as
> >> :
> >> > > > 4
> >> > > > select myaggFunc(uk) from table :    // this is returning output
> as
> >> :
> >> > > > 6
> >> > > >
> >> > > > In similar to that, i'm visualizing the query like: select
> >> > > > myaggFunc1(us,uk)
> >> > > > from table;  //with multiple columns
> >> > > >
> >> > > > to return output:   (based on the aggregation logic, below results
> >> are
> >> > > for
> >> > > > sum aggregation)
> >> > > > us   4
> >> > > > uk   6
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> >> jamestaylor@apache.org
> >> > <javascript:;>>
> >> > > > wrote:
> >> > > >
> >> > > > > Removing user list (please don't cross post)
> >> > > > >
> >> > > > > Can you give us a full example of the query you have in mind?
> >> > > > >
> >> > > > > Thanks,
> >> > > > > James
> >> > > > >
> >> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> >> > talktoswapna@gmail.com <javascript:;>
> >> > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I'm trying to implement aggregate function on multiple columns
> >> (as
> >> > an
> >> > > > > > arguments) like:
> >> > > > > >
> >> > > > > > myaggFunc(col1,col2)
> >> > > > > >
> >> > > > > > And I would want to return the results by each column after
> >> > applying
> >> > > > > > aggregate operation.
> >> > > > > >
> >> > > > > > The output would be something like:
> >> > > > > >
> >> > > > > > col1, count ( aggregate of all records for col1)
> >> > > > > > col2, count
> >> > > > > >
> >> > > > > > Inorder to return the results in the above format, what is the
> >> > return
> >> > > > > data
> >> > > > > > type (of the method) should I have to choose?
> >> > > > > >
> >> > > > > > Thanks
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Hi James/Team,


myaggFunc(col1,col2)

I tried implementing this new aggregate function with multiple (2, to start
with)  columns , expressions as arguments.

And its giving me this error:

index (1) must be less than size (1)


My function definition:

@FunctionParseNode.BuiltInFunction(name = myaggFunc.NAME, nodeClass =
MyAggParseNode.class, args = {
        @FunctionParseNode.Argument(),
        @FunctionParseNode.Argument()})

is there any example that I can refer, which accepts multiple fields
as arguments to function.

Any pointers would really help.

Thank you,





On Wed, May 4, 2016 at 12:05 AM, Swapna Swapna <ta...@gmail.com>
wrote:

> Hi James,
>
> the new ones are in similar lines to existing aggregate functions:
>
> I misinterpreted this definition, thanks for clarifying :
> *A reference to a column is also an expression*
>
> Regards
> Swapna
>
> On Tue, May 3, 2016 at 11:39 PM, James Taylor <ja...@apache.org>
> wrote:
>
>> Hi Swapna,
>> All our aggregate functions allow expressions as arguments and it wouldn't
>> make sense to have these new ones be different. A reference to a column is
>> also an expression. It doesn't change the HBase data model being sparse.
>>
>> I think the next step should be for you to submit a patch that the
>> community can take a look at, as it's too difficult to discuss this
>> without
>> that.
>>
>> Thanks,
>> James
>>
>> On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com> wrote:
>>
>> > Hi James,
>> >
>> > Thanks for your swift response.
>> >
>> > I wouldn't be able to use the expression in the below query rather I
>> would
>> > have to provide the columns (as arguments) which I'm interested in to
>> > perform the aggregation on respective provided columns.
>> >
>> > myaggFunc(col1,col2)
>> >
>> > the reason being, the hbase data is sparsed and I would not know the
>> column
>> > values. Data fetch is based on a row key.
>> >
>> > expression example:
>> >
>> > ID=1 OR NAME='Hi'
>> >
>> > Regards
>> >
>> > Swapna
>> >
>> >
>> >
>> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <jamestaylor@apache.org
>> > <javascript:;>> wrote:
>> >
>> > > Hi Swapna,
>> > > The return type is typically derived from looking at the return types
>> of
>> > > each of the input arguments and choosing what'll work without losing
>> > > precision. For example, take a look at this loop in ExpressionCompiler
>> > that
>> > > determines this for expressions that are added together:
>> > >
>> > >         new ArithmeticExpressionFactory() {
>> > >             @Override
>> > >             public Expression create(ArithmeticParseNode node,
>> > > List<Expression> children) throws SQLException {
>> > >                 boolean foundDate = false;
>> > >                 Determinism determinism = Determinism.ALWAYS;
>> > >                 PDataType theType = null;
>> > >                 for(int i = 0; i < children.size(); i++) {
>> > >
>> > > Your probably already doing this, but make sure you don't assume the
>> > > arguments are column references, but allow them to be any expression.
>> > >
>> > > Also, it'd be great to see what you've got so far without handling
>> > multiple
>> > > arguments to your function (in the form of a pull request) so folks
>> can
>> > get
>> > > you feedback on your work so far.
>> > >
>> > > Thanks, and we appreciate the contributions!
>> > >
>> > > James
>> > >
>> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <
>> talktoswapna@gmail.com
>> > <javascript:;>>
>> > > wrote:
>> > >
>> > > > Sure,
>> > > >
>> > > > Hbase data that I have is:
>> > > >
>> > > > rowkey                us         uk
>> > > > 20161001           3            4
>> > > > 20161002           1            2
>> > > >
>> > > >
>> > > > select myaggFunc(us) from table :    // this is returning output as
>> :
>> > > > 4
>> > > > select myaggFunc(uk) from table :    // this is returning output as
>> :
>> > > > 6
>> > > >
>> > > > In similar to that, i'm visualizing the query like: select
>> > > > myaggFunc1(us,uk)
>> > > > from table;  //with multiple columns
>> > > >
>> > > > to return output:   (based on the aggregation logic, below results
>> are
>> > > for
>> > > > sum aggregation)
>> > > > us   4
>> > > > uk   6
>> > > >
>> > > >
>> > > >
>> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
>> jamestaylor@apache.org
>> > <javascript:;>>
>> > > > wrote:
>> > > >
>> > > > > Removing user list (please don't cross post)
>> > > > >
>> > > > > Can you give us a full example of the query you have in mind?
>> > > > >
>> > > > > Thanks,
>> > > > > James
>> > > > >
>> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
>> > talktoswapna@gmail.com <javascript:;>
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I'm trying to implement aggregate function on multiple columns
>> (as
>> > an
>> > > > > > arguments) like:
>> > > > > >
>> > > > > > myaggFunc(col1,col2)
>> > > > > >
>> > > > > > And I would want to return the results by each column after
>> > applying
>> > > > > > aggregate operation.
>> > > > > >
>> > > > > > The output would be something like:
>> > > > > >
>> > > > > > col1, count ( aggregate of all records for col1)
>> > > > > > col2, count
>> > > > > >
>> > > > > > Inorder to return the results in the above format, what is the
>> > return
>> > > > > data
>> > > > > > type (of the method) should I have to choose?
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Hi James,

the new ones are in similar lines to existing aggregate functions:

I misinterpreted this definition, thanks for clarifying :
*A reference to a column is also an expression*

Regards
Swapna

On Tue, May 3, 2016 at 11:39 PM, James Taylor <ja...@apache.org>
wrote:

> Hi Swapna,
> All our aggregate functions allow expressions as arguments and it wouldn't
> make sense to have these new ones be different. A reference to a column is
> also an expression. It doesn't change the HBase data model being sparse.
>
> I think the next step should be for you to submit a patch that the
> community can take a look at, as it's too difficult to discuss this without
> that.
>
> Thanks,
> James
>
> On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com> wrote:
>
> > Hi James,
> >
> > Thanks for your swift response.
> >
> > I wouldn't be able to use the expression in the below query rather I
> would
> > have to provide the columns (as arguments) which I'm interested in to
> > perform the aggregation on respective provided columns.
> >
> > myaggFunc(col1,col2)
> >
> > the reason being, the hbase data is sparsed and I would not know the
> column
> > values. Data fetch is based on a row key.
> >
> > expression example:
> >
> > ID=1 OR NAME='Hi'
> >
> > Regards
> >
> > Swapna
> >
> >
> >
> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <jamestaylor@apache.org
> > <javascript:;>> wrote:
> >
> > > Hi Swapna,
> > > The return type is typically derived from looking at the return types
> of
> > > each of the input arguments and choosing what'll work without losing
> > > precision. For example, take a look at this loop in ExpressionCompiler
> > that
> > > determines this for expressions that are added together:
> > >
> > >         new ArithmeticExpressionFactory() {
> > >             @Override
> > >             public Expression create(ArithmeticParseNode node,
> > > List<Expression> children) throws SQLException {
> > >                 boolean foundDate = false;
> > >                 Determinism determinism = Determinism.ALWAYS;
> > >                 PDataType theType = null;
> > >                 for(int i = 0; i < children.size(); i++) {
> > >
> > > Your probably already doing this, but make sure you don't assume the
> > > arguments are column references, but allow them to be any expression.
> > >
> > > Also, it'd be great to see what you've got so far without handling
> > multiple
> > > arguments to your function (in the form of a pull request) so folks can
> > get
> > > you feedback on your work so far.
> > >
> > > Thanks, and we appreciate the contributions!
> > >
> > > James
> > >
> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <talktoswapna@gmail.com
> > <javascript:;>>
> > > wrote:
> > >
> > > > Sure,
> > > >
> > > > Hbase data that I have is:
> > > >
> > > > rowkey                us         uk
> > > > 20161001           3            4
> > > > 20161002           1            2
> > > >
> > > >
> > > > select myaggFunc(us) from table :    // this is returning output as :
> > > > 4
> > > > select myaggFunc(uk) from table :    // this is returning output as :
> > > > 6
> > > >
> > > > In similar to that, i'm visualizing the query like: select
> > > > myaggFunc1(us,uk)
> > > > from table;  //with multiple columns
> > > >
> > > > to return output:   (based on the aggregation logic, below results
> are
> > > for
> > > > sum aggregation)
> > > > us   4
> > > > uk   6
> > > >
> > > >
> > > >
> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> jamestaylor@apache.org
> > <javascript:;>>
> > > > wrote:
> > > >
> > > > > Removing user list (please don't cross post)
> > > > >
> > > > > Can you give us a full example of the query you have in mind?
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > talktoswapna@gmail.com <javascript:;>
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm trying to implement aggregate function on multiple columns
> (as
> > an
> > > > > > arguments) like:
> > > > > >
> > > > > > myaggFunc(col1,col2)
> > > > > >
> > > > > > And I would want to return the results by each column after
> > applying
> > > > > > aggregate operation.
> > > > > >
> > > > > > The output would be something like:
> > > > > >
> > > > > > col1, count ( aggregate of all records for col1)
> > > > > > col2, count
> > > > > >
> > > > > > Inorder to return the results in the above format, what is the
> > return
> > > > > data
> > > > > > type (of the method) should I have to choose?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Sure James. Will take a look on the process.

Regards
Swapna

On Tue, May 3, 2016 at 11:39 PM, James Taylor <ja...@apache.org>
wrote:

> Hi Swapna,
> All our aggregate functions allow expressions as arguments and it wouldn't
> make sense to have these new ones be different. A reference to a column is
> also an expression. It doesn't change the HBase data model being sparse.
>
> I think the next step should be for you to submit a patch that the
> community can take a look at, as it's too difficult to discuss this without
> that.
>
> Thanks,
> James
>
> On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com> wrote:
>
> > Hi James,
> >
> > Thanks for your swift response.
> >
> > I wouldn't be able to use the expression in the below query rather I
> would
> > have to provide the columns (as arguments) which I'm interested in to
> > perform the aggregation on respective provided columns.
> >
> > myaggFunc(col1,col2)
> >
> > the reason being, the hbase data is sparsed and I would not know the
> column
> > values. Data fetch is based on a row key.
> >
> > expression example:
> >
> > ID=1 OR NAME='Hi'
> >
> > Regards
> >
> > Swapna
> >
> >
> >
> > On Tue, May 3, 2016 at 7:17 PM, James Taylor <jamestaylor@apache.org
> > <javascript:;>> wrote:
> >
> > > Hi Swapna,
> > > The return type is typically derived from looking at the return types
> of
> > > each of the input arguments and choosing what'll work without losing
> > > precision. For example, take a look at this loop in ExpressionCompiler
> > that
> > > determines this for expressions that are added together:
> > >
> > >         new ArithmeticExpressionFactory() {
> > >             @Override
> > >             public Expression create(ArithmeticParseNode node,
> > > List<Expression> children) throws SQLException {
> > >                 boolean foundDate = false;
> > >                 Determinism determinism = Determinism.ALWAYS;
> > >                 PDataType theType = null;
> > >                 for(int i = 0; i < children.size(); i++) {
> > >
> > > Your probably already doing this, but make sure you don't assume the
> > > arguments are column references, but allow them to be any expression.
> > >
> > > Also, it'd be great to see what you've got so far without handling
> > multiple
> > > arguments to your function (in the form of a pull request) so folks can
> > get
> > > you feedback on your work so far.
> > >
> > > Thanks, and we appreciate the contributions!
> > >
> > > James
> > >
> > > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <talktoswapna@gmail.com
> > <javascript:;>>
> > > wrote:
> > >
> > > > Sure,
> > > >
> > > > Hbase data that I have is:
> > > >
> > > > rowkey                us         uk
> > > > 20161001           3            4
> > > > 20161002           1            2
> > > >
> > > >
> > > > select myaggFunc(us) from table :    // this is returning output as :
> > > > 4
> > > > select myaggFunc(uk) from table :    // this is returning output as :
> > > > 6
> > > >
> > > > In similar to that, i'm visualizing the query like: select
> > > > myaggFunc1(us,uk)
> > > > from table;  //with multiple columns
> > > >
> > > > to return output:   (based on the aggregation logic, below results
> are
> > > for
> > > > sum aggregation)
> > > > us   4
> > > > uk   6
> > > >
> > > >
> > > >
> > > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <
> jamestaylor@apache.org
> > <javascript:;>>
> > > > wrote:
> > > >
> > > > > Removing user list (please don't cross post)
> > > > >
> > > > > Can you give us a full example of the query you have in mind?
> > > > >
> > > > > Thanks,
> > > > > James
> > > > >
> > > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> > talktoswapna@gmail.com <javascript:;>
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm trying to implement aggregate function on multiple columns
> (as
> > an
> > > > > > arguments) like:
> > > > > >
> > > > > > myaggFunc(col1,col2)
> > > > > >
> > > > > > And I would want to return the results by each column after
> > applying
> > > > > > aggregate operation.
> > > > > >
> > > > > > The output would be something like:
> > > > > >
> > > > > > col1, count ( aggregate of all records for col1)
> > > > > > col2, count
> > > > > >
> > > > > > Inorder to return the results in the above format, what is the
> > return
> > > > > data
> > > > > > type (of the method) should I have to choose?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by James Taylor <ja...@apache.org>.
Hi Swapna,
All our aggregate functions allow expressions as arguments and it wouldn't
make sense to have these new ones be different. A reference to a column is
also an expression. It doesn't change the HBase data model being sparse.

I think the next step should be for you to submit a patch that the
community can take a look at, as it's too difficult to discuss this without
that.

Thanks,
James

On Tuesday, May 3, 2016, Swapna Swapna <ta...@gmail.com> wrote:

> Hi James,
>
> Thanks for your swift response.
>
> I wouldn't be able to use the expression in the below query rather I would
> have to provide the columns (as arguments) which I'm interested in to
> perform the aggregation on respective provided columns.
>
> myaggFunc(col1,col2)
>
> the reason being, the hbase data is sparsed and I would not know the column
> values. Data fetch is based on a row key.
>
> expression example:
>
> ID=1 OR NAME='Hi'
>
> Regards
>
> Swapna
>
>
>
> On Tue, May 3, 2016 at 7:17 PM, James Taylor <jamestaylor@apache.org
> <javascript:;>> wrote:
>
> > Hi Swapna,
> > The return type is typically derived from looking at the return types of
> > each of the input arguments and choosing what'll work without losing
> > precision. For example, take a look at this loop in ExpressionCompiler
> that
> > determines this for expressions that are added together:
> >
> >         new ArithmeticExpressionFactory() {
> >             @Override
> >             public Expression create(ArithmeticParseNode node,
> > List<Expression> children) throws SQLException {
> >                 boolean foundDate = false;
> >                 Determinism determinism = Determinism.ALWAYS;
> >                 PDataType theType = null;
> >                 for(int i = 0; i < children.size(); i++) {
> >
> > Your probably already doing this, but make sure you don't assume the
> > arguments are column references, but allow them to be any expression.
> >
> > Also, it'd be great to see what you've got so far without handling
> multiple
> > arguments to your function (in the form of a pull request) so folks can
> get
> > you feedback on your work so far.
> >
> > Thanks, and we appreciate the contributions!
> >
> > James
> >
> > On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <talktoswapna@gmail.com
> <javascript:;>>
> > wrote:
> >
> > > Sure,
> > >
> > > Hbase data that I have is:
> > >
> > > rowkey                us         uk
> > > 20161001           3            4
> > > 20161002           1            2
> > >
> > >
> > > select myaggFunc(us) from table :    // this is returning output as :
> > > 4
> > > select myaggFunc(uk) from table :    // this is returning output as :
> > > 6
> > >
> > > In similar to that, i'm visualizing the query like: select
> > > myaggFunc1(us,uk)
> > > from table;  //with multiple columns
> > >
> > > to return output:   (based on the aggregation logic, below results are
> > for
> > > sum aggregation)
> > > us   4
> > > uk   6
> > >
> > >
> > >
> > > On Tue, May 3, 2016 at 11:33 AM, James Taylor <jamestaylor@apache.org
> <javascript:;>>
> > > wrote:
> > >
> > > > Removing user list (please don't cross post)
> > > >
> > > > Can you give us a full example of the query you have in mind?
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <
> talktoswapna@gmail.com <javascript:;>
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm trying to implement aggregate function on multiple columns (as
> an
> > > > > arguments) like:
> > > > >
> > > > > myaggFunc(col1,col2)
> > > > >
> > > > > And I would want to return the results by each column after
> applying
> > > > > aggregate operation.
> > > > >
> > > > > The output would be something like:
> > > > >
> > > > > col1, count ( aggregate of all records for col1)
> > > > > col2, count
> > > > >
> > > > > Inorder to return the results in the above format, what is the
> return
> > > > data
> > > > > type (of the method) should I have to choose?
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Hi James,

Thanks for your swift response.

I wouldn't be able to use the expression in the below query rather I would
have to provide the columns (as arguments) which I'm interested in to
perform the aggregation on respective provided columns.

myaggFunc(col1,col2)

the reason being, the hbase data is sparsed and I would not know the column
values. Data fetch is based on a row key.

expression example:

ID=1 OR NAME='Hi'

Regards

Swapna



On Tue, May 3, 2016 at 7:17 PM, James Taylor <ja...@apache.org> wrote:

> Hi Swapna,
> The return type is typically derived from looking at the return types of
> each of the input arguments and choosing what'll work without losing
> precision. For example, take a look at this loop in ExpressionCompiler that
> determines this for expressions that are added together:
>
>         new ArithmeticExpressionFactory() {
>             @Override
>             public Expression create(ArithmeticParseNode node,
> List<Expression> children) throws SQLException {
>                 boolean foundDate = false;
>                 Determinism determinism = Determinism.ALWAYS;
>                 PDataType theType = null;
>                 for(int i = 0; i < children.size(); i++) {
>
> Your probably already doing this, but make sure you don't assume the
> arguments are column references, but allow them to be any expression.
>
> Also, it'd be great to see what you've got so far without handling multiple
> arguments to your function (in the form of a pull request) so folks can get
> you feedback on your work so far.
>
> Thanks, and we appreciate the contributions!
>
> James
>
> On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <ta...@gmail.com>
> wrote:
>
> > Sure,
> >
> > Hbase data that I have is:
> >
> > rowkey                us         uk
> > 20161001           3            4
> > 20161002           1            2
> >
> >
> > select myaggFunc(us) from table :    // this is returning output as :
> > 4
> > select myaggFunc(uk) from table :    // this is returning output as :
> > 6
> >
> > In similar to that, i'm visualizing the query like: select
> > myaggFunc1(us,uk)
> > from table;  //with multiple columns
> >
> > to return output:   (based on the aggregation logic, below results are
> for
> > sum aggregation)
> > us   4
> > uk   6
> >
> >
> >
> > On Tue, May 3, 2016 at 11:33 AM, James Taylor <ja...@apache.org>
> > wrote:
> >
> > > Removing user list (please don't cross post)
> > >
> > > Can you give us a full example of the query you have in mind?
> > >
> > > Thanks,
> > > James
> > >
> > > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <talktoswapna@gmail.com
> >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm trying to implement aggregate function on multiple columns (as an
> > > > arguments) like:
> > > >
> > > > myaggFunc(col1,col2)
> > > >
> > > > And I would want to return the results by each column after applying
> > > > aggregate operation.
> > > >
> > > > The output would be something like:
> > > >
> > > > col1, count ( aggregate of all records for col1)
> > > > col2, count
> > > >
> > > > Inorder to return the results in the above format, what is the return
> > > data
> > > > type (of the method) should I have to choose?
> > > >
> > > > Thanks
> > > >
> > >
> >
>

Re: Question on aggregate function

Posted by James Taylor <ja...@apache.org>.
Hi Swapna,
The return type is typically derived from looking at the return types of
each of the input arguments and choosing what'll work without losing
precision. For example, take a look at this loop in ExpressionCompiler that
determines this for expressions that are added together:

        new ArithmeticExpressionFactory() {
            @Override
            public Expression create(ArithmeticParseNode node,
List<Expression> children) throws SQLException {
                boolean foundDate = false;
                Determinism determinism = Determinism.ALWAYS;
                PDataType theType = null;
                for(int i = 0; i < children.size(); i++) {

Your probably already doing this, but make sure you don't assume the
arguments are column references, but allow them to be any expression.

Also, it'd be great to see what you've got so far without handling multiple
arguments to your function (in the form of a pull request) so folks can get
you feedback on your work so far.

Thanks, and we appreciate the contributions!

James

On Tue, May 3, 2016 at 12:59 PM, Swapna Swapna <ta...@gmail.com>
wrote:

> Sure,
>
> Hbase data that I have is:
>
> rowkey                us         uk
> 20161001           3            4
> 20161002           1            2
>
>
> select myaggFunc(us) from table :    // this is returning output as :
> 4
> select myaggFunc(uk) from table :    // this is returning output as :
> 6
>
> In similar to that, i'm visualizing the query like: select
> myaggFunc1(us,uk)
> from table;  //with multiple columns
>
> to return output:   (based on the aggregation logic, below results are for
> sum aggregation)
> us   4
> uk   6
>
>
>
> On Tue, May 3, 2016 at 11:33 AM, James Taylor <ja...@apache.org>
> wrote:
>
> > Removing user list (please don't cross post)
> >
> > Can you give us a full example of the query you have in mind?
> >
> > Thanks,
> > James
> >
> > On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <ta...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm trying to implement aggregate function on multiple columns (as an
> > > arguments) like:
> > >
> > > myaggFunc(col1,col2)
> > >
> > > And I would want to return the results by each column after applying
> > > aggregate operation.
> > >
> > > The output would be something like:
> > >
> > > col1, count ( aggregate of all records for col1)
> > > col2, count
> > >
> > > Inorder to return the results in the above format, what is the return
> > data
> > > type (of the method) should I have to choose?
> > >
> > > Thanks
> > >
> >
>

Re: Question on aggregate function

Posted by Swapna Swapna <ta...@gmail.com>.
Sure,

Hbase data that I have is:

rowkey                us         uk
20161001           3            4
20161002           1            2


select myaggFunc(us) from table :    // this is returning output as :
4
select myaggFunc(uk) from table :    // this is returning output as :
6

In similar to that, i'm visualizing the query like: select myaggFunc1(us,uk)
from table;  //with multiple columns

to return output:   (based on the aggregation logic, below results are for
sum aggregation)
us   4
uk   6



On Tue, May 3, 2016 at 11:33 AM, James Taylor <ja...@apache.org>
wrote:

> Removing user list (please don't cross post)
>
> Can you give us a full example of the query you have in mind?
>
> Thanks,
> James
>
> On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <ta...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm trying to implement aggregate function on multiple columns (as an
> > arguments) like:
> >
> > myaggFunc(col1,col2)
> >
> > And I would want to return the results by each column after applying
> > aggregate operation.
> >
> > The output would be something like:
> >
> > col1, count ( aggregate of all records for col1)
> > col2, count
> >
> > Inorder to return the results in the above format, what is the return
> data
> > type (of the method) should I have to choose?
> >
> > Thanks
> >
>

Re: Question on aggregate function

Posted by James Taylor <ja...@apache.org>.
Removing user list (please don't cross post)

Can you give us a full example of the query you have in mind?

Thanks,
James

On Tue, May 3, 2016 at 11:14 AM, Swapna Swapna <ta...@gmail.com>
wrote:

> Hi,
>
> I'm trying to implement aggregate function on multiple columns (as an
> arguments) like:
>
> myaggFunc(col1,col2)
>
> And I would want to return the results by each column after applying
> aggregate operation.
>
> The output would be something like:
>
> col1, count ( aggregate of all records for col1)
> col2, count
>
> Inorder to return the results in the above format, what is the return data
> type (of the method) should I have to choose?
>
> Thanks
>