You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Joseph Wonesh <jo...@sticknfind.com> on 2018/10/23 14:46:15 UTC
Aggregation of Set Data Type
Hello all,
I am trying to aggregate rows which each contain a column of Set<BIGINT>.
I would like the result to contain the sum of all sets, where null would be
equivalent to the empty set. I expected a query like: "select
sum(my_set_column) from my_table group by my_key_column" to do this, but
the set type is not supported by this aggregate. Does anyone know of a way
to aggregate this using existing cassandra built-ins? Thanks!
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.
Re: Aggregation of Set Data Type
Posted by Joseph Wonesh <jo...@sticknfind.com>.
I was able to make this work using the following UDF/UDA:
CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val
set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE
java AS
$$
if (val == null) {
return state;
}
Set<Long> s = state.getSet(1, Long.class);
s.addAll(val);
state.setSet(1, s);
return state;
$$;
CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int,
set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS
$$
return state.getSet(1, Long.class);
$$;
CREATE AGGREGATE agg_set(set<bigint>)
SFUNC agg_set_func
STYPE tuple<int, set<bigint>>
FINALFUNC agg_set_func_final
INITCOND (0,{});
On Thu, Oct 25, 2018 at 3:55 PM Joseph Wonesh <jo...@sticknfind.com>
wrote:
> Thank you for your reply. I actually found your blog post regarding this
> topic and browsed through it, but it did not yield the answer I was looking
> for. In fact, it seems impossible to do what I wish to do without defining
> a UDA for this specific use case -- something that is not practical to do
> when all of my queries use 'group by'.
>
> For example, I ave a query like this:
>
> select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table
> group by a;
>
> I would hope that using a UDF for my_set_column would allow me to combine
> all of the my_set_columns passed in via group by, but I cannot pass state
> to the UDF. A UDA can accept state, but that would require me rewriting the
> whole query to be:
>
> select MY_UDA(a, my_set_column) from my_table;
>
> Additionally, I would need a separate UDA for each of the different group
> by clauses. Is there no way around this? I would really like to be able to
> simply add a data column of type set<bigint> and then get all of the unique
> members in this set across an aggregation.
>
> On Tue, Oct 23, 2018 at 1:44 PM DuyHai Doan <do...@gmail.com> wrote:
>
>> You will need to use user defined aggregates for this
>>
>> Le 23 oct. 2018 16:46, "Joseph Wonesh" <jo...@sticknfind.com> a
>> écrit :
>>
>>> Hello all,
>>>
>>> I am trying to aggregate rows which each contain a column of
>>> Set<BIGINT>. I would like the result to contain the sum of all sets, where
>>> null would be equivalent to the empty set. I expected a query like: "select
>>> sum(my_set_column) from my_table group by my_key_column" to do this, but
>>> the set type is not supported by this aggregate. Does anyone know of a way
>>> to aggregate this using existing cassandra built-ins? Thanks!
>>>
>>> This message is private and confidential. If you have received message
>>> in error, please notify us and remove from your system.
>>
>>
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.
Re: Aggregation of Set Data Type
Posted by Joseph Wonesh <jo...@sticknfind.com>.
Thank you for your reply. I actually found your blog post regarding this
topic and browsed through it, but it did not yield the answer I was looking
for. In fact, it seems impossible to do what I wish to do without defining
a UDA for this specific use case -- something that is not practical to do
when all of my queries use 'group by'.
For example, I ave a query like this:
select sum(a), avg(a), min(a), max(a), MY_UDF(my_set_column) from my_table
group by a;
I would hope that using a UDF for my_set_column would allow me to combine
all of the my_set_columns passed in via group by, but I cannot pass state
to the UDF. A UDA can accept state, but that would require me rewriting the
whole query to be:
select MY_UDA(a, my_set_column) from my_table;
Additionally, I would need a separate UDA for each of the different group
by clauses. Is there no way around this? I would really like to be able to
simply add a data column of type set<bigint> and then get all of the unique
members in this set across an aggregation.
On Tue, Oct 23, 2018 at 1:44 PM DuyHai Doan <do...@gmail.com> wrote:
> You will need to use user defined aggregates for this
>
> Le 23 oct. 2018 16:46, "Joseph Wonesh" <jo...@sticknfind.com> a
> écrit :
>
>> Hello all,
>>
>> I am trying to aggregate rows which each contain a column of
>> Set<BIGINT>. I would like the result to contain the sum of all sets, where
>> null would be equivalent to the empty set. I expected a query like: "select
>> sum(my_set_column) from my_table group by my_key_column" to do this, but
>> the set type is not supported by this aggregate. Does anyone know of a way
>> to aggregate this using existing cassandra built-ins? Thanks!
>>
>> This message is private and confidential. If you have received message in
>> error, please notify us and remove from your system.
>
>
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.
Re: Aggregation of Set Data Type
Posted by DuyHai Doan <do...@gmail.com>.
You will need to use user defined aggregates for this
Le 23 oct. 2018 16:46, "Joseph Wonesh" <jo...@sticknfind.com> a
écrit :
> Hello all,
>
> I am trying to aggregate rows which each contain a column of Set<BIGINT>.
> I would like the result to contain the sum of all sets, where null would be
> equivalent to the empty set. I expected a query like: "select
> sum(my_set_column) from my_table group by my_key_column" to do this, but
> the set type is not supported by this aggregate. Does anyone know of a way
> to aggregate this using existing cassandra built-ins? Thanks!
>
> This message is private and confidential. If you have received message in
> error, please notify us and remove from your system.