You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Kavi Kumar <ka...@gmail.com> on 2014/01/31 09:55:39 UTC

Dynamic Pivot using Storm

I have rows in BigData DB (Cassandra in my case) with column names
col1,col2,col3,val1,val2

in SQL approach I can do group by col1,col2 or col2,col1 or any other
possible way also. This way I can form tree hierarchy easily.

But now we are using Cassandra to store the data which doesnt support group
by. So we want to use Storm for doing group by and aggregations.
We wrote some sample code do aggregation and group by, but we are unable to
form an opinion whether we can achieve it or not.

Data looks like this


col1,col2,col3,val1,val2
------------------------
a1,b1,c1,10,20
a1,b1,c2,11,13
a1,b2,c1,9,15
a1,b2,c3,13,88
a2,b1,c1,30,44
a2,b3,c2,22,33
a4,b4,c4,99,66


Like in excel pivot I want to build hierarchy
root->child1->child2->child3-val1,val2 then it may look like this if my
hierarchy is col1->col2->col3


a1            {43,136}
    --b1        {21,33}
        --c1    10,20
        --c2    11,13
    --b2        {22,103}
        --c1    9,15
        --c3    13,88
a2            {52,77}
    --b1        {30,44}
        --c1    30,44
    --b3        {22,33}
        --c2    22,33
a4            {99,66}
    --b4        {99,66}
        --c4    99,66


I want to give user functionality to re-arrange hierarchy elements
something like col3->col1->col2 (or something else also, which is dynamic)
in this case data will look like this

c1            {49,79}
    --a1        {19,35}
        --b1    10,20
        --b2    9,15
    --a2        {30,44}
        --b1    30,44
c2            {11,13}
    --a1        {11,13}
        --b1    11,13
    --a2        {22,33}
        --b3    22,33
c3            {13,88}
    --a1        {13,88}
        --b2    13,88
c4            {99,66}
    --a4        {99,66}
        --b4    99,66



Few lines of my trident code looks like this, which is not working as
expected.


topology.newStream("aggregation", spout).groupBy(new Fields(
"col1","col2","col3","val1","val2")).aggregate(new Fields(
"val1","val2"), new Sum(), new Fields("val1sum","val2sum"))
.each(new Fields("col1","col2","col3","val1sum","val2sum"), new
Utils.PrintFilter());


For doing above transformations I want to use Storm with or without Trident
API support.
Can anyone guide me how to achive it? Any program ideas are very much
appreciated.

Re: Dynamic Pivot using Storm

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
You can have 3 different bolts in storm and give like this
bolt1 group by(col1), bolt2 group by (col1, col2), bolt3 group by
(col1, col2, col3)

bolt1 will give o/p as you are expecting
a1            {43,136}
a2            {52,77}
a4            {99,66}

bolt2 will give o/p as
a1
    --b1        {21,33}
    --b2        {22,103}
a2
    --b1        {30,44}
    --b3        {22,33}
a4
    --b4        {99,66}

Rest individual columns will be bolt3 o/p.

You cannot have dynamic grouping




On 1/31/14, Kavi Kumar <ka...@gmail.com> wrote:
> I have rows in BigData DB (Cassandra in my case) with column names
> col1,col2,col3,val1,val2
>
> in SQL approach I can do group by col1,col2 or col2,col1 or any other
> possible way also. This way I can form tree hierarchy easily.
>
> But now we are using Cassandra to store the data which doesnt support group
> by. So we want to use Storm for doing group by and aggregations.
> We wrote some sample code do aggregation and group by, but we are unable to
> form an opinion whether we can achieve it or not.
>
> Data looks like this
>
>
> col1,col2,col3,val1,val2
> ------------------------
> a1,b1,c1,10,20
> a1,b1,c2,11,13
> a1,b2,c1,9,15
> a1,b2,c3,13,88
> a2,b1,c1,30,44
> a2,b3,c2,22,33
> a4,b4,c4,99,66
>
>
> Like in excel pivot I want to build hierarchy
> root->child1->child2->child3-val1,val2 then it may look like this if my
> hierarchy is col1->col2->col3
>
>
> a1            {43,136}
>     --b1        {21,33}
>         --c1    10,20
>         --c2    11,13
>     --b2        {22,103}
>         --c1    9,15
>         --c3    13,88
> a2            {52,77}
>     --b1        {30,44}
>         --c1    30,44
>     --b3        {22,33}
>         --c2    22,33
> a4            {99,66}
>     --b4        {99,66}
>         --c4    99,66
>
>
> I want to give user functionality to re-arrange hierarchy elements
> something like col3->col1->col2 (or something else also, which is dynamic)
> in this case data will look like this
>
> c1            {49,79}
>     --a1        {19,35}
>         --b1    10,20
>         --b2    9,15
>     --a2        {30,44}
>         --b1    30,44
> c2            {11,13}
>     --a1        {11,13}
>         --b1    11,13
>     --a2        {22,33}
>         --b3    22,33
> c3            {13,88}
>     --a1        {13,88}
>         --b2    13,88
> c4            {99,66}
>     --a4        {99,66}
>         --b4    99,66
>
>
>
> Few lines of my trident code looks like this, which is not working as
> expected.
>
>
> topology.newStream("aggregation", spout).groupBy(new Fields(
> "col1","col2","col3","val1","val2")).aggregate(new Fields(
> "val1","val2"), new Sum(), new Fields("val1sum","val2sum"))
> .each(new Fields("col1","col2","col3","val1sum","val2sum"), new
> Utils.PrintFilter());
>
>
> For doing above transformations I want to use Storm with or without Trident
> API support.
> Can anyone guide me how to achive it? Any program ideas are very much
> appreciated.
>