Posted to user@storm.apache.org by Shuo Chen <ch...@gmail.com> on 2015/11/04 03:46:07 UTC

multiple fields grouping in storm

I have two bolt classes, BoltX and BoltY. BoltY receives tuples from BoltX.
BoltX declares an output with multiple fields; each tuple contains 4 strings:

class BoltX implements IBasicBolt {
    ...
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("A", "B", "C", "D"));
    }
}

In BoltY:

class BoltY implements IBasicBolt {
    boolean hasReceive = false;
    String A = null;
    String B = null;
    ...
    public void execute(Tuple input, BasicOutputCollector collector) {
        if (!hasReceive) {
            hasReceive = true;
            A = input.getString(0);
            B = input.getString(1);
        }

        if (!input.getString(0).equals(A) || !input.getString(1).equals(B)) {
            LOG.error("group error");
            return;
        }
        ...
    }
    ...
}

In Topology:

...
builder.setBolt("x", new BoltX(), 3);
builder.setBolt("y", new BoltY(), 3).fieldsGrouping("x", new Fields("A", "B"));
...

I expected that tuples emitted by x with the same values for fields "A" and
"B" would go to the same task of BoltY.

However, the topology log shows lots of "group error" messages.

So how can I group outputs with the same "A" and "B" values into the same
task of BoltY?

The question is also asked in
http://stackoverflow.com/questions/33512554/multiple-fields-grouping-in-storm

-- 
*Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com

Re: multiple fields grouping in storm

Posted by Priyank Shah <ps...@hortonworks.com>.
Direct Grouping should work for you. Check out https://storm.apache.org/javadoc/apidocs/backtype/storm/task/OutputCollector.html#emitDirect-int-java.util.Collection-java.util.List-
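
Another option, if BoltY only needs separate state for each (A, B) pair, is to keep the fields grouping and key that state by the pair instead of assuming one pair per task. The following is a minimal sketch of that idea in plain Java with no Storm classes; PerKeyState, record and distinctKeys are hypothetical names chosen for illustration, and the per-key counter stands in for whatever state BoltY really holds.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for BoltY's state handling; not part of Storm.
class PerKeyState {
    // One entry per distinct (A, B) pair that this task has seen.
    private final Map<List<String>, Integer> countsByKey = new HashMap<>();

    // In a real bolt this would be called from execute() with
    // input.getString(0) and input.getString(1).
    // Returns how many tuples with this (A, B) pair have been seen so far.
    int record(String a, String b) {
        List<String> key = Arrays.asList(a, b);
        return countsByKey.merge(key, 1, Integer::sum);
    }

    // Number of distinct (A, B) pairs routed to this task so far.
    int distinctKeys() {
        return countsByKey.size();
    }

    public static void main(String[] args) {
        PerKeyState state = new PerKeyState();
        state.record("1", "2");
        state.record("1", "2");
        state.record("3", "4"); // a second key on the same task is not an error
        System.out.println("distinct keys on this task: " + state.distinctKeys());
    }
}
```

With state organized this way, a task that receives several (A, B) pairs handles each pair independently, so the "group error" check is no longer needed.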

From: Shuo Chen
Reply-To: "user@storm.apache.org"
Date: Tuesday, November 3, 2015 at 7:37 PM
To: "user@storm.apache.org"
Subject: Re: multiple fields grouping in storm

Thanks Priyank!

Is there any way to guarantee that a task receives only tuples with the same field values?


Re: multiple fields grouping in storm

Posted by Shuo Chen <ch...@gmail.com>.
Thanks Priyank!

Is there any way to guarantee that a task receives only tuples with the
same field values?

On Wed, Nov 4, 2015 at 11:21 AM, Priyank Shah <ps...@hortonworks.com> wrote:

> Hi Shuo,
>
> Seeing a lot of group errors in the log file is expected. The description
> of fields grouping at http://storm.apache.org/documentation/Concepts.html
> says:
>
>
>    1. Fields grouping: The stream is partitioned by the fields specified
>    in the grouping. For example, if the stream is grouped by the "user-id"
>    field, tuples with the same "user-id" will always go to the same task, but
>    tuples with different "user-id"'s may go to different tasks.
>
> This means that tuples with the same values for fields A and B will always
> go to the same task, but it does not mean that tuples with other values for
> A and B cannot go to that task as well. For example, suppose your input
> data has the following tuples:
>
> 1, 2, 3, 4
> 1, 2, 5, 6
> 3, 4, 5, 6
>
> In the above scenario, the first two tuples are guaranteed to go to the
> same task, but the third tuple can also go to that task, especially when
> the parallelism hint for BoltY is set to 1, because then there is no other
> task. Think of it like the hashCode method in Java: equal objects always
> have the same hash code, but two different objects can also share one.



-- 
*陈硕* *Shuo Chen*
chenatu2006@gmail.com
chenshuo@whaty.com

Re: multiple fields grouping in storm

Posted by Priyank Shah <ps...@hortonworks.com>.
Hi Shuo,

Seeing a lot of group errors in the log file is expected. The description of fields grouping at http://storm.apache.org/documentation/Concepts.html says:


  1.  Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.

This means that tuples with the same values for fields A and B will always go to the same task, but it does not mean that tuples with other values for A and B cannot go to that task as well. For example, suppose your input data has the following tuples:

1, 2, 3, 4
1, 2, 5, 6
3, 4, 5, 6

In the above scenario, the first two tuples are guaranteed to go to the same task, but the third tuple can also go to that task, especially when the parallelism hint for BoltY is set to 1, because then there is no other task. Think of it like the hashCode method in Java: equal objects always have the same hash code, but two different objects can also have the same hash code.
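
To make the hashCode analogy concrete, here is a rough model of how a fields grouping could pick a task: hash the values of the grouping fields and take the result modulo the number of tasks. This is a simplified sketch, not Storm's actual source; FieldsGroupingModel and chooseTask are hypothetical names, and List.hashCode stands in for whatever hash Storm applies internally.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of fields grouping; not Storm's actual implementation.
class FieldsGroupingModel {

    // Maps the values of the grouping fields to a task index in [0, numTasks).
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3; // parallelism hint used for BoltY in the question
        String[][] tuples = {
            {"1", "2", "3", "4"},
            {"1", "2", "5", "6"},
            {"3", "4", "5", "6"},
        };
        for (String[] t : tuples) {
            // Only fields A and B (positions 0 and 1) take part in the grouping.
            List<Object> key = Arrays.<Object>asList(t[0], t[1]);
            System.out.println(Arrays.toString(t) + " -> task " + chooseTask(key, numTasks));
        }
    }
}
```

Under this model, equal (A, B) values always produce the same index, and as soon as there are more distinct (A, B) pairs than tasks, at least one task must receive more than one pair.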
