Posted to user@storm.apache.org by praveen reddy <on...@gmail.com> on 2016/07/14 14:16:41 UTC

help emitting the stream to different bolts

Hi All,

Here is my use case.

I have a spout that reads JSON data from a Kafka topic and emits each JSON
string to Bolt1. Bolt1 enriches the data and, based on a particular field in
the JSON string, creates 3 different files. Once any of the files reaches
25 MB, Bolt1 starts emitting the data from that file to Bolt2, which is an
HDFSBolt. The HDFSBolt stores the data on the file system.



Now the problem: how does the HDFS bolt know which file Bolt1 is emitting
data from? I need to know whether Bolt1 is emitting data from File1, File2,
or File3. Based on the file, I need to save the data in a different directory
on HDFS.



I was wondering whether I can apply a fields grouping on the HDFS bolt, but I
am not sure how to do that while emitting the data from a file. Is there a way
to do it, or is there any other solution that would accomplish this
requirement?

Thanks,
Praveen
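
One possible approach, sketched here in plain Java without any Storm
dependencies: have Bolt1 pick a named stream per file, and subscribe a
separately configured HDFS bolt (each with its own output path) to each
stream. The class name StreamRouter, the stream ids, and the directory paths
are hypothetical and chosen only for illustration. In a real topology the
stream id would be declared with declarer.declareStream(...), used in
collector.emit(streamId, ...), and subscribed to with
shuffleGrouping(componentId, streamId).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps a record's file/type to a named stream,
// and each named stream to the HDFS directory its bolt would write to.
class StreamRouter {
    static final String DEFAULT_STREAM = "unrouted";

    // Assumed layout: one directory per source file.
    private static final Map<String, String> DIR_BY_STREAM = new LinkedHashMap<>();
    static {
        DIR_BY_STREAM.put("file1", "/data/file1/");
        DIR_BY_STREAM.put("file2", "/data/file2/");
        DIR_BY_STREAM.put("file3", "/data/file3/");
    }

    // Stream id Bolt1 would emit on; unknown types go to a catch-all stream.
    static String streamFor(String type) {
        return (type != null && DIR_BY_STREAM.containsKey(type)) ? type : DEFAULT_STREAM;
    }

    // Directory for the HDFS bolt subscribed to this stream (null if none).
    static String directoryFor(String streamId) {
        return DIR_BY_STREAM.get(streamId);
    }

    public static void main(String[] args) {
        for (String type : new String[] {"file1", "file3", "other"}) {
            String stream = streamFor(type);
            System.out.println(type + " -> stream " + stream
                    + ", dir " + directoryFor(stream));
        }
    }
}
```

The trade-off is one HDFS bolt instance per directory, which is simple to
reason about when the number of files is small and fixed, as it is here.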

Re: help emitting the stream to different bolts

Posted by praveen reddy <on...@gmail.com>.
Hi All,

I was able to fix this issue by introducing a new bolt that does the routing.
Now I am stuck on a new issue.

Here is my Bolt-1 declareOutputFields method:

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("type", "linedata"));
    }

Bolt-1 will emit to the HDFSBolt.

I want to get the type value in the HDFSBolt. Normally I would get it from the
tuple in the execute method, but how can I get it in HDFSBolt?
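
A rough sketch of the idea, under stated assumptions: although user code does
not override execute in the HDFS bolt, the bolt hands each tuple to the
pluggable objects it is configured with (its record format receives the tuple
in format(Tuple)), so the "type" field can be read by name there. The plain
Java below uses a Map as a stand-in for Storm's Tuple; TypeAwareWriter, the
"/unknown" fallback, and the path layout are hypothetical. Whether a given
storm-hdfs version also offers a per-tuple hook for choosing the output path
is an assumption to check against its documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Stand-in for logic that would live in objects the HDFS bolt calls per tuple.
// A Map<String, Object> plays the role of a tuple with named fields.
class TypeAwareWriter {

    // Choose a directory from the "type" field declared by Bolt-1.
    static String partitionPath(Map<String, Object> tuple) {
        Object type = tuple.get("type");
        return "/" + (type == null ? "unknown" : type.toString());
    }

    // Write only the payload field, one record per line.
    static byte[] format(Map<String, Object> tuple) {
        String line = String.valueOf(tuple.get("linedata")) + "\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("type", "file2");
        tuple.put("linedata", "{\"id\":1}");
        System.out.println("path: " + partitionPath(tuple));
        System.out.print("record: " + new String(format(tuple), StandardCharsets.UTF_8));
    }
}
```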
