Posted to dev@storm.apache.org by sam mohel <sa...@gmail.com> on 2017/04/04 08:49:40 UTC

python with storm

I need some help with this problem. I read that a spout is responsible for
reading data or preparing it for processing in a bolt, so I wrote some code
in the spout to open a file and read it line by line:

import storm

class SimSpout(storm.Spout):
    def initialize(self, conf, context):
        # Keep the config and context, then open the input file read-only
        self._conf = conf
        self._context = context
        self.f = open('data.txt', 'r')
        storm.logInfo("Spout instance starting...")

    # Storm calls nextTuple repeatedly, so emit one line per call
    def nextTuple(self):
        if self.f is None:
            return  # input already exhausted; nothing left to emit
        line = self.f.readline()
        if not line:
            # Reached EOF: close the file and stop emitting
            self.f.close()
            self.f = None
            return
        line = line.rstrip("\n")  # drop the trailing newline before emitting
        storm.logInfo("Emitting %s" % line)
        storm.emit([line])

# Start the spout when it's invoked
SimSpout().run()


Is that right?
The actual problem I have now: how can I make a bolt take each line from the
spout and process it? The processing reads from another file and does some
calculations to compute the vector of each word.

Re: python with storm

Posted by Dmitry Semenov <dm...@saritasa.com>.
Sam,

Bolts take the data you emit from your spout (or from another bolt) and
then do what you need with it (persist data in a DB, aggregate, etc.).

In your case you have a spout that emits sentences, so you need to create
a bolt that splits each sentence into words and emits each word as a tuple.

Then you should have another bolt that gets each word as a tuple from the
previous bolt and does your processing.
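
As a rough sketch of the logic those two bolts would hold, here is the same flow in plain Python, so it runs without a Storm cluster. The names split_sentence, load_vectors and process_word are made up for illustration, and the vectors file format (a word followed by its numbers on each line) is only an assumption about what your second file looks like. In a real topology the first function would be called from the split bolt, which emits each word, and the vectors would be loaded once when the second bolt starts rather than once per tuple.

```python
import io

def split_sentence(line):
    """Logic for the first bolt: turn one emitted line into words."""
    return line.split()

def load_vectors(fileobj):
    """Parse a hypothetical vectors file: 'word v1 v2 ...' on each line."""
    vectors = {}
    for row in fileobj:
        parts = row.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def process_word(word, vectors):
    """Logic for the second bolt: look up the vector for one word."""
    return word, vectors.get(word)  # None when the word is unknown

# Stand-in for the second file; a real bolt would open it once at startup.
VECTORS_FILE = io.StringIO("storm 0.1 0.2\npython 0.3 0.4\n")
vectors = load_vectors(VECTORS_FILE)

# Simulate the spout -> split bolt -> word bolt flow for a single line.
results = [process_word(w, vectors) for w in split_sentence("python with storm\n")]
```

Loading the file once and keeping it in memory avoids reopening it for every word, which matters because a bolt is invoked once per tuple.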

In the topology, use fieldsGrouping for your word-processing bolt and
shuffleGrouping for your split-sentence bolt.
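
To see why the two groupings differ, here is a small simulation in plain Python. It is not Storm's implementation: the CRC32 hash and the helper names fields_grouping and shuffle_grouping are stand-ins chosen for illustration. The point it shows is that a fields grouping sends equal field values to the same task, while a shuffle grouping spreads tuples over tasks without looking at their contents.

```python
import random
import zlib

def fields_grouping(word, num_tasks):
    """Route by field value: equal words always map to the same task index."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks

def shuffle_grouping(num_tasks, rng):
    """Route randomly: any task may receive the tuple."""
    return rng.randrange(num_tasks)

NUM_TASKS = 4
words = ["storm", "python", "storm", "bolt", "python", "storm"]

# With fields grouping, every occurrence of a word lands on a single task.
field_routes = {}
for w in words:
    field_routes.setdefault(w, set()).add(fields_grouping(w, NUM_TASKS))

# With shuffle grouping, the task is picked independently of the word.
rng = random.Random(0)
shuffle_routes = [shuffle_grouping(NUM_TASKS, rng) for _ in words]
```

Routing by word matters for the word-processing bolt if it keeps per-word state (counts, cached vectors); the split bolt is stateless, so a shuffle is enough there.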

*I highly recommend the Petrel library for Python*
https://github.com/AirSage/Petrel

*Take a look at this sample, which is very similar to your own task*
https://github.com/AirSage/Petrel/tree/master/samples/wordcount

*The topology is defined here*
https://github.com/AirSage/Petrel/blob/master/samples/wordcount/create.py

*If you want to use Python/Storm (with Petrel), read this book*
https://www.packtpub.com/big-data-and-business-intelligence/building-python-real-time-applications-storm

Thanks,
Dmitry



