Posted to dev@storm.apache.org by sam mohel <sa...@gmail.com> on 2017/04/04 08:49:40 UTC
python with storm
I need some help with this problem. I read that a spout is
responsible for reading data or preparing it for processing in a bolt, so I
wrote some code in a spout to open a file and read it line by line:
import storm

class SimSpout(storm.Spout):
    # Not much to do here for such a basic spout
    def initialize(self, conf, context):
        # Open the file read-only; it stays open for later nextTuple calls
        self.f = open('data.txt', 'r')
        # Keep the configuration and context for later use
        self._conf = conf
        self._context = context
        storm.logInfo("Spout instance starting...")

    # Process the next tuple
    def nextTuple(self):
        # readlines() returns every remaining line, so the first call emits
        # the whole file; after EOF it returns an empty list and nothing is emitted
        for line in self.f.readlines():
            # Emit the line as a one-field tuple
            storm.logInfo("Emitting %s" % line)
            storm.emit([line])

# Start the spout when it's invoked
SimSpout().run()
Is that right?
My actual problem now is: how can I make a bolt take each line from the
spout and process it? The processing reads another file and does some
calculations to compute the vector of each word.
Re: python with storm
Posted by Dmitry Semenov <dm...@saritasa.com>.
Sam,
Bolts take the data you emit from your spout (or from another bolt) and
then do what you need (persist data in a DB, aggregate, etc.).
In your case you have a spout that emits sentences, so you need to create
a bolt that splits each sentence into words and emits each word as a tuple.
Then you should have another bolt that gets the word as a tuple from the
previous bolt and does your processing.
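
A minimal sketch of the logic those two bolts would carry, written as plain Python functions so it runs without a Storm cluster. The function names and the "word v1 v2 ..." layout of the second file are assumptions made for illustration only; the comments at the end show roughly where each function would sit in a bolt built on the multilang storm module.

```python
def split_sentence(line):
    """Split one line emitted by the spout into whitespace-separated words."""
    return line.split()


def load_vectors(lines):
    """Parse 'word v1 v2 ...' lines into a dict of word -> list of floats.

    The file layout is an assumption made for this sketch.
    """
    vectors = {}
    for raw in lines:
        parts = raw.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def word_vector(word, vectors):
    """Return the vector for a word, or None if the word is unknown."""
    return vectors.get(word)


# Rough mapping onto bolts (not executed here):
#   split bolt:  for word in split_sentence(tup.values[0]): storm.emit([word])
#   vector bolt: build the dict once in initialize() with load_vectors(),
#                then emit [word, word_vector(word, self.vectors)] in process()

if __name__ == "__main__":
    vectors = load_vectors(["storm 0.1 0.2", "python 0.3 0.4"])
    for word in split_sentence("python with storm\n"):
        print(word, word_vector(word, vectors))
```

Loading the second file once, rather than on every tuple, keeps the per-word work down to a dictionary lookup.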
Use fieldsGrouping for your word processing bolt in the topology and
shuffleGrouping for your split sentence bolt.
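
To illustrate the difference between the two groupings, here is a small sketch that only models the routing idea; it is not Storm's implementation. The function names are hypothetical, and crc32 is used just because it is stable across runs, not because Storm hashes this way.

```python
import random
import zlib


def shuffle_task(num_tasks, rng=random):
    """Shuffle grouping idea: any task may receive the tuple."""
    return rng.randrange(num_tasks)


def fields_task(word, num_tasks):
    """Fields grouping idea: the same field value always goes to the same task."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


if __name__ == "__main__":
    for word in ["storm", "python", "storm"]:
        # Both "storm" tuples get the same fields task; shuffle may differ
        print(word, "shuffle ->", shuffle_task(4), "fields ->", fields_task(word, 4))
```

Routing by field matters when the word bolt keeps per-word state (a running count, a cached vector), because every occurrence of a word then reaches the task that holds that state. Splitting sentences needs no such state, so any task can take any line.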
*I highly recommend Petrel library for python*
https://github.com/AirSage/Petrel
*Take a look at the sample that is very similar to your own task*
https://github.com/AirSage/Petrel/tree/master/samples/wordcount
*topology is defined here*
https://github.com/AirSage/Petrel/blob/master/samples/wordcount/create.py
*If you want to use Python/Storm (with Petrel) read this book*
https://www.packtpub.com/big-data-and-business-intelligence/building-python-real-time-applications-storm
Thanks,
Dmitry