Posted to user@storm.apache.org by Ashu Goel <As...@shopkick.com> on 2014/06/02 18:37:43 UTC

Writing Bolts in Python

Hi all,

I am experimenting with writing bolts in Python and was wondering how the relationship between the Java and Python code works. For example, I have a Python bolt that looks like this:

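import storm
from collections import defaultdict

class ScanCountBolt(storm.BasicBolt):

    def __init__(self):
        # storm.BasicBolt defines no constructor, so no super() call is needed.
        # Running count of scans per product, kept in this Python process.
        self._count = defaultdict(int)

    def process(self, tup):
        # tup.values holds the input tuple's fields; [0] is the product.
        product = tup.values[0]
        self._count[product] += 1
        # storm.BasicBolt anchors this emit to tup and acks tup after process().
        storm.emit([product, self._count[product]])

ScanCountBolt().run()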


And my corresponding Java code looks like this:

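// Imports for the enclosing topology file (Storm 0.9.x-era package names).
import java.util.Map;
import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

public static class ScanCount extends ShellBolt implements IRichBolt {

    public ScanCount() {
        // Runs "python scancount.py" as a subprocess.
        super("python", "scancount.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Names for the two values the Python bolt emits, in order.
        declarer.declare(new Fields("product", "scans"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}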

Is that all I need to make it work, or do I need to declare the data structures in the Java code as well? I am a bit confused...

-Ashu

Re: Writing Bolts in Python

Posted by Andrew Montalenti <an...@parsely.com>.
The ShellBolt looks for "scancount.py" in the resources/ directory of your
JAR, which is extracted on each worker machine. It then simply invokes
"python scancount.py" in that directory. So you need to make sure that
scancount.py ends up on the classpath under resources/, as well as the
storm.py interop library it depends on.
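If the worker can't find the script at runtime, a quick check is to list
what actually landed in the JAR. Here is a rough Python sketch of that
check, assuming the resources/ layout described above; the function name
missing_multilang_files is made up for illustration, and the JAR path is
whatever your build produces.

```python
import sys
import zipfile


def missing_multilang_files(jar_path, filenames=("scancount.py", "storm.py")):
    """Return the expected resources/ entries that are absent from the JAR.

    Hypothetical helper. Assumes ShellBolt scripts and storm.py sit under a
    top-level "resources/" directory inside the topology JAR.
    """
    with zipfile.ZipFile(jar_path) as jar:
        entries = set(jar.namelist())
    expected = ["resources/" + name for name in filenames]
    return [path for path in expected if path not in entries]


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_multilang_files(sys.argv[1])
    if missing:
        print("Missing from JAR: " + ", ".join(missing))
        sys.exit(1)
    print("All multi-lang resources present.")
```

Run it as "python check_jar.py path/to/your-topology.jar" (both names are
placeholders).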

Based on the official word count example
<https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java#L40-L56>,
your Java ShellBolt definition looks OK.

The storm.py interop library that you're probably using then communicates
with the rest of Storm via the Multi-lang Protocol
<https://storm.incubator.apache.org/documentation/Multilang-protocol.html>.
This means your Python process is really sending JSON messages over stdout
and receiving JSON messages over stdin. That's the relationship between
Python & Java (& Storm) in this case.
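To make the stdin/stdout relationship concrete, here is a minimal Python
sketch of the message framing, assuming the protocol as documented: each
message is a JSON object followed by a line containing only "end". The
handle_tuple helper is hypothetical; it mimics what a storm.BasicBolt
version of your ScanCountBolt sends back per input tuple (an anchored emit,
then an ack). It leaves out the startup handshake and other bookkeeping that
storm.py handles for you, so treat it as an illustration, not a replacement
for storm.py.

```python
import io
import json
from collections import defaultdict


def read_message(stream):
    """Read one multi-lang message: JSON text followed by a line reading "end"."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before 'end' marker")
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)


def write_message(stream, msg):
    """Write one message as JSON, then the "end" terminator, and flush."""
    stream.write(json.dumps(msg) + "\nend\n")
    stream.flush()


def handle_tuple(tup, counts):
    """Hypothetical: the messages a BasicBolt-style ScanCount sends per tuple.

    Returns an emit anchored to the input tuple, followed by an ack for it.
    """
    product = tup["tuple"][0]
    counts[product] += 1
    emit = {"command": "emit", "anchors": [tup["id"]],
            "tuple": [product, counts[product]]}
    ack = {"command": "ack", "id": tup["id"]}
    return [emit, ack]


if __name__ == "__main__":
    # Simulate one incoming tuple on stdin and capture what goes to stdout.
    incoming = io.StringIO(
        '{"id": "-6955786537413359385", "comp": "1", "stream": "default", '
        '"task": 9, "tuple": ["widget"]}\nend\n')
    out = io.StringIO()
    counts = defaultdict(int)
    for msg in handle_tuple(read_message(incoming), counts):
        write_message(out, msg)
    print(out.getvalue())
```

Note that the values in "tuple" are positional: they line up with the
Fields("product", "scans") declared in the Java ShellBolt. The Python-side
counter dictionary never has to be declared in Java.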

A library I'm working on with my team, streamparse
<https://github.com/Parsely/streamparse>, makes this workflow easier by
bundling a command-line tool for building, submitting, and running Python
topologies. For example, getting a Storm + Python "wordcount" example to
run locally is just a matter of:

    sparse quickstart wordcount
    cd wordcount
    sparse run

It also eliminates the need to write the Java glue code you're putting
together here. It's still in early development, but we're already using it
with real Storm 0.8 and 0.9 production clusters and for local development.

---
Andrew Montalenti
Co-Founder & CTO
http://parse.ly
