Posted to user@storm.apache.org by Sergio Fernández <wi...@apache.org> on 2015/05/22 17:51:36 UTC

multilingual tuples via kafka

Hi,

I'm experimenting with feeding the KafkaSpout from a language other than
Java, but I guess I have a conceptual error...

From Python I'm sending two values:

producer.send_messages("test", "val1", "val2")

But when from a Java bolt I try to handle it:

execute(Tuple input) {
  String val1 = input.getString(0);
  String val2 = input.getString(1);
  ...
}

I'm getting an IndexOutOfBoundsException: Index: 1, Size: 1.

I'd appreciate any advice on how to correctly send tuples.

Thanks!


-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernandez@redlink.co
w: http://redlink.co

Re: multilingual tuples via kafka

Posted by Sergio Fernández <se...@redlink.co>.
Perfect. Thanks, Taylor. That explains the basics.

So now I'm taking the string and parsing it as JSON. What would be the
best practice for doing that directly in a Scheme?

Cheers,

On Tue, May 26, 2015 at 6:12 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> The data coming from Kafka to the Kafka spout is just a byte array
> containing the raw data. To consume it, you need to define a `Scheme`
> implementation that knows how to parse the byte array to produce tuples.
>
> For example, the `StringScheme` class included in storm-kafka just
> converts the byte array to a string and puts that value in the tuple with
> the key “str”:
>
>
> https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/StringScheme.java
>
> -Taylor



Re: multilingual tuples via kafka

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
The data coming from Kafka to the Kafka spout is just a byte array containing the raw data. To consume it, you need to define a `Scheme` implementation that knows how to parse the byte array to produce tuples.

For example, the `StringScheme` class included in storm-kafka just converts the byte array to a string and puts that value in the tuple with the key “str”:

https://github.com/apache/storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/StringScheme.java
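
As a rough illustration of that idea (a sketch, not code from storm-kafka), the class below turns one raw message into a two-field tuple. It assumes the producer sends a single Kafka message per tuple with the fields joined by a tab, e.g. "val1\tval2". `SimpleScheme` and `TabSeparatedScheme` are hypothetical names; `SimpleScheme` is a local stand-in for Storm's `Scheme` interface so that the example compiles without Storm on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Storm's Scheme interface, declared locally so the
// sketch compiles without Storm on the classpath.
interface SimpleScheme {
    List<Object> deserialize(byte[] ser);
    List<String> getOutputFields();
}

// Assumption: the producer sends ONE Kafka message per tuple, with the fields
// joined by a tab character, e.g. "val1\tval2".
final class TabSeparatedScheme implements SimpleScheme {
    private final List<String> fieldNames;

    TabSeparatedScheme(String... fieldNames) {
        this.fieldNames = Arrays.asList(fieldNames);
    }

    @Override
    public List<Object> deserialize(byte[] ser) {
        String raw = new String(ser, StandardCharsets.UTF_8);
        // A negative limit keeps trailing empty fields instead of dropping them.
        String[] parts = raw.split("\t", -1);
        if (parts.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                "expected " + fieldNames.size() + " fields but got " + parts.length);
        }
        return new ArrayList<Object>(Arrays.asList(parts));
    }

    @Override
    public List<String> getOutputFields() {
        return fieldNames;
    }
}
```

A real implementation would implement Storm's `Scheme` interface instead and return a `Fields` object from `getOutputFields()`. With two declared fields, `input.getString(0)` and `input.getString(1)` in the bolt would both have something to read.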

-Taylor



Re: multilingual tuples via kafka

Posted by Sergio Fernández <se...@redlink.co>.
Hi,

thanks for your reply.

On Mon, May 25, 2015 at 6:23 AM, Anishek Agarwal <an...@gmail.com> wrote:

> Just to understand:
>
> You have a producer in Python sending messages to Kafka? If so, I think
> each of the values "test", "val1", "val2" forms a separate message and hence
> would come into Storm as a separate tuple. If you want to send them as a
> single value, maybe send them in an array:
>
> producer.send_messages(["test", "val1", "val2"])
>

In Python, the producer API is:

  send_messages(topic, *msg)

I guess I was confusing the tuple type in Python with the Tuple class in
Java, but now I know they actually represent different things for a message.
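
To make that difference concrete, here is a small Java model of a variadic `send_messages(topic, *msg)` call. It is only a sketch: `VarargsProducerDemo` and `sendMessages` are hypothetical names, no Kafka client is involved, and the one-field-per-message shape assumes a `StringScheme`-style spout.

```java
import java.util.ArrayList;
import java.util.List;

final class VarargsProducerDemo {
    // Hypothetical model of a producer with the signature send(topic, msg...):
    // every argument after the topic becomes its own message.
    static List<List<Object>> sendMessages(String topic, String... msgs) {
        List<List<Object>> tuples = new ArrayList<List<Object>>();
        for (String msg : msgs) {
            // Assumption: the spout emits one single-field tuple per message.
            List<Object> tuple = new ArrayList<Object>();
            tuple.add(msg);
            tuples.add(tuple);
        }
        return tuples;
    }

    public static void main(String[] args) {
        List<List<Object>> tuples = sendMessages("test", "val1", "val2");
        System.out.println(tuples.size() + " tuples, first has "
            + tuples.get(0).size() + " field(s)");
    }
}
```

Under these assumptions the call yields two tuples of size one, which matches the "Index: 1, Size: 1" in the exception.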


> And then you are trying to consume them in Java in a Storm bolt? To
> consume the above, you can first try to explicitly cast the tuple object
> using getValues. If array serialization is available by default, it will
> work; if not, I think you might have to implement a custom serializer for
> arrays.
>

Well, is that the best practice? Tuple.getValues() returns a List<Object>.
I'd prefer something much cleaner that avoids such casts.

Working more on that issue, since the message is a (key, value) tuple, I'm
thinking of switching to the KeyedProducer in Python. But now I need to figure
out how that is later represented in a Java Tuple...
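
One possible mapping, shown only as a sketch, is to emit the message key and the message value as two separate tuple fields. `KeyValueTupleSketch` and `toTuple` are hypothetical names, both parts are assumed to be UTF-8 strings, and storm-kafka's own handling of keyed messages may well differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class KeyValueTupleSketch {
    // Hypothetical helper: maps a keyed Kafka message to a two-field tuple
    // laid out as (key, value).
    static List<Object> toTuple(byte[] key, byte[] value) {
        List<Object> tuple = new ArrayList<Object>(2);
        // A Kafka message key may be absent, so null is passed through.
        tuple.add(key == null ? null : new String(key, StandardCharsets.UTF_8));
        tuple.add(new String(value, StandardCharsets.UTF_8));
        return tuple;
    }
}
```

With this layout a bolt would read the key at index 0 and the value at index 1.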


> Or rather, just send a JSON object and read it as JSON.
>

I have to admit I dislike the current trend of using JSON for all purposes.
I could agree that it is a good format when the business comes closer to the
browser, but for this purpose I'd rather try to use Protobuf or Thrift. Any
advice in that respect would be more than welcome.

Thanks.

Cheers,





Re: multilingual tuples via kafka

Posted by Anishek Agarwal <an...@gmail.com>.
Just to understand:

You have a producer in Python sending messages to Kafka? If so, I think
each of the values "test", "val1", "val2" forms a separate message and hence
would come into Storm as a separate tuple. If you want to send them as a
single value, maybe send them in an array:

producer.send_messages(["test", "val1", "val2"])

And then you are trying to consume them in Java in a Storm bolt? To
consume the above, you can first try to explicitly cast the tuple object
using getValues. If array serialization is available by default, it will
work; if not, I think you might have to implement a custom serializer for
arrays.

Or rather, just send a JSON object and read it as JSON.

Hope that helps.




