Posted to user@storm.apache.org by Raphael Hsieh <ra...@gmail.com> on 2014/05/28 00:01:56 UTC

Position in Kafka Stream

Is there a way to tell where in the kafka stream my topology is starting
from?
From my understanding, Storm uses ZooKeeper to keep track of its place
in the Kafka stream. Where can I find metrics on this?
How can I see how large the stream is, how much data is sitting in the
stream, and what the most recent/oldest positions are?

Thanks

-- 
Raphael Hsieh

Re: Position in Kafka Stream

Posted by Tyson Norris <tn...@adobe.com>.
Great, thanks.

I haven’t used LoggingMetrics, so I will take a look at the output there.

We use Scout for monitoring (via a JMX plugin for Java apps), so I think it should be possible to implement a JMX-based IMetricsConsumer for exposing Storm metrics via JMX.

Thanks
Tyson

On May 27, 2014, at 10:11 PM, Harsha <st...@harsha.io> wrote:

> Hi Tyson,
>         Yes, the Trident Kafka spout has an offset metric, as well as
>         kafkaFetchAvg and kafkaFetchMax:
> https://github.com/apache/incubator-storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/trident/TridentKafkaEmitter.java#L64
> -Harsha
> 
> On Tue, May 27, 2014, at 06:55 PM, Tyson Norris wrote:
>> Do Trident variants of kafka spouts do something similar?
>> Thanks
>> Tyson
>> 
>>> On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>>> 
>>> Raphael,
>>>       The kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at kafka offsets using 
>>> kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>>> -Harsha
>>> 
>>> 
>>>> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>>>> Is there a way to tell where in the kafka stream my topology is starting from?
>>>> From my understanding, Storm uses ZooKeeper to keep track of its place in the Kafka stream. Where can I find metrics on this?
>>>> How can I see how large the stream is, how much data is sitting in the stream, and what the most recent/oldest positions are?
>>>> 
>>>> Thanks
>>>> 
>>>> -- 
>>>> Raphael Hsieh


Re: Position in Kafka Stream

Posted by Harsha <st...@harsha.io>.
Hi Tyson,
         Yes, the Trident Kafka spout has an offset metric, as well as
         kafkaFetchAvg and kafkaFetchMax:
https://github.com/apache/incubator-storm/blob/master/external/storm-kafka/src/jvm/storm/kafka/trident/TridentKafkaEmitter.java#L64
-Harsha
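Those spout-side offset metrics become useful once you compare them with the broker's log-end offsets; the difference is the backlog. A minimal sketch of that arithmetic (the offsets below are made up for illustration):

```python
# Consumer lag per partition: broker log-end offset minus the offset the
# spout has committed (e.g. the kafkaOffset metric, or the value stored
# in ZooKeeper). All numbers here are invented for illustration.

def partition_lag(committed, latest):
    """Return ({partition: lag}, total_lag)."""
    lag = {p: latest[p] - committed.get(p, 0) for p in latest}
    return lag, sum(lag.values())

committed = {0: 1200, 1: 950}   # spout's last committed offsets
latest = {0: 1500, 1: 1000}     # broker log-end offsets

per_partition, total = partition_lag(committed, latest)
print(per_partition)  # {0: 300, 1: 50}
print(total)          # 350
```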

On Tue, May 27, 2014, at 06:55 PM, Tyson Norris wrote:
> Do Trident variants of kafka spouts do something similar?
> Thanks
> Tyson
> 
> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
> > 
> > Raphael,
> >        The kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at kafka offsets using 
> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> > -Harsha
> >  
> >  
> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
> >> Is there a way to tell where in the kafka stream my topology is starting from?
> >> From my understanding, Storm uses ZooKeeper to keep track of its place in the Kafka stream. Where can I find metrics on this?
> >> How can I see how large the stream is, how much data is sitting in the stream, and what the most recent/oldest positions are?
> >>  
> >> Thanks
> >>  
> >> -- 
> >> Raphael Hsieh

Re: Position in Kafka Stream

Posted by Raphael Hsieh <ra...@gmail.com>.
Thanks Tyson!
This blog is super helpful.
I've been able to get LoggingMetrics working to an extent; however, if I try
to create multiple CountMetrics in the same function, I only see one show
up in my Nimbus UI. Does anybody know why this is?


On Thu, May 29, 2014 at 8:57 AM, Tyson Norris <tn...@adobe.com> wrote:

>  I found this blog helpful:
> http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to
>
>  Best regards,
> Tyson
>
>  On May 29, 2014, at 8:41 AM, Raphael Hsieh <ra...@gmail.com> wrote:
>
>  Can someone explain to me what LoggingMetrics is?
> I've heard of it and people have told me to use it, but I can't find any
> documentation on it or any resources on how to use it.
>
> Thanks
>
>
> On Thu, May 29, 2014 at 12:06 AM, Tyson Norris <tn...@adobe.com> wrote:
>
>> Hi -
>> Thanks - it turns out that the JSON parsing is actually fine with HEAD,
>> although inaccurate without the required message format (comments mention
>> expecting an “s” property with timestamp value).
>>
>>  My problem was that I was not specifying the spout root properly,
>> i.e. --spoutroot /transactional/<spout id>/user/  (in my case I had
>> specified a path that was valid, but not a spout)
>>
>>  Now I get offset info properly via monitor.py - Thanks!
>>
>>  Tyson
>>
>>  On May 28, 2014, at 10:12 AM, Cody A. Ray <co...@gmail.com> wrote:
>>
>>  Right, it's trying to read your kafka messages and parse them as JSON. See
>> the error:
>>
>>  simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1
>> (char 0)
>>
>>  If you want to use the BrightTag branch, you'll need to go a couple
>> commits back. Try this:
>>
>>  git clone https://github.com/BrightTag/stormkafkamon
>>  git checkout 07eede9ec72329fe2cad893d087541b583e11148
>>
>>  -Cody
>>
>>
>> On Wed, May 28, 2014 at 10:39 AM, Tyson Norris <tn...@adobe.com> wrote:
>>
>>> Thanks Cody -
>>> I tried the BrightTag fork and still have problems with
>>> storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident
>>> topology (haven’t tried non-trident yet):
>>> (venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology
>>> TrendingTagTopology --spoutroot storm --friendly
>>> Traceback (most recent call last):
>>>   File "./monitor.py", line 112, in <module>
>>>     sys.exit(main())
>>>   File "./monitor.py", line 96, in main
>>>     zk_data = process(zc.spouts(options.spoutroot, options.topology))
>>>   File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76,
>>> in spouts
>>>     j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
>>>   File
>>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py",
>>> line 501, in loads
>>>     return _default_decoder.decode(s)
>>>   File
>>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
>>> line 370, in decode
>>>     obj, end = self.raw_decode(s)
>>>   File
>>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
>>> line 389, in raw_decode
>>>     return self.scan_once(s, idx=_w(s, idx).end())
>>> simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1
>>> (char 0)
>>> (venv)tnorris-osx:stormkafkamon tnorris$
>>>
>>>  I’m not too familiar with python but will try to debug it as time
>>> allows - let me know if you have advice.
>>>
>>>  Thanks
>>>  Tyson
>>>
>>>
>>>
>>>
>>>  On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:
>>>
>>>  You can also use stormkafkamon to track this stuff. It's not good for
>>> historical analysis like graphite/ganglia, but it's good if you just want to
>>> see how things currently stand.
>>>
>>>  The original: https://github.com/otoolep/stormkafkamon
>>>
>>>  This didn't work for us without some updates (incompatibility with the
>>> latest python-kafka dep). Here are those updates:
>>> https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148
>>>
>>>  (Our branch has a couple more things that parse the kafka messages
>>> with our format (which embeds a timestamp) to determine how long (in time)
>>> storm is behind... planning to clean that up soon so it can be a bit more
>>> reusable)
>>>
>>>  https://github.com/BrightTag/stormkafkamon
>>>
>>>  -Cody
>>>
>>>
>>> On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <
>>> danijel@schiavuzzi.com> wrote:
>>>
>>>> Yes, Trident Kafka spouts give you the same metrics. Take a look at the
>>>> code to find out what's available.
>>>>
>>>>
>>>> On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com>
>>>> wrote:
>>>>
>>>>> Do Trident variants of kafka spouts do something similar?
>>>>> Thanks
>>>>> Tyson
>>>>>
>>>>> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>>>>> >
>>>>> > Raphael,
>>>>> >        The kafka spout sends metrics for kafkaOffset and kafkaPartition;
>>>>> you can look at those using LoggingMetrics or by setting up Ganglia.
>>>>> Kafka uses its own ZooKeeper to store state info per topic & group.id;
>>>>> you can look at kafka offsets using
>>>>> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>>>>> > -Harsha
>>>>> >
>>>>> >
>>>>> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>>>>> >> Is there a way to tell where in the kafka stream my topology is
>>>>> starting from?
>>>>> >> From my understanding, Storm uses ZooKeeper to keep track of its
>>>>> place in the Kafka stream. Where can I find metrics on this?
>>>>> >> How can I see how large the stream is, how much data is
>>>>> sitting in the stream, and what the most recent/oldest positions are?
>>>>> >>
>>>>> >> Thanks
>>>>> >>
>>>>> >> --
>>>>> >> Raphael Hsieh
>>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> Danijel Schiavuzzi
>>>>
>>>> E: danijel@schiavuzzi.com
>>>> W: www.schiavuzzi.com
>>>> T: +385989035562
>>>> Skype: danijels7
>>>>
>>>
>>>
>>>
>>>  --
>>>  Cody A. Ray, LEED AP
>>> cody.a.ray@gmail.com
>>> 215.501.7891
>>>
>>>
>>>
>>
>>
>>  --
>>  Cody A. Ray, LEED AP
>> cody.a.ray@gmail.com
>> 215.501.7891
>>
>>
>>
>
>
>  --
> Raphael Hsieh
>
>
>
>
>
>


-- 
Raphael Hsieh

Re: Position in Kafka Stream

Posted by Tyson Norris <tn...@adobe.com>.
I found this blog helpful:
http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to

Best regards,
Tyson

On May 29, 2014, at 8:41 AM, Raphael Hsieh <ra...@gmail.com> wrote:

Can someone explain to me what LoggingMetrics is?
I've heard of it and people have told me to use it, but I can't find any documentation on it or any resources on how to use it.

Thanks


On Thu, May 29, 2014 at 12:06 AM, Tyson Norris <tn...@adobe.com> wrote:
Hi -
Thanks - it turns out that the JSON parsing is actually fine with HEAD, although inaccurate without the required message format (comments mention expecting an “s” property with timestamp value).

My problem was that I was not specifying the spout root properly, i.e. --spoutroot /transactional/<spout id>/user/  (in my case I had specified a path that was valid, but not a spout)

Now I get offset info properly via monitor.py - Thanks!

Tyson

On May 28, 2014, at 10:12 AM, Cody A. Ray <co...@gmail.com> wrote:

Right, it's trying to read your kafka messages and parse them as JSON. See the error:

simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If you want to use the BrightTag branch, you'll need to go a couple commits back. Try this:

git clone https://github.com/BrightTag/stormkafkamon
git checkout 07eede9ec72329fe2cad893d087541b583e11148

-Cody


On Wed, May 28, 2014 at 10:39 AM, Tyson Norris <tn...@adobe.com> wrote:
Thanks Cody -
I tried the BrightTag fork and still have problems with storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident topology (haven’t tried non-trident yet):
(venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology TrendingTagTopology --spoutroot storm --friendly
Traceback (most recent call last):
  File "./monitor.py", line 112, in <module>
    sys.exit(main())
  File "./monitor.py", line 96, in main
    zk_data = process(zc.spouts(options.spoutroot, options.topology))
  File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in spouts
    j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py", line 501, in loads
    return _default_decoder.decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 389, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(venv)tnorris-osx:stormkafkamon tnorris$

I’m not too familiar with python but will try to debug it as time allows - let me know if you have advice.

Thanks
Tyson




On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:

You can also use stormkafkamon to track this stuff. It's not good for historical analysis like graphite/ganglia, but it's good if you just want to see how things currently stand.

The original: https://github.com/otoolep/stormkafkamon

This didn't work for us without some updates (incompatibility with the latest python-kafka dep). Here are those updates: https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148

(Our branch has a couple more things that parse the kafka messages with our format (which embeds a timestamp) to determine how long (in time) storm is behind... planning to clean that up soon so it can be a bit more reusable)

https://github.com/BrightTag/stormkafkamon

-Cody


On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <da...@schiavuzzi.com> wrote:
Yes, Trident Kafka spouts give you the same metrics. Take a look at the code to find out what's available.


On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
Do Trident variants of kafka spouts do something similar?
Thanks
Tyson

> On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>
> Raphael,
>        The kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at kafka offsets using
> kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> -Harsha
>
>
>> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>> Is there a way to tell where in the kafka stream my topology is starting from?
>> From my understanding, Storm uses ZooKeeper to keep track of its place in the Kafka stream. Where can I find metrics on this?
>> How can I see how large the stream is, how much data is sitting in the stream, and what the most recent/oldest positions are?
>>
>> Thanks
>>
>> --
>> Raphael Hsieh



--
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7



--
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891




--
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891




--
Raphael Hsieh





Re: Position in Kafka Stream

Posted by Raphael Hsieh <ra...@gmail.com>.
Can someone explain to me what LoggingMetrics is?
I've heard of it and people have told me to use it, but I can't find any
documentation on it or any resources on how to use it.
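For what it's worth, "LoggingMetrics" in this thread most likely refers to Storm's built-in LoggingMetricsConsumer, which writes every metrics data point to the worker logs. One way to enable it is in storm.yaml; this is only a sketch, and the class name and config key (taken from Storm 0.9.x) should be double-checked against your version:

```yaml
# storm.yaml sketch: register the built-in logging metrics consumer
# (class name assumed from Storm 0.9.x; verify against your release)
topology.metrics.consumer.register:
  - class: "backtype.storm.metric.LoggingMetricsConsumer"
    parallelism.hint: 1
```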

Thanks


On Thu, May 29, 2014 at 12:06 AM, Tyson Norris <tn...@adobe.com> wrote:

>  Hi -
> Thanks - it turns out that the JSON parsing is actually fine with HEAD,
> although inaccurate without the required message format (comments mention
> expecting an “s” property with timestamp value).
>
>  My problem was that I was not specifying the spout root properly,
> i.e. --spoutroot /transactional/<spout id>/user/  (in my case I had
> specified a path that was valid, but not a spout)
>
>  Now I get offset info properly via monitor.py - Thanks!
>
>  Tyson
>
>  On May 28, 2014, at 10:12 AM, Cody A. Ray <co...@gmail.com> wrote:
>
>  Right, it's trying to read your kafka messages and parse them as JSON. See
> the error:
>
>  simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1
> (char 0)
>
>  If you want to use the BrightTag branch, you'll need to go a couple
> commits back. Try this:
>
>  git clone https://github.com/BrightTag/stormkafkamon
>  git checkout 07eede9ec72329fe2cad893d087541b583e11148
>
>  -Cody
>
>
> On Wed, May 28, 2014 at 10:39 AM, Tyson Norris <tn...@adobe.com> wrote:
>
>> Thanks Cody -
>> I tried the BrightTag fork and still have problems with
>> storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident
>> topology (haven’t tried non-trident yet):
>> (venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology
>> TrendingTagTopology --spoutroot storm --friendly
>> Traceback (most recent call last):
>>   File "./monitor.py", line 112, in <module>
>>     sys.exit(main())
>>   File "./monitor.py", line 96, in main
>>     zk_data = process(zc.spouts(options.spoutroot, options.topology))
>>   File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in
>> spouts
>>     j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
>>   File
>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py",
>> line 501, in loads
>>     return _default_decoder.decode(s)
>>   File
>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
>> line 370, in decode
>>     obj, end = self.raw_decode(s)
>>   File
>> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
>> line 389, in raw_decode
>>     return self.scan_once(s, idx=_w(s, idx).end())
>> simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1
>> (char 0)
>> (venv)tnorris-osx:stormkafkamon tnorris$
>>
>>  I’m not too familiar with python but will try to debug it as time
>> allows - let me know if you have advice.
>>
>>  Thanks
>>  Tyson
>>
>>
>>
>>
>>  On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:
>>
>>  You can also use stormkafkamon to track this stuff. It's not good for
>> historical analysis like graphite/ganglia, but it's good if you just want to
>> see how things currently stand.
>>
>>  The original: https://github.com/otoolep/stormkafkamon
>>
>>  This didn't work for us without some updates (incompatibility with the
>> latest python-kafka dep). Here are those updates:
>> https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148
>>
>>  (Our branch has a couple more things that parse the kafka messages with
>> our format (which embeds a timestamp) to determine how long (in time) storm
>> is behind... planning to clean that up soon so it can be a bit more
>> reusable)
>>
>>  https://github.com/BrightTag/stormkafkamon
>>
>>  -Cody
>>
>>
>> On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <
>> danijel@schiavuzzi.com> wrote:
>>
>>> Yes, Trident Kafka spouts give you the same metrics. Take a look at the
>>> code to find out what's available.
>>>
>>>
>>> On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
>>>
>>>> Do Trident variants of kafka spouts do something similar?
>>>> Thanks
>>>> Tyson
>>>>
>>>> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>>>> >
>>>> > Raphael,
>>>> >        The kafka spout sends metrics for kafkaOffset and kafkaPartition;
>>>> you can look at those using LoggingMetrics or by setting up Ganglia.
>>>> Kafka uses its own ZooKeeper to store state info per topic & group.id;
>>>> you can look at kafka offsets using
>>>> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>>>> > -Harsha
>>>> >
>>>> >
>>>> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>>>> >> Is there a way to tell where in the kafka stream my topology is
>>>> starting from?
>>>> >> From my understanding, Storm uses ZooKeeper to keep track of its
>>>> place in the Kafka stream. Where can I find metrics on this?
>>>> >> How can I see how large the stream is, how much data is sitting
>>>> in the stream, and what the most recent/oldest positions are?
>>>> >>
>>>> >> Thanks
>>>> >>
>>>> >> --
>>>> >> Raphael Hsieh
>>>>
>>>
>>>
>>>
>>>  --
>>> Danijel Schiavuzzi
>>>
>>> E: danijel@schiavuzzi.com
>>> W: www.schiavuzzi.com
>>> T: +385989035562
>>> Skype: danijels7
>>>
>>
>>
>>
>>  --
>>  Cody A. Ray, LEED AP
>> cody.a.ray@gmail.com
>> 215.501.7891
>>
>>
>>
>
>
>  --
>  Cody A. Ray, LEED AP
> cody.a.ray@gmail.com
> 215.501.7891
>
>
>


-- 
Raphael Hsieh

Re: Position in Kafka Stream

Posted by Tyson Norris <tn...@adobe.com>.
Hi -
Thanks - it turns out that the JSON parsing is actually fine with HEAD, although inaccurate without the required message format (comments mention expecting an “s” property with timestamp value).

My problem was that I was not specifying the spout root properly, i.e. --spoutroot /transactional/<spout id>/user/  (in my case I had specified a path that was valid, but not a spout)

Now I get offset info properly via monitor.py - Thanks!
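Under that spout root, monitor.py just reads and JSON-decodes the small state blob the storm-kafka spout keeps per partition. A sketch of what parsing such a blob looks like; the field names here are hypothetical, for illustration only:

```python
import json

# Hypothetical per-partition spout state, as it might be stored in
# ZooKeeper under the spout root. Field names are an assumption for
# illustration; real payloads vary by storm-kafka version.
raw = b'''{
  "topology": {"id": "abc-123", "name": "TrendingTagTopology"},
  "topic": "tags",
  "partition": 0,
  "offset": 48211,
  "broker": {"host": "kafka01.example.com", "port": 9092}
}'''

state = json.loads(raw)
print(state["topic"], state["partition"], state["offset"])
```

This is also why a non-spout path fails with JSONDecodeError: whatever bytes sit at that ZooKeeper node simply aren't this kind of JSON document.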

Tyson

On May 28, 2014, at 10:12 AM, Cody A. Ray <co...@gmail.com> wrote:

Right, it's trying to read your kafka messages and parse them as JSON. See the error:

simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If you want to use the BrightTag branch, you'll need to go a couple commits back. Try this:

git clone https://github.com/BrightTag/stormkafkamon
git checkout 07eede9ec72329fe2cad893d087541b583e11148

-Cody


On Wed, May 28, 2014 at 10:39 AM, Tyson Norris <tn...@adobe.com> wrote:
Thanks Cody -
I tried the BrightTag fork and still have problems with storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident topology (haven’t tried non-trident yet):
(venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology TrendingTagTopology --spoutroot storm --friendly
Traceback (most recent call last):
  File "./monitor.py", line 112, in <module>
    sys.exit(main())
  File "./monitor.py", line 96, in main
    zk_data = process(zc.spouts(options.spoutroot, options.topology))
  File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in spouts
    j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py", line 501, in loads
    return _default_decoder.decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 389, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(venv)tnorris-osx:stormkafkamon tnorris$

I’m not too familiar with python but will try to debug it as time allows - let me know if you have advice.

Thanks
Tyson




On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:

You can also use stormkafkamon to track this stuff. It's not good for historical analysis like graphite/ganglia, but it's good if you just want to see how things currently stand.

The original: https://github.com/otoolep/stormkafkamon

This didn't work for us without some updates (incompatibility with the latest python-kafka dep). Here are those updates: https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148

(Our branch has a couple more things that parse the kafka messages with our format (which embeds a timestamp) to determine how long (in time) storm is behind... planning to clean that up soon so it can be a bit more reusable)

https://github.com/BrightTag/stormkafkamon

-Cody
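The time-behind calculation described above can be sketched as follows, assuming each message is JSON with an epoch-seconds timestamp in an "s" field (that field name is an assumption drawn from this thread, not a Kafka convention):

```python
import json
import time

def seconds_behind(message, now=None):
    """Seconds between 'now' and the embedded timestamp of the last
    consumed message. Assumes JSON messages with an "s" field holding
    an epoch-seconds timestamp (an assumption from this thread)."""
    now = time.time() if now is None else now
    return now - json.loads(message)["s"]

# Deterministic example with a fixed clock:
msg = json.dumps({"s": 1401300000, "body": "hello"})
print(seconds_behind(msg, now=1401300042))  # 42
```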


On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <da...@schiavuzzi.com> wrote:
Yes, Trident Kafka spouts give you the same metrics. Take a look at the code to find out what's available.


On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
Do Trident variants of kafka spouts do something similar?
Thanks
Tyson

> On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>
> Raphael,
>        The kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at kafka offsets using
> kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> -Harsha
>
>
>> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>> Is there a way to tell where in the kafka stream my topology is starting from?
>> From my understanding, Storm uses ZooKeeper to keep track of its place in the Kafka stream. Where can I find metrics on this?
>> How can I see how large the stream is, how much data is sitting in the stream, and what the most recent/oldest positions are?
>>
>> Thanks
>>
>> --
>> Raphael Hsieh



--
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7



--
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891




--
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891


Re: Position in Kafka Stream

Posted by "Cody A. Ray" <co...@gmail.com>.
Right, it's trying to read your kafka messages and parse them as JSON. See the
error:

simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char
0)

If you want to use the BrightTag branch, you'll need to go a couple commits
back. Try this:

git clone https://github.com/BrightTag/stormkafkamon
git checkout 07eede9ec72329fe2cad893d087541b583e11148

-Cody


On Wed, May 28, 2014 at 10:39 AM, Tyson Norris <tn...@adobe.com> wrote:

>  Thanks Cody -
> I tried the BrightTag fork and still have problems with
> storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident
> topology (haven’t tried non-trident yet):
> (venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology
> TrendingTagTopology --spoutroot storm --friendly
> Traceback (most recent call last):
>   File "./monitor.py", line 112, in <module>
>     sys.exit(main())
>   File "./monitor.py", line 96, in main
>     zk_data = process(zc.spouts(options.spoutroot, options.topology))
>   File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in
> spouts
>     j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py",
> line 501, in loads
>     return _default_decoder.decode(s)
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
> line 370, in decode
>     obj, end = self.raw_decode(s)
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
> line 389, in raw_decode
>     return self.scan_once(s, idx=_w(s, idx).end())
> simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char
> 0)
> (venv)tnorris-osx:stormkafkamon tnorris$
>
>  I’m not too familiar with python but will try to debug it as time allows
> - let me know if you have advice.
>
>  Thanks
> Tyson
>
>
>
>
>  On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:
>
>  You can also use stormkafkamon to track this stuff. It's not good for
> historical analysis like graphite/ganglia, but it's good if you just want to
> see how things currently stand.
>
>  The original: https://github.com/otoolep/stormkafkamon
>
>  This didn't work for us without some updates (incompatibility with the
> latest python-kafka dep). Here are those updates:
> https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148
>
>  (Our branch has a couple more things that parse the kafka messages with
> our format (which embeds a timestamp) to determine how long (in time) storm
> is behind... planning to clean that up soon so it can be a bit more
> reusable)
>
>  https://github.com/BrightTag/stormkafkamon
>
>  -Cody
>
>
> On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <
> danijel@schiavuzzi.com> wrote:
>
>> Yes, Trident Kafka spouts give you the same metrics. Take a look at the
>> code to find out what's available.
>>
>>
>> On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
>>
>>> Do Trident variants of kafka spouts do something similar?
>>> Thanks
>>> Tyson
>>>
>>> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>>> >
>>> > Raphael,
>>> >        The kafka spout sends metrics for kafkaOffset and kafkaPartition;
>>> you can look at those using LoggingMetrics or by setting up Ganglia.
>>> Kafka uses its own ZooKeeper to store state info per topic & group.id; you
>>> can look at kafka offsets using
>>> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>>> > -Harsha
>>> >
>>> >
>>> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>>> >> Is there a way to tell where in the kafka stream my topology is
>>> starting from?
>>> >> From my understanding, Storm uses ZooKeeper to keep track of its
>>> place in the Kafka stream. Where can I find metrics on this?
>>> >> How can I see how large the stream is, how much data is sitting
>>> in the stream, and what the most recent/oldest positions are?
>>> >>
>>> >> Thanks
>>> >>
>>> >> --
>>> >> Raphael Hsieh
>>>
>>
>>
>>
>>  --
>> Danijel Schiavuzzi
>>
>> E: danijel@schiavuzzi.com
>> W: www.schiavuzzi.com
>> T: +385989035562
>> Skype: danijels7
>>
>
>
>
>  --
>  Cody A. Ray, LEED AP
> cody.a.ray@gmail.com
> 215.501.7891
>
>
>


-- 
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891

Re: Position in Kafka Stream

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Wed, May 28, 2014 at 11:39 AM, Tyson Norris <tn...@adobe.com> wrote:

>  Thanks Cody -
> I tried the BrightTag fork and still have problems with
> storm 0.9.1-incubating and kafka 0.8.1, I get an error with my trident
> topology (haven’t tried non-trident yet):
> (venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology
> TrendingTagTopology --spoutroot storm --friendly
> Traceback (most recent call last):
>   File "./monitor.py", line 112, in <module>
>     sys.exit(main())
>   File "./monitor.py", line 96, in main
>     zk_data = process(zc.spouts(options.spoutroot, options.topology))
>   File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in
> spouts
>     j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py",
> line 501, in loads
>     return _default_decoder.decode(s)
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
> line 370, in decode
>     obj, end = self.raw_decode(s)
>   File
> "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py",
> line 389, in raw_decode
>     return self.scan_once(s, idx=_w(s, idx).end())
> simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char
> 0)
> (venv)tnorris-osx:stormkafkamon tnorris$
>
>  I’m not too familiar with python but will try to debug it as time allows
> - let me know if you have advice.
>


I see you're using Kafka + Storm. Have a look at SPM
<http://sematext.com/spm/>, which can show you all kinds of metrics for
both Kafka and Storm side by side, plus their logs if you feed them into Logsene
<http://sematext.com/logsene/>. Have a look at the demo; we have a small
Kafka cluster with its metrics exposed. It's on v072, but we're actually in the
process of migrating to v0811 ourselves and will be adding a pile of graphs
for all the new Kafka 0.8.x metrics... just looked at them closely today,
actually... there are a ton of them. :)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/



>
>
>
>  On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com> wrote:
>
>  You can also use stormkafkamon to track this stuff. Its not good for
> historical analysis like graphite/ganglia, but its good if you just want to
> see how things currently stand.
>
>  The original: https://github.com/otoolep/stormkafkamon
>
>  This didn't work for us without some updates (incompatibility with the
> latest python-kafka dep). Here are those updates:
> https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148
>
>  (Our branch has a couple more things that parse the kafka messages with
> our format (which embeds a timestamp) to determine how long (in time) storm
> is behind... planning to clean that up soon so it can be a bit more
> reusable)
>
>  https://github.com/BrightTag/stormkafkamon
>
>  -Cody
>
>
> On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <
> danijel@schiavuzzi.com> wrote:
>
>> Yes, Trident Kafka spouts give you the same metrics. Take a look at the
>> code to find out what's available.
>>
>>
>> On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
>>
>>> Do Trident variants of kafka spouts do something similar?
>>> Thanks
>>> Tyson
>>>
>>> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>>> >
>>> > Raphael,
>>> >        kafka spout sends metrics for kafkaOffset and kafkaPartition;
>>> you can look at those by using LoggingMetrics or by setting up Ganglia.
>>> Kafka uses its own ZooKeeper to store state info per topic & group.id;
>>> you can look at Kafka offsets using
>>> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>>> > -Harsha
>>> >
>>> >
>>> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>>> >> Is there a way to tell where in the kafka stream my topology is
>>> starting from?
>>> >> From my understanding Storm will use zookeeper in order to tell its
>>> place in the Kafka stream. Where can I find metrics on this ?
>>> >> How can I see how large the stream is? How much data is sitting
>>> in the stream and what the most recent/oldest position is?
>>> >>
>>> >> Thanks
>>> >>
>>> >> --
>>> >> Raphael Hsieh
>>>
>>
>>
>>
>>  --
>> Danijel Schiavuzzi
>>
>> E: danijel@schiavuzzi.com
>> W: www.schiavuzzi.com
>> T: +385989035562
>> Skype: danijels7
>>
>
>
>
>  --
>  Cody A. Ray, LEED AP
> cody.a.ray@gmail.com
> 215.501.7891
>
>
>

Re: Position in Kafka Stream

Posted by Tyson Norris <tn...@adobe.com>.
Thanks Cody -
I tried the BrightTag fork and still have problems with Storm 0.9.1-incubating and Kafka 0.8.1; I get an error with my Trident topology (haven’t tried non-Trident yet):
(venv)tnorris-osx:stormkafkamon tnorris$ ./monitor.py --topology TrendingTagTopology --spoutroot storm --friendly
Traceback (most recent call last):
  File "./monitor.py", line 112, in <module>
    sys.exit(main())
  File "./monitor.py", line 96, in main
    zk_data = process(zc.spouts(options.spoutroot, options.topology))
  File "/git/github/stormkafkamon/stormkafkamon/zkclient.py", line 76, in spouts
    j = json.loads(self.client.get(self._zjoin([spout_root, c, p]))[0])
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/__init__.py", line 501, in loads
    return _default_decoder.decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/git/github/stormkafkamon/venv/lib/python2.7/site-packages/simplejson/decoder.py", line 389, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
(venv)tnorris-osx:stormkafkamon tnorris$

I’m not too familiar with python but will try to debug it as time allows - let me know if you have advice.
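
For what it's worth, a JSONDecodeError at "char 0" from simplejson usually means json.loads was handed an empty string -- for example a bare parent znode under the spout root that carries no data. A minimal defensive parse might look like this sketch (parse_spout_state and the sample payload are hypothetical, not actual stormkafkamon code; the empty-znode explanation is a guess based on the error message):

```python
import json

def parse_spout_state(raw):
    """Parse the JSON state blob a Kafka spout writes to ZooKeeper.

    Returns None instead of raising when the znode payload is empty or
    not valid JSON -- json.loads("") is exactly what produces
    "Expecting value: line 1 column 1 (char 0)".
    """
    if not raw:                       # empty znode, e.g. a bare parent node
        return None
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    try:
        return json.loads(raw)
    except ValueError:                # JSONDecodeError subclasses ValueError
        return None

state = parse_spout_state(b'{"offset": 42, "partition": 0, "topic": "events"}')
print(state["offset"])          # -> 42
print(parse_spout_state(b""))   # -> None
```

Skipping (or logging) znodes that fail this check instead of crashing would at least let the tool report the partitions it can read.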

Thanks
Tyson




On May 28, 2014, at 7:20 AM, Cody A. Ray <co...@gmail.com>> wrote:

You can also use stormkafkamon to track this stuff. It's not good for historical analysis like Graphite/Ganglia, but it's good if you just want to see how things currently stand.

The original: https://github.com/otoolep/stormkafkamon

This didn't work for us without some updates (incompatibility with the latest python-kafka dep). Here are those updates: https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148

(Our branch has a couple more things that parse the kafka messages with our format (which embeds a timestamp) to determine how long (in time) storm is behind... planning to clean that up soon so it can be a bit more reusable)

https://github.com/BrightTag/stormkafkamon

-Cody


On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi <da...@schiavuzzi.com>> wrote:
Yes, Trident Kafka spouts give you the same metrics. Take a look at the code to find out what's available.


On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com>> wrote:
Do Trident variants of kafka spouts do something similar?
Thanks
Tyson

> On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io>> wrote:
>
> Raphael,
>        kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those by using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at Kafka offsets using
> kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> -Harsha
>
>
>> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>> Is there a way to tell where in the kafka stream my topology is starting from?
>> From my understanding Storm will use zookeeper in order to tell its place in the Kafka stream. Where can I find metrics on this ?
>> How can I see how large the stream is? How much data is sitting in the stream and what the most recent/oldest position is?
>>
>> Thanks
>>
>> --
>> Raphael Hsieh



--
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7



--
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891


Re: Position in Kafka Stream

Posted by "Cody A. Ray" <co...@gmail.com>.
You can also use stormkafkamon to track this stuff. It's not good for
historical analysis like Graphite/Ganglia, but it's good if you just want to
see how things currently stand.

The original: https://github.com/otoolep/stormkafkamon

This didn't work for us without some updates (incompatibility with the
latest python-kafka dep). Here are those updates:
https://github.com/BrightTag/stormkafkamon/commit/07eede9ec72329fe2cad893d087541b583e11148

(Our branch has a couple more things that parse the kafka messages with our
format (which embeds a timestamp) to determine how long (in time) storm is
behind... planning to clean that up soon so it can be a bit more reusable)

https://github.com/BrightTag/stormkafkamon
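
The embedded-timestamp idea above can be sketched roughly like this (the JSON message format with a "timestamp" field is invented for illustration; our actual format differs):

```python
import json
import time

def seconds_behind(latest_processed_message, now=None):
    """Rough time-lag estimate: how old is the newest message the
    topology has processed? Assumes each Kafka message is JSON with an
    epoch-seconds "timestamp" field (a made-up format for illustration).
    """
    if now is None:
        now = time.time()
    payload = json.loads(latest_processed_message)
    return now - payload["timestamp"]

msg = json.dumps({"timestamp": 1000.0, "body": "click"})
print(seconds_behind(msg, now=1090.0))  # -> 90.0
```

Time-based lag like this is often more meaningful than raw offset counts, since offsets don't tell you how far behind you are in wall-clock terms.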

-Cody


On Wed, May 28, 2014 at 4:50 AM, Danijel Schiavuzzi
<da...@schiavuzzi.com> wrote:

> Yes, Trident Kafka spouts give you the same metrics. Take a look at the
> code to find out what's available.
>
>
> On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:
>
>> Do Trident variants of kafka spouts do something similar?
>> Thanks
>> Tyson
>>
>> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
>> >
>> > Raphael,
>> >        kafka spout sends metrics for kafkaOffset and kafkaPartition; you
>> can look at those by using LoggingMetrics or by setting up Ganglia. Kafka
>> uses its own ZooKeeper to store state info per topic & group.id; you can
>> look at Kafka offsets using
>> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
>> > -Harsha
>> >
>> >
>> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>> >> Is there a way to tell where in the kafka stream my topology is
>> starting from?
>> >> From my understanding Storm will use zookeeper in order to tell its
>> place in the Kafka stream. Where can I find metrics on this ?
>> >> How can I see how large the stream is? How much data is sitting
>> in the stream and what the most recent/oldest position is?
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Raphael Hsieh
>>
>
>
>
> --
> Danijel Schiavuzzi
>
> E: danijel@schiavuzzi.com
> W: www.schiavuzzi.com
> T: +385989035562
> Skype: danijels7
>



-- 
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891

Re: Position in Kafka Stream

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
Yes, Trident Kafka spouts give you the same metrics. Take a look at the
code to find out what's available.


On Wed, May 28, 2014 at 3:55 AM, Tyson Norris <tn...@adobe.com> wrote:

> Do Trident variants of kafka spouts do something similar?
> Thanks
> Tyson
>
> > On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
> >
> > Raphael,
> >        kafka spout sends metrics for kafkaOffset and kafkaPartition; you
> can look at those by using LoggingMetrics or by setting up Ganglia. Kafka
> uses its own ZooKeeper to store state info per topic & group.id; you can
> look at Kafka offsets using
> > kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> > -Harsha
> >
> >
> >> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
> >> Is there a way to tell where in the kafka stream my topology is
> starting from?
> >> From my understanding Storm will use zookeeper in order to tell its
> place in the Kafka stream. Where can I find metrics on this ?
> >> How can I see how large the stream is? How much data is sitting in
> the stream and what the most recent/oldest position is?
> >>
> >> Thanks
> >>
> >> --
> >> Raphael Hsieh
>



-- 
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7

Re: Position in Kafka Stream

Posted by Tyson Norris <tn...@adobe.com>.
Do Trident variants of kafka spouts do something similar?
Thanks
Tyson

> On May 27, 2014, at 3:19 PM, "Harsha" <st...@harsha.io> wrote:
> 
> Raphael,
>        kafka spout sends metrics for kafkaOffset and kafkaPartition; you can look at those by using LoggingMetrics or by setting up Ganglia. Kafka uses its own ZooKeeper to store state info per topic & group.id; you can look at Kafka offsets using
> kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
> -Harsha
>  
>  
>> On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:
>> Is there a way to tell where in the kafka stream my topology is starting from?
>> From my understanding Storm will use zookeeper in order to tell its place in the Kafka stream. Where can I find metrics on this ?
>> How can I see how large the stream is? How much data is sitting in the stream and what the most recent/oldest position is?
>>  
>> Thanks
>>  
>> -- 
>> Raphael Hsieh

Re: Position in Kafka Stream

Posted by Harsha <st...@harsha.io>.
Raphael,

       The Kafka spout sends metrics for kafkaOffset and kafkaPartition;
you can look at those by using LoggingMetrics or by setting up Ganglia.
Kafka uses its own ZooKeeper to store state info per topic & group.id;
you can look at Kafka offsets using

kafka/bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker
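
For reference, the per-partition lag figure that ConsumerOffsetChecker prints is just the broker's log-end offset minus the consumer's committed offset. A minimal sketch of that arithmetic (all offset numbers here are hypothetical):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag, as ConsumerOffsetChecker reports it:
    lag = broker's log-end offset - consumer's committed offset.
    Partitions with no committed offset are treated as starting at 0.
    """
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

log_end   = {0: 5000, 1: 4800, 2: 5100}  # newest offset per partition
committed = {0: 4990, 1: 4800, 2: 5050}  # where the consumer/spout is
print(consumer_lag(log_end, committed))  # -> {0: 10, 1: 0, 2: 50}
```

A steadily growing lag across polls is the usual sign that the topology can't keep up with the producers.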

-Harsha





On Tue, May 27, 2014, at 03:01 PM, Raphael Hsieh wrote:

Is there a way to tell where in the kafka stream my topology is
starting from?
From my understanding, Storm will use ZooKeeper to track its
place in the Kafka stream. Where can I find metrics on this?
How can I see how large the stream is? How much data is sitting in
the stream and what the most recent/oldest position is?

Thanks

--
Raphael Hsieh