Posted to user@chukwa.apache.org by Ellen Strnod <es...@annealsoft.com> on 2010/03/11 02:43:50 UTC

Chunk sequence IDs other than byte counts?

I am a new user, considering Chukwa for data that will be read from a
JMS queue.  I expect to write an adapter that reads from the queue and
creates text records, which will be chunked and sent to the collector.
My question: does anyone know whether the chunk sequence ID, which the
Chukwa architecture document says is the number of bytes the adapter
has sent, could be any other repeatable sequential number, or does it
have to be a byte count?  (Getting back to a byte count is a little
problematic after a restart, but the records coming off the queue have
IDs which I would like to use.)

I was also looking for a streaming adapter implementation and ran  
across this in Jira: http://issues.apache.org/jira/browse/CHUKWA-102  
- it seems that this adapter may have the same problem (maybe even  
more so, since it is intended to read from a stream rather than a  
queue) so maybe someone in the project has already given this some  
thought.

Thanks in advance,
Ellen

Re: Chunk sequence IDs other than byte counts?

Posted by Ariel Rabkin <as...@gmail.com>.

Howdy!

The agent and collector code assumes that IDs go up monotonically.  So
as long as your record IDs do, data from your adaptor should get to
HDFS correctly.

The archiver, if unmodified, will make a mess of de-duplication, since
it relies on chunk IDs being byte offsets in order to detect
overlapping chunks.
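
To illustrate why byte-offset IDs matter for overlap detection, here is
a minimal sketch.  It is not Chukwa's archiver code: the Chunk class and
both helper functions are hypothetical, and it assumes the sequence ID
is the stream offset just past the chunk's last byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for a chunk; not a Chukwa class."""
    seq_id: int   # assumed: stream offset just past the chunk's last byte
    data: bytes


def byte_range(chunk):
    """Return the half-open byte range [start, end) the chunk claims to cover."""
    return chunk.seq_id - len(chunk.data), chunk.seq_id


def overlaps(a, b):
    """True if the two chunks claim any of the same bytes of the stream."""
    a_start, a_end = byte_range(a)
    b_start, b_end = byte_range(b)
    return a_start < b_end and b_start < a_end


# With byte-offset IDs, a partially resent chunk is recognised as overlapping.
first = Chunk(seq_id=100, data=b"x" * 100)      # covers bytes [0, 100)
resent = Chunk(seq_id=150, data=b"x" * 100)     # covers bytes [50, 150)
following = Chunk(seq_id=250, data=b"y" * 100)  # covers bytes [150, 250)

# With queue record IDs, the computed ranges are meaningless: two distinct
# records with adjacent IDs look like they overlap.
record_a = Chunk(seq_id=41, data=b"a" * 100)    # "covers" [-59, 41)
record_b = Chunk(seq_id=42, data=b"b" * 100)    # "covers" [-58, 42)

print(overlaps(first, resent), overlaps(resent, following))
print(overlaps(record_a, record_b))
```

Under these assumptions, any check of this shape would flag unrelated
records as duplicates once the IDs stop being byte offsets.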

--Ari

-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: Chunk sequence IDs other than byte counts?

Posted by Eric Yang <ey...@yahoo-inc.com>.

Hi Ellen,

Welcome to Chukwa.  The sequence number shouldn't be the byte count.  It is
just a unique number that identifies the chunk.  A byte count might not be a
good representation of the ID for various reasons, such as log rotation.  It's
best to generate this ID outside of the Chukwa Agent, to avoid error cases
such as a restart.  Hope this helps.
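
As a rough sketch of what an externally generated ID buys you on
restart: if the queue's own record IDs are used and the last delivered
ID is remembered somewhere, a replay can be filtered like this.  The
function name, the (id, text) record shape, and the idea of a stored
"last committed ID" are all assumptions for illustration, not Chukwa or
JMS APIs.

```python
def resume_from(records, last_committed_id):
    """Yield only records newer than the last ID known to be delivered.

    `records` is an iterable of (record_id, text) pairs, assumed to carry
    IDs assigned by the queue rather than by the agent.
    """
    previous = last_committed_id
    for record_id, text in records:
        if record_id <= last_committed_id:
            continue  # already delivered before the restart
        if record_id <= previous:
            # IDs must keep increasing for downstream code that expects
            # monotonic sequence numbers.
            raise ValueError(f"non-monotonic record ID: {record_id}")
        previous = record_id
        yield record_id, text


# After a restart the queue replays records 7..10, but 7 and 8 were
# already delivered, so only 9 and 10 are passed on.
replayed = [(7, "a"), (8, "b"), (9, "c"), (10, "d")]
delivered = list(resume_from(replayed, last_committed_id=8))
print(delivered)
```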

Regards,
Eric

