Posted to user@chukwa.apache.org by Ellen Strnod <es...@annealsoft.com> on 2010/03/11 02:43:50 UTC
Chunk sequence IDs other than byte counts?
I am a new user, considering Chukwa for data that will be read from
a JMS queue. I expect to write an adapter that reads from the queue
and creates text records, which will be chunked and sent to the
collector. My question: does anyone know whether the chunk sequence
ID, which the Chukwa architecture document says is the number of
bytes the adapter has sent, could be any other repeatable sequential
number, or does it have to be a byte count? (Returning to a byte
count is a little problematic after a restart, but the records
coming off the queue have IDs that I would like to use.)
I was also looking for a streaming adapter implementation and ran
across this in Jira: http://issues.apache.org/jira/browse/CHUKWA-102
- it seems that this adapter may have the same problem (maybe even
more so, since it is intended to read from a stream rather than a
queue) so maybe someone in the project has already given this some
thought.
Thanks in advance,
Ellen
Re: Chunk sequence IDs other than byte counts?
Posted by Ariel Rabkin <as...@gmail.com>.
Howdy!
The agent and collector code assumes that IDs increase monotonically,
so as long as yours do, data from your adaptor should get to HDFS
correctly.
The archiver, if unmodified, will make a mess of de-duplication, since
it relies on chunk IDs being byte offsets in order to detect
overlapping chunks.
--Ari
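To illustrate the de-duplication point above, here is a minimal
sketch of offset-based overlap detection. It is not taken from the
Chukwa archiver. The class and method names are hypothetical, and it
assumes a chunk's sequence ID is the stream offset just past the
chunk's last byte.

```java
// Hypothetical sketch, not Chukwa's archiver code. Assumes a chunk's
// sequence ID is the stream offset just past its last byte.
final class ChunkOverlapSketch {

    // Minimal stand-in for a chunk: sequence ID plus payload length in bytes.
    static final class Chunk {
        final long seqId;
        final int length;

        Chunk(long seqId, int length) {
            this.seqId = seqId;
            this.length = length;
        }

        // First byte covered by this chunk; meaningful only when seqId
        // really is a byte offset.
        long start() {
            return seqId - length;
        }
    }

    // Two chunks overlap if their byte ranges [start, seqId) intersect.
    static boolean overlaps(Chunk a, Chunk b) {
        return a.start() < b.seqId && b.start() < a.seqId;
    }
}
```

With byte offsets, chunks ending at 100 and 200 (100 bytes each) do
not overlap, while chunks ending at 100 and 150 do. If the sequence
IDs are queue record IDs such as 7 and 8, the same check reports an
overlap for two 100-byte chunks that are in fact distinct, which is
the kind of mess described above.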
--
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department
Re: Chunk sequence IDs other than byte counts?
Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi Ellen,
Welcome to Chukwa. The sequence number shouldn't be the byte count. It is
just a unique number that identifies the chunk. A byte count might not be a
good representation of the ID for various reasons, such as log rotation.
It's best to generate this ID outside of the Chukwa Agent to avoid error
cases such as a restart. Hope this helps.
Regards,
Eric
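As a rough sketch of the suggestion above, the helper below uses
record IDs supplied by the source (for example, IDs carried by queue
records) as sequence IDs and skips anything already sent. It is a
hypothetical class, not part of Chukwa, and it assumes the record IDs
are numeric, strictly increasing, and that the last ID sent was
durably recorded somewhere before the restart.

```java
// Hypothetical helper, not part of Chukwa: treats externally generated
// record IDs as sequence IDs and filters out records already sent.
final class ExternalSeqIdSketch {
    private long lastSent;

    // lastCommitted is the last ID durably recorded before a restart
    // (how it is stored is outside the scope of this sketch).
    ExternalSeqIdSketch(long lastCommitted) {
        this.lastSent = lastCommitted;
    }

    // Returns true if the record is new and should be sent; false if its
    // ID is not greater than the last one sent (e.g. a redelivery).
    boolean accept(long recordId) {
        if (recordId <= lastSent) {
            return false;
        }
        lastSent = recordId;
        return true;
    }

    long lastSent() {
        return lastSent;
    }
}
```

Because the ID comes from the record itself, a restarted adapter can
resume from the last recorded ID without recomputing a byte count.
IDs accepted this way also stay monotonically increasing.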