Posted to user@chukwa.apache.org by Jaydeep Ayachit <ja...@persistent.co.in> on 2010/10/28 16:58:39 UTC

Data loss on collector side

As per the collector design, the collector accepts multiple chunks and writes each chunk to HDFS. If all the chunks are written to HDFS, the collector sends a 200 status back to the agent.
If an HDFS write fails in between, the collector aborts the entire processing and returns an exception. This could mean that the data is partially written to HDFS. I have a couple of questions:


1.       The agent does not receive a 200 response. Does it resend the same data to another collector? How does checkpointing work in this case?

2.       If the agent sends the same data to another collector and it goes to HDFS, some records are duplicated. Are those duplicates filtered when the preprocessor runs?

In summary, what data loss can occur from the collector's perspective when HDFS goes down?

Thanks,
Jaydeep

Jaydeep Ayachit | Persistent Systems Ltd
Cell: +91 9822393963 | Desk: +91 712 3986747



Re: Data loss on collector side

Posted by Ariel Rabkin <as...@gmail.com>.
Yes, the agent will resend. The checkpoint state will not be advanced
until a 200 is received from a collector.

Yes, the demux processing is intended to remove duplicates; if it
doesn't, that's a bug.
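
A minimal sketch of the invariant Ari describes, namely that the checkpoint
advances only after a 200 acknowledgment. The names here (postChunk,
Checkpoint) are illustrative stand-ins, not the actual Chukwa agent classes:

// Illustrative sketch only, not the real Chukwa agent code.
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class AgentSendSketch {

    /** Post one serialized chunk; report success only on HTTP 200. */
    static boolean postChunk(String collectorUrl, byte[] chunk) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(collectorUrl).openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(chunk);
            }
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false; // any failure counts as "not acknowledged"
        }
    }

    /** Retry the same chunk until some collector acknowledges it. */
    static void sendWithCheckpoint(List<String> collectors, byte[] chunk,
                                   long chunkSeqId, Checkpoint checkpoint) {
        while (true) {
            for (String collector : collectors) {
                if (postChunk(collector, chunk)) {
                    // Progress is recorded only now. On restart the agent
                    // re-reads from the last checkpointed offset, so an
                    // unacknowledged chunk is resent, never dropped.
                    checkpoint.advanceTo(chunkSeqId);
                    return;
                }
            }
            // Every collector failed; back off, then retry the same chunk.
            try { Thread.sleep(5000); } catch (InterruptedException e) { return; }
        }
    }

    /** Hypothetical checkpoint store, standing in for the agent's real one. */
    interface Checkpoint {
        void advanceTo(long seqId);
    }
}

The flip side of never advancing the checkpoint early is that a chunk whose
200 response is lost in transit will be sent again, which is exactly why
demux has to remove duplicates.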


-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: Data loss on collector side

Posted by Eric Yang <ey...@yahoo-inc.com>.


On 10/28/10 7:58 AM, "Jaydeep Ayachit" <ja...@persistent.co.in>
wrote:

> As per the collector design, the collector accepts multiple chunks and writes
> each chunk to HDFS. If all the chunks are written to HDFS, the collector sends
> a 200 status back to the agent.
> If an HDFS write fails in between, the collector aborts the entire processing
> and returns an exception. This could mean that the data is partially written
> to HDFS. I have a couple of questions:
>  
> 1.      The agent does not receive a 200 response. Does it resend the same
> data to another collector? How does checkpointing work in this case?
> 

The agent checks for an HTTP 200 response; if it doesn't receive OK status,
it will send to another collector from its list.  The checkpoint is updated
after the HTTP 200 status is received.
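
A sketch of that failover step. Chukwa keeps the collector list in a class
along these lines (RetryListOfCollectors), but the code below is an
illustrative stand-in rather than the real implementation:

// Illustrative stand-in for the agent's rotating collector list.
import java.util.List;

public class CollectorRotation {
    private final List<String> collectors;
    private int current = 0;

    public CollectorRotation(List<String> collectors) {
        if (collectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one collector");
        }
        this.collectors = collectors;
    }

    /** The collector to try for the next request. */
    public synchronized String current() {
        return collectors.get(current);
    }

    /** Rotate to the next collector after a request fails to return HTTP 200. */
    public synchronized String failover() {
        current = (current + 1) % collectors.size();
        return collectors.get(current);
    }
}

The agent keeps posting to current() and calls failover() on any non-200
result, so a dead collector is skipped until the rotation wraps back to it.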

> 2.      If the agent sends the same data to another collector and it goes to
> HDFS, some records are duplicated. Are those duplicates filtered when the
> preprocessor runs?

It is possible to build a preprocessor filter to remove duplicate data within
a small time window.  However, it is not guaranteed to remove 100% of
duplicates, because duplicated data can be received in different batches of
the Archive/Demux process.  I recommend removing duplicates when the data is
being indexed, where a downstream program like HBase or MySQL has a view of
all the data.
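
A sketch of such a window-based filter, assuming each chunk is identified by
its source, stream name, and sequence ID (which Chukwa chunks carry); the
class and method names are made up for illustration:

// Illustrative sketch of window-based duplicate filtering.
import java.util.Iterator;
import java.util.LinkedHashSet;

public class WindowDeduper {
    private final int maxEntries;
    private final LinkedHashSet<String> seen = new LinkedHashSet<>();

    public WindowDeduper(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns true if the chunk is new within the window, false if duplicate. */
    public boolean accept(String source, String stream, long seqId) {
        String key = source + "/" + stream + "/" + seqId;
        if (!seen.add(key)) {
            return false;               // already seen within the window
        }
        if (seen.size() > maxEntries) { // evict the oldest key to bound memory
            Iterator<String> it = seen.iterator();
            it.next();
            it.remove();
        }
        return true;
    }
}

A duplicate that arrives in a later Archive/Demux batch falls outside any
bounded window like this and slips through; a store that sees all the data,
for example HBase with a row key derived from the same chunk identity, can
reject it reliably.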

> In summary, what data loss can occur from the collector's perspective when
> HDFS goes down?

When HDFS goes down, the collector exits.  Hence, it is possible to lose up
to 15 seconds of data if the last flush to HDFS did not store the data on a
datanode.  In this case, the collector will not send HTTP code 200 to the
agent, and the data is resent by the agent.  There is also a localWriter,
which writes data locally on the collector node and then uploads it to HDFS.
This assumes the collector's local disk is more reliable than HDFS.  I don't
think this is a common scenario.
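
The local-spool-then-upload idea behind that localWriter can be sketched as
below; the class, file naming, and directory layout here are hypothetical,
not the actual Chukwa writer:

// Illustrative sketch: spool chunks to local disk, upload when HDFS is back.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalThenHdfsSketch {

    /** Persist the chunk to the local spool directory before acking,
     *  so an HDFS outage does not lose acknowledged data. */
    static File writeLocal(byte[] chunk, File spoolDir) throws IOException {
        File f = new File(spoolDir, "chunk-" + System.nanoTime() + ".done");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(chunk);
            out.getFD().sync(); // force to local disk before acknowledging
        }
        return f;
    }

    /** Move a completed local file into HDFS once it is reachable again. */
    static void uploadToHdfs(File local, String hdfsDir, Configuration conf)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path(local.getAbsolutePath()),
                             new Path(hdfsDir, local.getName()));
        local.delete(); // safe to remove only after the copy succeeds
    }
}

As Eric notes, this only helps if the collector's local disk outlives the
HDFS outage: the spooled file is the sole copy until the upload succeeds.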

Regards,
Eric
