Posted to mapreduce-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/05/23 20:04:20 UTC

SequenceFile sync marker uniqueness

How does SequenceFile guarantee that the sync marker does not appear in the data?
John


Re: SequenceFile sync marker uniqueness

Posted by Harsh J <ha...@cloudera.com>.
SequenceFiles use a 16-byte MD5 digest as the sync marker (computed from a
UID and the writer's init time, so it is effectively random). For the rest
of my answer, I'd prefer not to repeat what Martin has already said very
well here: http://search-hadoop.com/m/VYVra2krg5t1 (point #2) on the Avro
lists, about the Avro DataFile format, which uses a similar technique.
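Here is a minimal Java sketch of the idea. It assumes a marker built as described above: an MD5 digest over a `java.rmi.server.UID` string plus the current time. The class and method names (`SyncMarkerSketch`, `newSyncMarker`, `seekPastMarker`, `accidentalMatchBound`) are hypothetical, and this is not Hadoop's actual source. The sketch also shows why there is no hard guarantee, only a probabilistic one. If the data does not depend on the marker, each 16-byte window matches it with probability 2^-128. So n bytes of data contain an accidental match with probability of at most about n * 2^-128.

```java
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not Hadoop's SequenceFile implementation.
class SyncMarkerSketch {
    // An MD5 digest is always 16 bytes long.
    static final int SYNC_SIZE = 16;

    // Builds a per-file marker by hashing a JVM-unique UID plus the current
    // wall-clock time, following the description in the reply above.
    static byte[] newSyncMarker() {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            long time = System.currentTimeMillis();
            md5.update((new UID() + "@" + time).getBytes(StandardCharsets.UTF_8));
            return md5.digest();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Returns the offset just past the first occurrence of `marker` at or
    // after `start`, or -1 if it is absent. A reader starting mid-file could
    // treat that offset as the next record boundary.
    static int seekPastMarker(byte[] data, int start, byte[] marker) {
        outer:
        for (int i = Math.max(0, start); i + marker.length <= data.length; i++) {
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    continue outer;
                }
            }
            return i + marker.length;
        }
        return -1;
    }

    // Union bound on the chance that `n` bytes of marker-independent data
    // contain the 16-byte marker somewhere: at most n * 2^-128.
    static double accidentalMatchBound(double n) {
        return n * Math.pow(2, -128);
    }

    public static void main(String[] args) {
        byte[] marker = newSyncMarker();
        byte[] data = new byte[64];
        System.arraycopy(marker, 0, data, 20, SYNC_SIZE);
        System.out.println(seekPastMarker(data, 0, marker)); // prints 36
        System.out.println(accidentalMatchBound(1e15));
    }
}
```

Even for a petabyte of data (about 1e15 bytes), the bound is on the order of 1e-24. For this reason, formats like SequenceFile and Avro DataFile accept the theoretical risk rather than escaping the marker inside record data.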


On Thu, May 23, 2013 at 11:34 PM, John Lilley <jo...@redpoint.net> wrote:

> How does SequenceFile guarantee that the sync marker does not appear in
> the data?
>
> John



-- 
Harsh J
