Posted to common-user@hadoop.apache.org by David Larsen <da...@connexity.com> on 2013/12/11 19:25:57 UTC

Is it valid to call SequenceFile.Reader `sync` after calling `next`?

Reading Tom White's excellent book I see that you can find a record 
boundary in a SequenceFile with the `sync` method.

What I'd really like to do is read the first record of the file and 
then sync forward into another part of the file.  Going even further, 
I'd like to sync multiple times in a large file, reading along the way.
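
To make the pattern concrete, here is a minimal sketch of what I mean. 
The class name `NextThenSync`, the method `readFirstThenSync`, and the 
`offset` parameter are made up for illustration, and it assumes the 
Hadoop 2.x `SequenceFile.Reader` constructor that takes 
`SequenceFile.Reader.file(path)`. It is not meant to show that the 
pattern is supported, only what the calls look like.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class NextThenSync {

    /**
     * Reads the first record, then calls sync(offset) and reads one more record.
     * Returns the reader position right after the sync call, or -1 if either
     * read finds no record. (Hypothetical helper, for illustration only.)
     */
    static long readFirstThenSync(Configuration conf, Path path, long offset)
            throws IOException {
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            // Instantiate key/value holders of whatever types the file declares.
            Writable key =
                (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value =
                (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);

            // Step 1: read the first record of the file.
            if (!reader.next(key, value)) {
                return -1; // empty file
            }
            System.out.println("first record: " + key + " -> " + value);

            // Step 2: ask the reader to move to the next sync point past `offset`.
            reader.sync(offset);
            long positionAfterSync = reader.getPosition();

            // Step 3: read the record that follows that sync point, if any.
            if (!reader.next(key, value)) {
                return -1; // sync landed at end of file
            }
            System.out.println("record after sync: " + key + " -> " + value);
            return positionAfterSync;
        }
    }
}
```

Calling `readFirstThenSync` repeatedly with increasing offsets is the 
"sync multiple times, reading along the way" case.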

Depending on how the SequenceFile is written and its size, this 
sometimes works.  If anyone's interested, I can describe what I've found 
so far, but my initial question is high level.  What I want to understand is:

A) Is `next` then `sync` a valid use case?
B) When working with a block-compressed SequenceFile, will `sync` be much 
more efficient than just paging through records on the client?

Here's the link to SO in case anyone wants fake internet points:
http://stackoverflow.com/questions/20508323/is-it-valid-to-call-sequencefile-reader-sync-after-calling-next

Kind Regards,
David Larsen