Posted to common-user@hadoop.apache.org by Brian Stempin <bs...@rightaction.com> on 2014/02/19 20:39:00 UTC

Question about the usage of Seekable within the LineRecordReader

Hi List,
In order to write my own record reader, I'm taking a look at the
*LineRecordReader* in v 2.2.0.  I notice that it uses *Seekable* in order
to tell where it is in the file when using something other than an
*InputStream*.  As far as I can see, the only reason it's used is to get the
current position within the file (within *getFilePosition()* ).

My question is:  Why?  It looks like the file position is already tracked
by the *pos* field.  Is there a reason to use *Seekable.getPos()* instead
of looking at *pos*?

Thanks for the help,
Brian

Re: Question about the usage of Seekable within the LineRecordReader

Posted by Brian Stempin <bs...@rightaction.com>.
Hi Yong,
The *LineRecordReader* has a *FSDataInputStream* named *fileIn.*  It then
has a separate *Seekable* named *filePosition*, which is set equal to
*fileIn.*  *filePosition.seek()* is never called.  In the constructor,
*fileIn.seek()* is called, but never again.  For the rest of the class, the
only call made to *filePosition* is *getPos()*.  As I mentioned in the
first email, this seems redundant.

The question comes from this bit of code:

>   private long getFilePosition() throws IOException {
>     long retVal;
>     if (isCompressedInput() && null != filePosition) {
>       retVal = filePosition.getPos();
>     } else {
>       retVal = pos;
>     }
>     return retVal;
>   }


That's the only place *filePosition* is used.  If there's also a field named
*pos* that tracks the same thing, then why use *filePosition* at all?
Isn't that just duplicate work?
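
One plausible reading of the isCompressedInput() branch is that the two
positions measure different things once a codec sits between the file and the
line reader: pos advances by the number of decompressed bytes handed to the
reader, while the Seekable reports how far the underlying compressed file has
been consumed, which is the value that can be compared against split offsets.
The sketch below only illustrates that difference. It uses java.util.zip rather
than Hadoop, and CountingInputStream, readAll and gzip are hypothetical names
made up for the example, with CountingInputStream standing in for
Seekable.getPos().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class PositionDemo {

    /** Hypothetical stand-in for Seekable.getPos(): counts bytes pulled from the raw stream. */
    static class CountingInputStream extends FilterInputStream {
        private long count = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getPos() {
            return count;
        }
    }

    /** Returns {decompressed bytes consumed, bytes consumed from the compressed stream}. */
    static long[] readAll(byte[] gzipped) throws IOException {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(gzipped));
        long pos = 0; // plays the role of the pos field: decompressed bytes seen so far
        try (InputStream in = new GZIPInputStream(raw)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                pos += n;
            }
        }
        return new long[] { pos, raw.getPos() };
    }

    /** Compresses data into an in-memory gzip stream. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("the same line over and over\n");
        }
        byte[] packed = gzip(sb.toString().getBytes(StandardCharsets.UTF_8));
        long[] r = readAll(packed);
        System.out.println("decompressed bytes consumed (like pos): " + r[0]);
        System.out.println("compressed bytes consumed (like filePosition.getPos()): " + r[1]);
        System.out.println("compressed length: " + packed.length);
    }
}
```

For highly repetitive input like this, the first number is far larger than the
second, so a counter of decompressed bytes could not be compared against
offsets in the compressed file. Whether that is the actual motivation in
*LineRecordReader* is an assumption here, not something this thread confirms.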

Thanks for giving me time,
Brian


On Wed, Feb 19, 2014 at 2:55 PM, java8964 <ja...@hotmail.com> wrote:

> Hi, Brian:
>
> I hope I understand your question correctly. Here is my view of what the
> Seekable interface provides.
>
> The Seekable interface also defines the "seek(long pos)" method, which
> allows the client to seek to a specified position in the underlying
> InputStream.
>
> The RecordReader gets the start position and an instance of the
> InputSplit, but the underlying input stream is not open or available yet.
>
> The RecordReader finds the correct start position of the stream, uses the
> Seekable interface to "seek" to that position, and starts reading bytes
> from there, translating the bytes that follow into <K, V> pairs.
>
> Without the Seekable interface, there is no way to "seek" to the correct
> starting position.
>
> Yong

RE: Question about the usage of Seekable within the LineRecordReader

Posted by java8964 <ja...@hotmail.com>.
Hi, Brian:

I hope I understand your question correctly. Here is my view of what the
Seekable interface provides.

The Seekable interface also defines the "seek(long pos)" method, which
allows the client to seek to a specified position in the underlying
InputStream.

The RecordReader gets the start position and an instance of the
InputSplit, but the underlying input stream is not open or available yet.

The RecordReader finds the correct start position of the stream, uses the
Seekable interface to "seek" to that position, and starts reading bytes
from there, translating the bytes that follow into <K, V> pairs.

Without the Seekable interface, there is no way to "seek" to the correct
starting position.

Yong
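
The seek-to-split-start step described above can be sketched without Hadoop.
This is a minimal illustration that uses java.io.RandomAccessFile as a stand-in
for a Seekable stream; SplitSeekSketch and readSplit are hypothetical names, and
the boundary rules (skip the first line when the split does not begin at offset
0, keep reading while the current line starts at or before the split end) are
assumptions about how a line-oriented reader can avoid dropping or duplicating
lines across splits.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SplitSeekSketch {

    /** Reads the lines that belong to the split [start, start + length). */
    static List<String> readSplit(Path file, long start, long length) throws IOException {
        long end = start + length;
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Jump straight to the split start instead of reading from byte 0.
            raf.seek(start);
            if (start != 0) {
                // Assumed convention: the previous split owns the line that
                // crosses (or begins exactly at) this boundary, so discard it.
                raf.readLine();
            }
            // Read every line that starts at or before the split end.
            while (raf.getFilePointer() <= end) {
                String line = raf.readLine();
                if (line == null) {
                    break; // end of file
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("split-sketch", ".txt");
        try {
            Files.write(file, "aaa\nbbb\nccc\nddd\n".getBytes(StandardCharsets.US_ASCII));
            System.out.println(readSplit(file, 0, 6));  // split boundary falls inside "bbb"
            System.out.println(readSplit(file, 6, 10)); // second split starts mid-line
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With the 16-byte sample file, the first call should yield aaa and bbb and the
second ccc and ddd, so each line is read exactly once even though the boundary
at byte 6 falls in the middle of a line.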

