Posted to mapreduce-user@hadoop.apache.org by hequn cheng <ch...@gmail.com> on 2014/03/07 06:32:34 UTC

Why can FSDataInputStream.read() only read 2^17 bytes in Hadoop 2.0?

Hi~
First, I use FileSystem to open a file in HDFS.
         FSDataInputStream m_dis = fs.open(...);

Second, I read the data from m_dis into a byte array.
          byte[] inputdata = new byte[m_dis.available()];
          // m_dis.available() == 47185920
          m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3);

The value returned by m_dis.read() is 131072 (2^17), so the data after
byte 131072 is missing. It seems that FSDataInputStream uses a short to
manage its data, which confuses me a lot. The same code runs fine in
Hadoop 1.2.1.

thank you~

Re: Why can FSDataInputStream.read() only read 2^17 bytes in Hadoop 2.0?

Posted by hequn cheng <ch...@gmail.com>.
Yep, that did the job :)
I used readFully instead and it works well~~ Thank you~
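
For anyone who finds this thread later, roughly what the fix looks like.
This is only a sketch: the helper class, the path handling, and sizing the
buffer with getFileStatus().getLen() rather than available() are
illustrations, not code from this thread.

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class ReadWholeFile {
              public static byte[] readAll(FileSystem fs, Path path)
                      throws IOException {
                  // Size the buffer from the file length; available() only
                  // reports what can be read without blocking, not the size
                  // of the file.
                  long len = fs.getFileStatus(path).getLen();
                  byte[] inputdata = new byte[(int) len]; // assumes len fits in an int
                  FSDataInputStream m_dis = fs.open(path);
                  try {
                      // readFully fills the whole buffer or throws
                      // EOFException, unlike read(), which may return early.
                      m_dis.readFully(0, inputdata);
                  } finally {
                      m_dis.close();
                  }
                  return inputdata;
              }
          }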


2014-03-07 13:48 GMT+08:00 Binglin Chang <de...@gmail.com>:

> The semantics of read() do not guarantee that it reads as much as possible.
> You need to call read() multiple times, or use readFully.
>
>
> On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng <ch...@gmail.com> wrote:
>
>> Hi~
>> First, I use FileSystem to open a file in HDFS.
>>          FSDataInputStream m_dis = fs.open(...);
>>
>> Second, I read the data from m_dis into a byte array.
>>          byte[] inputdata = new byte[m_dis.available()];
>>          // m_dis.available() == 47185920
>>          m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3);
>>
>> The value returned by m_dis.read() is 131072 (2^17), so the data after
>> byte 131072 is missing. It seems that FSDataInputStream uses a short to
>> manage its data, which confuses me a lot. The same code runs fine in
>> Hadoop 1.2.1.
>>
>> thank you~
>>
>>
>
>

Re: Why can FSDataInputStream.read() only read 2^17 bytes in Hadoop 2.0?

Posted by Binglin Chang <de...@gmail.com>.
The semantics of read() do not guarantee that it reads as much as possible.
You need to call read() multiple times, or use readFully.
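
A minimal sketch of such a loop (illustrative only; the names m_dis and
inputdata follow the original post):

          // read() may return fewer bytes than requested, so keep calling
          // it until the buffer is full or the stream ends.
          int offset = 0;
          while (offset < inputdata.length) {
              int n = m_dis.read(inputdata, offset, inputdata.length - offset);
              if (n == -1) {
                  break; // end of stream before the buffer was filled
              }
              offset += n;
          }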


On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng <ch...@gmail.com> wrote:

> Hi~
> First, I use FileSystem to open a file in HDFS.
>          FSDataInputStream m_dis = fs.open(...);
>
> Second, I read the data from m_dis into a byte array.
>          byte[] inputdata = new byte[m_dis.available()];
>          // m_dis.available() == 47185920
>          m_dis.read(inputdata, 0, 20 * 1024 * 768 * 3);
>
> The value returned by m_dis.read() is 131072 (2^17), so the data after
> byte 131072 is missing. It seems that FSDataInputStream uses a short to
> manage its data, which confuses me a lot. The same code runs fine in
> Hadoop 1.2.1.
>
> thank you~
>
