Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2009/11/10 05:47:38 UTC

Re: how to read file in hadoop

Since there has been no response to this question so far, I'd like to describe it in more detail.

I am trying to read a file in HDFS and copy it to another file. That works well, and when I view the copy with 'cat' the content is what it is supposed to be. The only problem is that when I read the file into a byte array and print it to stdout, the output is NOT what it should be. As a result, I cannot do anything with the data (e.g. comparison) except write it directly to another file.

I guess this problem may be due to the file format (text or binary) or the encoding (e.g. UTF-8). Can someone give me some ideas?

 
--Gang



----- Original Message ----
From: Gang Luo <lg...@yahoo.com.cn>
To: common-user@hadoop.apache.org
Sent: 2009/11/9 (Mon) 11:47:02 AM
Subject: how to read file in hadoop

Hi all,
I want to use the HDFS IO API to read a result file from a previous MapReduce job. But what I read does not match what is in that file: the content I print to stdout is different from what I get on the console with the 'cat' command. I guess there may be some problem with the file format (binary or text). Can anyone give me some hints?


Gang Luo




Re: Re: how to read file in hadoop

Posted by Gang Luo <lg...@yahoo.com.cn>.
The cause was that the content I read from the file is encoded in UTF-8.
I now use Text.decode to decode it back into a plain text string, and the
problem is gone.
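
For readers hitting the same symptom, here is a minimal sketch of the idea behind this fix: raw bytes have to be decoded as UTF-8 before they can be printed or compared as text. It is an illustration, not the code used in this thread. The class name Utf8DecodeSketch, the helper decodeUtf8, and the sample data are hypothetical, and the sketch uses only the JDK so that it runs without Hadoop on the classpath. The strict error handling is a choice made for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8DecodeSketch {

    // Decode UTF-8 bytes into a String. Malformed input raises
    // CharacterCodingException instead of being silently replaced
    // (an assumption of this sketch, chosen to surface bad data early).
    static String decodeUtf8(byte[] raw) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Hypothetical stand-in for bytes read from an HDFS input stream.
        byte[] raw = "key\tvalue".getBytes(StandardCharsets.UTF_8);

        // Printing the array shows numeric byte values, not the text.
        System.out.println(Arrays.toString(raw));

        // After decoding, the content can be printed and compared as text.
        String text = decodeUtf8(raw);
        System.out.println(text);
        System.out.println(text.equals("key\tvalue"));
    }
}
```

With Hadoop available, the Text.decode call mentioned above would take the place of the hypothetical decodeUtf8 helper.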

-Gang


----- Original Message ----
From: Gang Luo <lg...@yahoo.com.cn>
To: common-user@hadoop.apache.org
Sent: 2009/11/10 (Tue) 12:14:44 AM
Subject: Re: Re: how to read file in hadoop

I downloaded it to my local filesystem. The content is correct; I can see it both from the command line and in a text editor. So I think the file itself has no problem.

--Gang



----- Original Message ----
From: Jeff Zhang <zj...@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/11/9 (Mon) 11:58:22 PM
Subject: Re: Re: how to read file in hadoop

Maybe you can download the file to your local filesystem to see what content is there.


Jeff Zhang



Re: Re: how to read file in hadoop

Posted by Gang Luo <lg...@yahoo.com.cn>.
I downloaded it to my local filesystem. The content is correct; I can see it both from the command line and in a text editor. So I think the file itself has no problem.

--Gang




Re: Re: how to read file in hadoop

Posted by Jeff Zhang <zj...@gmail.com>.
Maybe you can download the file to your local filesystem to see what content is there.


Jeff Zhang

