Posted to dev@pig.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/06/27 22:44:50 UTC

load files

Hi all,
When we specify the input path to a load operator, is it a file or a directory? Similarly, when we use store-load to connect two MR operators, are the paths specified in the store and the load directories?

Thanks,
-Gang



      

Re: load files

Posted by Jeff Zhang <zj...@gmail.com>.
part-xxxxx is for the old Hadoop mapred API, while part-m-xxxxx and
part-r-xxxxx are for the new Hadoop mapreduce API.
You can use Hadoop's globStatus("part-*") to handle both of these cases.
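
As a rough illustration of why a single pattern covers both naming schemes, here is a minimal Python sketch. It uses Python's fnmatch module as a stand-in for Hadoop's glob matching (an assumption for illustration only; this is not Hadoop's globStatus), and the file names and the match_parts helper are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical contents of an output directory (example names only).
names = ["part-00000", "part-m-00000", "part-r-00001", "_logs"]

def match_parts(names, pattern="part-*"):
    """Return the names that match the glob pattern, keeping input order."""
    return [n for n in names if fnmatchcase(n, pattern)]

# Old-style and new-style part files both match; "_logs" does not.
print(match_parts(names))
```

The same pattern therefore works whichever API produced the output, because all three naming schemes share the "part-" prefix.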






-- 
Best Regards

Jeff Zhang

Re: load files

Posted by Gang Luo <lg...@yahoo.com.cn>.
Thanks, Jeff.
In Pig, the output file names look like this: part-m-xxxxx (for map output) or part-r-xxxxx (for reduce output), which differ from the Hadoop style (part-xxxxx). So, can we control the name of each generated file? How?

Thanks,
-Gang






      

Re: load files

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Gang,

The path specified in load can be either a file or a directory; you
can also use a glob pattern via Hadoop's globStatus.  The path specified
in store is a directory.
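
To make the three load cases concrete, here is a small Python sketch that runs against the local filesystem. It is not Pig's or Hadoop's code: resolve_input is a hypothetical helper, and skipping directory entries whose names start with "_" or "." is an assumption modelled on Hadoop's hidden-file convention.

```python
import glob
import os

def resolve_input(path):
    """Hypothetical helper: expand a load-style path into a list of files.

    A glob pattern expands to its matches, a directory expands to the
    non-hidden files directly inside it, and a plain file is returned as is.
    """
    files = []
    for match in sorted(glob.glob(path)):
        if os.path.isdir(match):
            for name in sorted(os.listdir(match)):
                full = os.path.join(match, name)
                # Assumed convention: skip hidden entries such as _logs.
                if os.path.isfile(full) and not name.startswith(("_", ".")):
                    files.append(full)
        elif os.path.isfile(match):
            files.append(match)
    return files
```

Under these assumptions, pointing the helper at a store output directory returns every part file in it, which is why a directory written by store can be passed straight to a later load.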






-- 
Best Regards

Jeff Zhang