Posted to user@hadoop.apache.org by Mix Nin <pi...@gmail.com> on 2013/06/11 23:26:23 UTC

Reading multiple files of a directory using a Single LOAD Command in PIG

I have a directory "Output2". It contains the following files:

-----------------
_SUCCESS
part-m-00000
part-m-00001
part-m-00002
part-m-00003
.
.
.
.
part-m-00100
-----------------

The above files were produced by a Pig STORE command.

I want to read the files starting with "part-m-" using a single Pig command.

When I tried Data = LOAD '\Output2\part-m-*' AS ( );
it did not work and threw an error.

How do I read these files in a single LOAD statement?

Thanks

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

Posted by Harsh J <ha...@cloudera.com>.
Yes, you can do that - the "_" and "." filename filter is still applied to the globbed results.
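
For reference, a minimal sketch of the forms being compared. The path /Output2 follows the original post and the relation names are made up for illustration. The hidden-file behaviour is as described in this thread, not something checked here against a specific Pig version.

```pig
-- Glob restricted to the part files written by STORE.
parts = LOAD '/Output2/part-m-*';

-- Glob over everything in the directory. Per the reply above, names that
-- begin with "_" or "." (such as _SUCCESS) are still filtered out.
everything = LOAD '/Output2/*';

-- Passing the directory itself is a third option; Pig reads the files under it.
dir_data = LOAD '/Output2';
```

All three should yield the same rows for the directory listed in the original post, since it holds only part files plus _SUCCESS.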

On Wed, Jun 12, 2013 at 3:45 AM, Mix Nin <pi...@gmail.com> wrote:
> Hi,
>
> My mistake: I used backslashes, which caused the error. With
> forward slashes it works fine.
>
> Good to know that LOAD ignores filenames that begin with "_" or a period
> ".". So, in that case, can I use LOAD /Output/* directly instead of
> LOAD /Output/part-m*?
>
> Thanks
>
>
>
>
> On Tue, Jun 11, 2013 at 2:32 PM, Prashant Kommireddi <pr...@gmail.com> wrote:
>
>> What is the error?
>>
>> The LoadFunc should ignore any filenames that begin with "_" or a
>> period ".".
>> If you are trying to skip the _SUCCESS file, the loader you are using
>> (PigStorage) already handles that.
>>
>> Also, can you double-check that your path uses forward slashes, as in
>> "/Output/part-m*", rather than backslashes?
>>
>>
>> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <pi...@gmail.com> wrote:
>>
>> > I have a directory "Output2". It contains the following files:
>> >
>> > -----------------
>> > _SUCCESS
>> > part-m-00000
>> > part-m-00001
>> > part-m-00002
>> > part-m-00003
>> > .
>> > .
>> > .
>> > .
>> > part-m-00100
>> > -----------------
>> >
>> > The above files were produced by a Pig STORE command.
>> >
>> > I want to read the files starting with "part-m-" using a single Pig command.
>> >
>> > When I tried Data = LOAD '\Output2\part-m-*' AS ( );
>> > it did not work and threw an error.
>> >
>> > How do I read these files in a single LOAD statement?
>> >
>> > Thanks
>> >
>> >
>>



-- 
Harsh J

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

Posted by Mix Nin <pi...@gmail.com>.
Hi,

My mistake: I used backslashes, which caused the error. With
forward slashes it works fine.

Good to know that LOAD ignores filenames that begin with "_" or a period
".". So, in that case, can I use LOAD /Output/* directly instead of
LOAD /Output/part-m*?

Thanks




On Tue, Jun 11, 2013 at 2:32 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> What is the error?
>
> The LoadFunc should ignore any filenames that begin with "_" or a
> period ".".
> If you are trying to skip the _SUCCESS file, the loader you are using
> (PigStorage) already handles that.
>
> Also, can you double-check that your path uses forward slashes, as in
> "/Output/part-m*", rather than backslashes?
>
>
> On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <pi...@gmail.com> wrote:
>
> > I have a directory "Output2". It contains the following files:
> >
> > -----------------
> > _SUCCESS
> > part-m-00000
> > part-m-00001
> > part-m-00002
> > part-m-00003
> > .
> > .
> > .
> > .
> > part-m-00100
> > -----------------
> >
> > The above files were produced by a Pig STORE command.
> >
> > I want to read the files starting with "part-m-" using a single Pig command.
> >
> > When I tried Data = LOAD '\Output2\part-m-*' AS ( );
> > it did not work and threw an error.
> >
> > How do I read these files in a single LOAD statement?
> >
> > Thanks
> >
> >
>

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

Posted by Prashant Kommireddi <pr...@gmail.com>.
What is the error?

The LoadFunc should ignore any filenames that begin with "_" or a
period ".".
If you are trying to skip the _SUCCESS file, the loader you are using
(PigStorage) already handles that.

Also, can you double-check that your path uses forward slashes, as in
"/Output/part-m*", rather than backslashes?
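
A sketch of what the corrected statement might look like, assuming the files are tab-delimited with two fields. The field names and types are placeholders, since the original post leaves the schema empty, and the path /Output2 is taken from that post.

```pig
-- Forward slashes in the path; the glob matches part-m-00000 ... part-m-00100.
-- PigStorage('\t') is spelled out here, although tab is its default delimiter.
-- (id:int, name:chararray) is a hypothetical schema, not from the original post.
Data = LOAD '/Output2/part-m-*' USING PigStorage('\t') AS (id:int, name:chararray);

-- Quick check that rows came through.
sample_rows = LIMIT Data 10;
DUMP sample_rows;
```

The AS clause can also be left off entirely, in which case fields are addressed by position ($0, $1, and so on).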


On Tue, Jun 11, 2013 at 2:26 PM, Mix Nin <pi...@gmail.com> wrote:

> I have a directory "Output2". It contains the following files:
>
> -----------------
> _SUCCESS
> part-m-00000
> part-m-00001
> part-m-00002
> part-m-00003
> .
> .
> .
> .
> part-m-00100
> -----------------
>
> The above files were produced by a Pig STORE command.
>
> I want to read the files starting with "part-m-" using a single Pig command.
>
> When I tried Data = LOAD '\Output2\part-m-*' AS ( );
> it did not work and threw an error.
>
> How do I read these files in a single LOAD statement?
>
> Thanks
>
>

Re: Reading multiple files of a directory using a Single LOAD Command in PIG

Posted by Alan Crosswell <al...@crosswell.us>.
Try forward slashes?

On Jun 11, 2013, at 5:26 PM, Mix Nin <pi...@gmail.com> wrote:

> I have a directory "Output2". It contains the following files:
>
> -----------------
> _SUCCESS
> part-m-00000
> part-m-00001
> part-m-00002
> part-m-00003
> .
> .
> .
> .
> part-m-00100
> -----------------
>
> The above files were produced by a Pig STORE command.
>
> I want to read the files starting with "part-m-" using a single Pig command.
>
> When I tried Data = LOAD '\Output2\part-m-*' AS ( );
> it did not work and threw an error.
>
> How do I read these files in a single LOAD statement?
>
> Thanks