Posted to user@pig.apache.org by Jumsheed <ju...@gmail.com> on 2015/01/06 14:51:02 UTC

split data into multiple files

Hi,

I have a file with data in the format below:

A
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
3
B
abcdefghijklmnop
abcdefghijklmnop
2
C
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
4

I need to create three files like this:

file1:
A
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
3

file2:
B
abcdefghijklmnop
abcdefghijklmnop
2

file3:
C
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
4

Can you suggest a way to do this?

Thanks
Jumsheed

Re: split data into multiple files

Posted by Rodrigo Ferreira <we...@gmail.com>.
Would this help you in any way?

http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html

It seems that if you could create a column holding the group each record
belongs to, you could use MultiStorage to save the groups in separate files.

Rodrigo.
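
A minimal sketch of the grouping-column idea, written in Python as a
pre-processing step outside Pig. It is not from the thread: the function
name tag_sections is hypothetical, and it assumes that every section starts
with a one-line header, ends with a trailer holding the number of data rows,
and that no data row is itself a number equal to the rows seen so far.

```python
def tag_sections(lines):
    """Yield (group, line) pairs; group is the header of the enclosing section.

    Assumed layout (from the sample in the question): a header line, then
    data rows, then a trailer line holding the count of data rows.
    """
    group = None   # header of the section currently being read
    seen = 0       # data rows seen so far in that section
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                  # ignore blank lines
        if group is None:
            group = line              # first line of a section is its header
            seen = 0
            yield group, line
        elif line.isdigit() and int(line) == seen:
            yield group, line         # trailer closes the section
            group = None
        else:
            seen += 1                 # ordinary data row
            yield group, line


if __name__ == "__main__":
    sample = ["A", "abcdefghijklmnop", "abcdefghijklmnop", "2",
              "B", "abcdefghijklmnop", "1"]
    # Emit tab-separated rows with the group as the first field.
    for group, line in tag_sections(sample):
        print(f"{group}\t{line}")
```

The tab-separated output carries the section name in its first field, which
is the kind of column a store function keyed on a field would need.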

2015-01-06 21:04 GMT-02:00 Jumsheed Kottachery <ju...@gmail.com>:

> Dave,
>
> So I should create three files, A, B and C, with contents like
> below (without headers and trailers)?
> file A:
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
>
> file B:
> abcdefghijklmnop
> abcdefghijklmnop
>
> file C:
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
>
> Is there any way to split the file by line number? Or how else can I split
> it into 3 files?
>
> Thanks
> Jumsheed

Re: split data into multiple files

Posted by Jumsheed Kottachery <ju...@gmail.com>.
Dave,

So I should create three files, A, B and C, with contents like below (without headers and trailers)?
file A:
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop

file B:
abcdefghijklmnop
abcdefghijklmnop

file C:
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop
abcdefghijklmnop

Is there any way to split the file by line number? Or how else can I split it into 3 files?

Thanks
Jumsheed



> On Jan 6, 2015, at 9:54 AM, David Warshaw <da...@cobrain.com> wrote:
> 
> Carrying headers and trailers through Pig (or really any ETL pipeline) as
> data rows will be awkward.
> De-concatenated (or pre-concatenated) files with the metadata already
> stripped out could be loaded using the PigStorage loader with the -tagPath
> option. This would allow you to differentiate the records by source in
> your script.


Re: split data into multiple files

Posted by David Warshaw <da...@cobrain.com>.
Carrying headers and trailers through Pig (or really any ETL pipeline) as
data rows will be awkward.
If the file is de-concatenated first (or the sections are kept as separate
files before they are concatenated), with the metadata already stripped out,
the files could be loaded using the PigStorage loader with the -tagPath
option. This would allow you to differentiate the records by source file in
your script.
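
A rough sketch of such a de-concatenation step, in Python rather than Pig.
It is an illustration only: split_sections, write_sections and the
"<header>.txt" naming are hypothetical, and it assumes each section is a
header line, its data rows, and a trailer equal to the number of rows.

```python
from pathlib import Path


def split_sections(lines):
    """Return {header: [data rows]} with headers and trailers stripped.

    Assumes a trailer is a numeric line equal to the number of data rows
    read since the header; data rows that look like that would be misread.
    """
    sections = {}
    header = None
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if header is None:
            header, rows = line, []       # start a new section
        elif line.isdigit() and int(line) == len(rows):
            sections[header] = rows       # trailer matches the row count
            header = None
        else:
            rows.append(line)
    if header is not None:
        raise ValueError(f"section {header!r} has no matching trailer")
    return sections


def write_sections(sections, out_dir):
    """Write each section's rows to <out_dir>/<header>.txt; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for header, rows in sections.items():
        path = out / f"{header}.txt"
        path.write_text("".join(row + "\n" for row in rows))
        paths.append(path)
    return paths
```

Because the trailer is checked against the rows actually read, a truncated
or miscounted section raises an error instead of silently producing a
short file.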

On Tue, Jan 6, 2015 at 9:29 AM, Jumsheed <ju...@gmail.com> wrote:

> Yes, I checked SPLIT and MultiStorage, but I didn't find any way to
> group each section.

Re: split data into multiple files

Posted by Jumsheed <ju...@gmail.com>.
Yes, I checked SPLIT and MultiStorage, but I didn't find any way to
group each section.

On Tue, Jan 6, 2015 at 8:55 AM, Shahab Yunus <sh...@gmail.com> wrote:

> Have you looked at the SPLIT operator in Pig? Does that help?
> http://pig.apache.org/docs/r0.12.0/basic.html#SPLIT
>
> Regards,
> Shahab

Re: split data into multiple files

Posted by Shahab Yunus <sh...@gmail.com>.
Have you looked at the SPLIT operator in Pig? Does that help?
http://pig.apache.org/docs/r0.12.0/basic.html#SPLIT

Regards,
Shahab

On Tue, Jan 6, 2015 at 8:51 AM, Jumsheed <ju...@gmail.com> wrote:

> Hi,
>
> I have a file with data in the format below:
>
> A
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> 3
> B
> abcdefghijklmnop
> abcdefghijklmnop
> 2
> C
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> 4
>
> I need to create three files like this:
>
> file1:
> A
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> 3
>
> file2:
> B
> abcdefghijklmnop
> abcdefghijklmnop
> 2
>
> file3:
> C
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> abcdefghijklmnop
> 4
>
> Can you suggest a way to do this?
>
> Thanks
> Jumsheed
>