Posted to user@pig.apache.org by Margus Roo <ma...@roo.ee> on 2014/12/15 18:41:56 UTC

Store lines into separate files

Hi

I have files whose rows contain a timestamp. I'd like to parse them row by
row and put each row into a file named after its timestamp.
For example:

Original file:
20140801,...,...,...,...,...
20140802,...,...,...,...,...
20140801,...,...,...,...,...
...

I'd like to split these rows into separate files, 20140801.csv and
20140802.csv, so that
20140801.csv contains:
20140801,...,...,...,...,...
20140801,...,...,...,...,...

and 20140802.csv contains:
20140802,...,...,...,...,...

I tried to write my own custom StoreFunc, but as far as I understand I
cannot do it there.
I read about MultiStorage; maybe that is the right tool to try? Or is Pig
the wrong tool for this problem altogether?

-- 
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 480


Re: Store lines into separate files

Posted by Margus Roo <ma...@roo.ee>.
Hmm, nice function. I'll play with it a little to get a feeling for whether
it is suitable for me, because this is only part of my problem :)
But thanks for the reply!


On 15/12/14 19:45, Alex Nastetsky wrote:
> Check out the SPLIT operator:
> https://pig.apache.org/docs/r0.14.0/basic.html#SPLIT
>
> Split your input into two relations and store them into different files.


Re: Store lines into separate files

Posted by Alex Nastetsky <al...@vervemobile.com>.
Check out the SPLIT operator:
https://pig.apache.org/docs/r0.14.0/basic.html#SPLIT

Split your input into two relations and store them into different files.
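A minimal sketch of the SPLIT approach, assuming comma-separated input with the date in the first field; the input and output paths are hypothetical. SPLIT needs one IF condition per output relation, so it suits a small set of dates known in advance rather than an open-ended one.

```pig
-- Hypothetical input path; no schema, so field 0 (the yyyyMMdd date) is addressed by position.
rows = LOAD 'input/data.csv' USING PigStorage(',');

-- One IF condition per output relation; the cast turns the bytearray field into a string for comparison.
SPLIT rows INTO
    day_20140801 IF (chararray)$0 == '20140801',
    day_20140802 IF (chararray)$0 == '20140802';

-- Each STORE writes a directory of part files, not a single .csv file.
STORE day_20140801 INTO 'output/20140801' USING PigStorage(',');
STORE day_20140802 INTO 'output/20140802' USING PigStorage(',');
```

Pig writes each STORE target as a directory of part files (e.g. output/20140801/part-m-00000), so producing a literal 20140801.csv needs a rename or merge afterwards, for example with hadoop fs -getmerge.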


Re: Store lines into separate files

Posted by Arvind S <ar...@gmail.com>.
You can use MultiStorage to write out separate files based on a grouping
column.

You would first need to build a single data set in which one of the columns
(for example the first one, as below) holds the grouping key, i.e. the file
name you need.

E.g.:

C1
20140801,{..some content},{..some content},...
20140802,{..some content},{..some content},...
...

Then use:

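STORE alias INTO '$path' USING
org.apache.pig.piggybank.storage.MultiStorage('$path', '0', 'none', '|');

Here the first MultiStorage argument repeats the output path, '0' is the
index of the split column, 'none' means no compression and '|' is the
output field delimiter.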

http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java
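A fuller sketch of the MultiStorage route for the data in the original question, assuming piggybank.jar sits at a hypothetical local path and the input is comma-separated with the date in field 0. The first constructor argument repeats the STORE path, as in the piggybank Javadoc example. Per that Javadoc, the output should land in one subdirectory per key value (e.g. output/by_day/20140801/20140801-0000) rather than in a single 20140801.csv file.

```pig
-- Assumption: hypothetical location of the piggybank jar.
REGISTER /path/to/piggybank.jar;

-- Hypothetical input path; no schema, so fields are addressed by position.
rows = LOAD 'input/data.csv' USING PigStorage(',');

-- Arguments: parent output path, index of the split field, compression, field delimiter.
-- 'none' disables compression; ',' keeps the output comma-separated like the input.
STORE rows INTO 'output/by_day'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/by_day', '0', 'none', ',');
```

Unlike SPLIT, this needs no list of dates up front: every distinct value in field 0 gets its own output directory.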

Cheers !!!
Arvind