Posted to user@pig.apache.org by Margus Roo <ma...@roo.ee> on 2014/12/15 18:41:56 UTC
Store lines in to separate files
Hi
I have files whose rows start with a timestamp. I'd like to parse them row
by row and write each row into a file named after its timestamp.
For example:
original file:
20140801,...,...,...,...,...
20140802,...,...,...,...,...
20140801,...,...,...,...,...
...
So I'd like to split these rows into separate files, 20140801 and 20140802,
so that file
20140801.csv contains:
20140801,...,...,...,...,...
20140801,...,...,...,...,...
and 20140802.csv contains:
20140802,...,...,...,...,...
I tried to write my own custom StoreFunc, but as far as I understand I
cannot do it there.
I read about MultiStorage; maybe this is the right tool to try? Or is Pig
totally the wrong tool for this problem?
--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 480
Re: Store lines in to separate files
Posted by Margus Roo <ma...@roo.ee>.
Hmm, nice function. I'll play with it a little to get a feeling for whether
it is suitable for me, because this is only part of my problem :)
But thanks for the reply!
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 480
On 15/12/14 19:45, Alex Nastetsky wrote:
> Check out the SPLIT function:
> https://pig.apache.org/docs/r0.14.0/basic.html#SPLIT
>
> Split your input into two projections and store them into different files.
>
Re: Store lines in to separate files
Posted by Alex Nastetsky <al...@vervemobile.com>.
Check out the SPLIT operator:
https://pig.apache.org/docs/r0.14.0/basic.html#SPLIT
Split your input into two projections and store them into different files.
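A minimal sketch of the SPLIT approach, assuming the input lives at a hypothetical path 'input.csv', is comma-delimited, and contains only the two dates from the example. The aliases and output paths are made up for illustration:

```pig
-- Assumption: 'input.csv' is a placeholder path; fields are comma-separated.
rows = LOAD 'input.csv' USING PigStorage(',');

-- Route each row by its first field (the timestamp).
-- Every date needs its own condition, written out in advance.
SPLIT rows INTO
    d20140801 IF (chararray)$0 == '20140801',
    d20140802 IF (chararray)$0 == '20140802';

-- Each STORE writes a directory of part files, not a single .csv file.
STORE d20140801 INTO 'out/20140801' USING PigStorage(',');
STORE d20140802 INTO 'out/20140802' USING PigStorage(',');
```

Because the conditions are fixed in the script, this only fits a small set of dates that is known up front.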
Re: Store lines in to separate files
Posted by Arvind S <ar...@gmail.com>.
You can use MultiStorage to write out separate files based on a grouping
column.
You would first need to build a unified data set in which one of the columns
(maybe the first one, as below) holds the grouping value / file name needed,
e.g.
C1
20140801,{..some content},{..some content}...,
20140802,{..some content},{..some content}....
:
:
then use
STORE alias INTO '$path' USING
org.apache.pig.piggybank.storage.MultiStorage('$path', '0', 'none', '|');
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java
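Put together, a sketch of the whole script might look like the following. The jar location, the input path 'input.csv' and the output path 'out' are assumptions for illustration, and ',' is used as the delimiter here so the output stays comma-separated like the input:

```pig
-- Assumption: piggybank.jar sits at this hypothetical location.
REGISTER /path/to/piggybank.jar;

-- Assumption: 'input.csv' is a placeholder path; the timestamp is field 0.
rows = LOAD 'input.csv' USING PigStorage(',');

-- Arguments: parent output path, index of the field to split on,
-- compression ('none'), and the field delimiter for the output.
STORE rows INTO 'out' USING
    org.apache.pig.piggybank.storage.MultiStorage('out', '0', 'none', ',');
```

As far as the class documentation describes, this produces one subdirectory per distinct value of the split field (for example out/20140801/), each holding part files rather than a single 20140801.csv, so a rename or merge step may still be needed afterwards.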
Cheers !!!
Arvind