Posted to user@pig.apache.org by Jimmy <ji...@gmail.com> on 2014/07/11 20:38:11 UTC

using MultiStorage

I have a directory of files containing somewhat malformed logs (newline
delimited).

I would like to select a specific position in each row and use it as a
directory/file name, then store the original content as-is in those files.
Basically, re-partition the files based on their content.

The code below works and does almost what I expect. The problem is that
the substring called "myfile" now also appears inside the new file, because
B is a tuple containing both fields. Is there a way to store the original
relation (A, in my case) in the file and use "myfile" only as the file name,
so that the original file content is preserved as-is?

Thank you.

REGISTER /lib/pig/piggybank.jar;

A = LOAD '/raw/*' USING PigStorage('\n') AS (mytext:chararray);
B = FOREACH A GENERATE SUBSTRING(mytext,5,7) as myfile, mytext;
STORE B INTO '/output' USING
org.apache.pig.piggybank.storage.MultiStorage('/outpu

Re: using MultiStorage

Posted by Jimmy <ji...@gmail.com>.
I apologize, the code got cut off. Here is the full script:


REGISTER /lib/pig/piggybank.jar;

A = LOAD '/raw/*' USING PigStorage('\n') AS (mytext:chararray);
B = FOREACH A GENERATE SUBSTRING(mytext,5,7), mytext;
STORE B INTO '/output' USING
org.apache.pig.piggybank.storage.MultiStorage('/output', '0', 'none', ' ') ;
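
For reference, here is a commented sketch of the same script. It assumes the
four-argument piggybank constructor takes the parent path, the zero-based
index of the split field, the compression type, and the output field
delimiter, in that order. The alias "myfile" is carried over from the first
message. It only illustrates why the substring shows up in the output (every
field of B is written, including the one used to pick the directory); it is
not a fix.

```
REGISTER /lib/pig/piggybank.jar;

-- Each input line becomes a single chararray field.
A = LOAD '/raw/*' USING PigStorage('\n') AS (mytext:chararray);

-- Field 0 is the partition key; field 1 is the untouched original line.
B = FOREACH A GENERATE SUBSTRING(mytext, 5, 7) AS myfile, mytext;

-- Assumed argument order: parent path, split field index, compression, field delimiter.
-- Both myfile and mytext are written to each output file, separated by the delimiter.
STORE B INTO '/output' USING
    org.apache.pig.piggybank.storage.MultiStorage('/output', '0', 'none', ' ');
```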




