Posted to user@pig.apache.org by Jianting Cao <be...@gmail.com> on 2011/05/13 19:08:52 UTC

input into pig

Hi,

Is there only one way to load data into Pig, i.e. using the LOAD command to
load data from files? Can I load data from memory, for example by creating a
table in embedded code and storing data into it?

Thanks,

Jianting Cao


Re: input into pig

Posted by Mark Laczin <ma...@gmail.com>.
I'm not sure if Pig can do this.  It's designed to follow the
MapReduce/Hadoop paradigm which typically involves data on disk ->
MapReduce Jobs -> data on disk.

You could try to create a custom InputSplit/RecordReader to read from
a program's standard output or something similar, but this is kind of
hacky. There are RecordReaders that read from SQL databases. There's
also StreamBaseRecordReader, which can be used with Hadoop streaming:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/streaming/StreamBaseRecordReader.html

But this is all fairly labor-intensive and would require a bit of work
(if it's even possible). I don't think Pig has direct support yet for
the kind of interface you're looking for.

That being said, I'm somewhat new to Pig/Hadoop so if there's anyone
else who can chime in with comments or agreements/disagreements, I'd
appreciate it.
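
For what it's worth, one low-tech way to stay within the "data on disk ->
jobs -> data on disk" model is to stage the in-memory rows as a
tab-delimited file, which is the layout Pig's default PigStorage loader
reads, and then LOAD that file. Below is a minimal sketch of that idea in
plain Java. The class name, method names, alias, and schema are made up for
illustration and are not part of Pig. The generated statement could
presumably be handed to an embedded PigServer via registerQuery, but that
step is left out here so the example runs without Pig on the classpath.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: stages in-memory rows as a
// tab-delimited file, the layout Pig's default PigStorage loader reads.
class InMemoryStaging {

    // Writes one line per row, with fields separated by tabs.
    static Path stage(List<String[]> rows) throws IOException {
        Path file = Files.createTempFile("pig-input-", ".tsv");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                out.write(String.join("\t", row));
                out.write('\n');
            }
        }
        return file;
    }

    // Builds the Pig Latin statement that would load the staged file.
    // The alias and schema are supplied by the caller (assumed values).
    static String loadStatement(String alias, Path file, String schema) {
        return alias + " = LOAD '" + file.toAbsolutePath()
                + "' USING PigStorage('\\t') AS (" + schema + ");";
    }

    public static void main(String[] args) throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"alice", "3"},
                new String[] {"bob", "5"});
        Path file = stage(rows);
        // Prints the LOAD statement for the staged file.
        System.out.println(loadStatement("events", file, "name:chararray, hits:int"));
    }
}
```

This does not avoid the disk round trip; it only hides it behind a small
helper, so it is a workaround rather than a true in-memory loader.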



Re: input into pig

Posted by Jianting Cao <be...@gmail.com>.
Thank you, Mark. Sorry that I wasn't clear enough. What I want is this: some
programs are running and generating a lot of data. Instead of putting this
data into a relational database, I want to output it directly to Pig and do
some analysis along the way or afterwards. So I'm asking whether there is a
JDBC-like interface with which I could load this newly generated data into
Pig and run analytics on it. All of this happens within a Java process.

Jianting
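
To make the request concrete, here is a rough sketch of what the writing
side of such an interface might look like if it were built on files rather
than on a real in-memory loader. The BatchingSink class and its methods are
hypothetical and are not a Pig API. It buffers rows in the Java process and
writes each full batch as a new tab-delimited part file in one directory,
on the assumption that a Pig script would later LOAD that directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, not a Pig API: buffers rows in memory and writes each
// full batch as a new tab-delimited part file under one directory.
class BatchingSink {
    private final Path dir;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int partsWritten = 0;

    BatchingSink(Path dir, int batchSize) {
        this.dir = dir;
        this.batchSize = batchSize;
    }

    // Adds one row; flushes automatically once the batch is full.
    void append(Object... fields) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) line.append('\t');
            line.append(fields[i]);
        }
        buffer.add(line.toString());
        if (buffer.size() >= batchSize) flush();
    }

    // Writes buffered rows to the next part file and clears the buffer.
    void flush() throws IOException {
        if (buffer.isEmpty()) return;
        Path part = dir.resolve(String.format("part-%05d", partsWritten++));
        Files.write(part, buffer, StandardCharsets.UTF_8);
        buffer.clear();
    }

    int partsWritten() { return partsWritten; }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pig-batches-");
        BatchingSink sink = new BatchingSink(dir, 2);
        sink.append("alice", 3);
        sink.append("bob", 5);   // second row fills the batch and triggers a flush
        sink.append("carol", 7);
        sink.flush();            // write the remaining partial batch
        System.out.println(sink.partsWritten() + " part files in " + dir);
    }
}
```

A job that reads the directory while a part file is still being written
could see a partial file, so a real setup would probably write under a
temporary name and rename the file once it is complete.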


Re: input into pig

Posted by Mark Laczin <ma...@gmail.com>.
Technically speaking, yes, you could store data in memory and keep it
there, then have your program present some interface to store data
(shared memory, reading from stdin, or something similar), but I'm not
sure why you'd want to do this.

Maybe I'm misunderstanding your question, but it sounds like you want
to run using a filesystem that's in memory as opposed to on disk.

-Mark
