Posted to dev@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/07/07 23:56:44 UTC

Pig for MongoDB

I want Pig for MongoDB, for acting on smaller datasets in realtime. Is that
crazy? Given that the MR code is just JSON, isn't this easier than creating
Hadoop MapReduce?

It's a crazy idea, but I'm curious whether this might not be too hard,
owing to the JSON interface to Mongo MapReduce.

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Pig for MongoDB

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You'd have to do something about shipping the UDFs to Mongo. But try
it. The generalization code that was pulled was stuff like the fs
abstraction (want to work on something that's not HDFS? Just implement
the FileSystem interface from Hadoop, like S3 and Cassandra did) and
"slices" (just use an InputFormat!). You'd essentially have to write a
parallel MRCompiler and switch to it if Mongo mode is set. There
may be other problems, of course, but see how far you can get; it'd be
interesting.
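
A minimal sketch of the "parallel compiler, switched by mode" idea, written
in Python rather than Pig's Java to keep it short. Every name here
(ExecMode, HadoopCompiler, MongoCompiler, select_compiler, the tuple-based
plan) is hypothetical and is not Pig's real API; the compilers only label
each operator instead of generating real jobs.

```python
from enum import Enum


class ExecMode(Enum):
    """Hypothetical execution modes; the names are illustrative only."""
    MAPREDUCE = "mapreduce"
    MONGO = "mongo"


class HadoopCompiler:
    """Stand-in for the existing Hadoop path (not Pig's real MRCompiler)."""

    def compile(self, plan):
        # Label each operator as something a Hadoop job would handle.
        return [f"hadoop-job:{op}" for op, _ in plan]


class MongoCompiler:
    """Stand-in for a parallel compiler that would target Mongo MapReduce."""

    def compile(self, plan):
        # Label each operator as something a Mongo MapReduce call would handle.
        return [f"mongo-mapreduce:{op}" for op, _ in plan]


# One compiler class per mode; adding an engine means adding an entry here.
COMPILERS = {
    ExecMode.MAPREDUCE: HadoopCompiler,
    ExecMode.MONGO: MongoCompiler,
}


def select_compiler(mode):
    """Return a compiler instance for the requested execution mode."""
    return COMPILERS[mode]()


# A toy plan: (operator, argument) pairs standing in for a compiled script.
plan = [("load", "events"), ("group", "user"), ("count", None)]
jobs = select_compiler(ExecMode.MONGO).compile(plan)
print(jobs)
```

The point of the lookup table is that the rest of the pipeline never needs
to know which engine is underneath; only the compiler choice changes when
the mode does.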

Also, for smaller datasets in realtime, I just use local mode. It
can read from remote file systems, and it is fast again.

D

On Sat, Jul 7, 2012 at 5:53 PM, Russell Jurney <ru...@gmail.com> wrote:
> I'm actually talking about implementing another system underneath Pig,
> MongoDB along with Hadoop. Write a Pig script, and Pig translates it to
> Mongo MapReduce instead of Hadoop MapReduce, if you so desire. I know the
> generalization code was pulled a long time ago (for multiple engines
> underneath Pig, Hadoop + some), so I'm wondering how hard Pig/MongoDB would
> be to implement.
>
> I'd like to see Pig spread beyond Hadoop, and MongoDB's simple JSON
> MapReduce system might make this easy?
>

Re: Pig for MongoDB

Posted by Russell Jurney <ru...@gmail.com>.
I'm actually talking about implementing another system underneath Pig,
MongoDB along with Hadoop. Write a Pig script, and Pig translates it to
Mongo MapReduce instead of Hadoop MapReduce, if you so desire. I know the
generalization code was pulled a long time ago (for multiple engines
underneath Pig, Hadoop + some), so I'm wondering how hard Pig/MongoDB would
be to implement.

I'd like to see Pig spread beyond Hadoop, and MongoDB's simple JSON
MapReduce system might make this easy?
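
As a rough illustration of what such a translation could emit, here is a
sketch that turns a "group by a field and count" step into a MongoDB
mapReduce command document. The helper name group_count_to_mongo and the
collection and field names ("events", "user") are made up for the example.
Note that the command wrapper is a JSON-like document, but the map and
reduce bodies inside it are JavaScript source strings, so a real translator
would have to generate JavaScript, not only JSON.

```python
import json


def group_count_to_mongo(collection, field):
    """Build a mapReduce command that counts documents per value of `field`.

    Roughly what `FOREACH (GROUP rel BY field) GENERATE group, COUNT(rel)`
    might become. This is an assumption about the mapping, not Pig behavior.
    """
    # json.dumps quotes and escapes the field name for use inside JavaScript.
    map_js = "function () { emit(this[%s], 1); }" % json.dumps(field)
    reduce_js = "function (key, values) { return Array.sum(values); }"
    # The command name must be the first key; dicts keep insertion order.
    return {
        "mapReduce": collection,
        "map": map_js,
        "reduce": reduce_js,
        "out": {"inline": 1},
    }


command = group_count_to_mongo("events", "user")
print(json.dumps(command, indent=2))
```

Anything beyond simple grouping (joins, nested bags, UDF calls) would need
considerably more generated JavaScript than this, which is where most of
the work would likely be.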

On Sat, Jul 7, 2012 at 5:39 PM, Alan Gates <ga...@hortonworks.com> wrote:

> The mongo-hadoop project (https://github.com/mongodb/mongo-hadoop/) has
> Mongo load and store functions for Pig. Is this what you were looking
> for, or were you asking whether Pig and Mongo play well together?
>
> Alan.
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Pig for MongoDB

Posted by Alan Gates <ga...@hortonworks.com>.
The mongo-hadoop project (https://github.com/mongodb/mongo-hadoop/) has Mongo load and store functions for Pig. Is this what you were looking for, or were you asking whether Pig and Mongo play well together?

Alan.

On Jul 7, 2012, at 2:56 PM, Russell Jurney wrote:

> I want Pig for MongoDB, for acting on smaller datasets in realtime. Is that
> crazy? Given that the MR code is just JSON, isn't this easier than creating
> Hadoop MapReduce?
> 
> It's a crazy idea, but I'm curious whether this might not be too hard,
> owing to the JSON interface to Mongo MapReduce.
> 
> -- 
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com