Posted to user@pig.apache.org by Cesar Pumar García <ce...@beeva.com> on 2014/12/03 09:20:34 UTC

Store in MongoDB with a Pig UDF

Hi there,

We are given a text file containing several lines, where each one
corresponds to a MongoDB document, and we load it as follows:

DEFINE PigToMongo com.beeva.PigToMongo.PigToMongo();

A = LOAD '/home/hduser/pigfiles/input.txt' USING TextLoader() AS
(line:chararray);

B = FOREACH A GENERATE PigToMongo(line);

DUMP B;

By calling PigToMongo(line), we connect to MongoDB, map the line to a
document, write it, and close the connection.

PigToMongo creates a new connection for each line as follows (which ends up
bringing our MongoDB down*):

    MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
    DB db = mongoClient.getDB( "hadoopDB" );
    DBCollection coll = db.getCollection("output0");

I wonder whether it is possible to open and close the connection only once,
outside the UDF.

   - By the way, does MongoDB support multiple connections at the same
   time? (from several reducers storing data during a map/reduce job, for
   example)


Thank you,


*CÉSAR PUMAR GARCÍA*

*BEEVA FOR GRADUATES*



*cesar.pumar@beeva.com <ce...@beeva.com>*
<http://www.beeva.com>

Re: Store in MongoDB with a Pig UDF

Posted by Russell Jurney <ru...@gmail.com>.
May I ask why you wrote your own MongoDB UDF when one exists?
https://github.com/mongodb/mongo-hadoop/blob/master/pig/README.md

On Thu, Dec 4, 2014 at 3:58 PM, Suraj Nayak M <sn...@gmail.com> wrote:

> Hi Cesar,
>
> A UDF is good for processing data. For writing data, you should write a
> custom storer (a Pig StoreFunc). Also, for writing data into MongoDB, there
> is already a storer available on GitHub with richer features.
>
> Take a look at : https://github.com/mongodb/mongo-hadoop/tree/master/pig
>
> MongoStorage code :
> https://github.com/mongodb/mongo-hadoop/blob/master/pig/src/main/java/com/mongodb/hadoop/pig/MongoStorage.java
>
> Thanks
> Suraj Nayak

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Store in MongoDB with a Pig UDF

Posted by Suraj Nayak M <sn...@gmail.com>.
Hi Cesar,

A UDF is good for processing data. For writing data, you should write a
custom storer (a Pig StoreFunc). Also, for writing data into MongoDB, there
is already a storer available on GitHub with richer features.

Take a look at : https://github.com/mongodb/mongo-hadoop/tree/master/pig

MongoStorage code : 
https://github.com/mongodb/mongo-hadoop/blob/master/pig/src/main/java/com/mongodb/hadoop/pig/MongoStorage.java
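
As a rough sketch of what that could look like for the script in the
original question: the jar paths are placeholders, the database and
collection names are taken from the Java snippet in the question, and the
MongoInsertStorage('', '') call follows the form shown in the mongo-hadoop
Pig README as far as I recall it, so check it against the version you
install.

    -- Placeholder jar paths; point these at your own copies.
    REGISTER /path/to/mongo-java-driver.jar;
    REGISTER /path/to/mongo-hadoop-core.jar;
    REGISTER /path/to/mongo-hadoop-pig.jar;

    A = LOAD '/home/hduser/pigfiles/input.txt' USING TextLoader() AS
    (line:chararray);

    -- Each tuple of A is written as one document. With this schema that
    -- should be a document with a single "line" field, so parse the line
    -- into real fields first if you need structured documents.
    STORE A INTO 'mongodb://localhost:27017/hadoopDB.output0'
        USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');

With a storer, the store function should own the MongoDB connection rather
than a UDF call made once per line, which is the per-line connection
overhead the question describes.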

Thanks
Suraj Nayak
