Posted to mapreduce-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/05/28 21:57:21 UTC

Not saving any output

Hi,
  I want to process some text files and then save the output in a database.
I am using Python (Hadoop streaming).
I am using MongoDB as the backend store.
Is it possible to run Hadoop streaming jobs without specifying any output?
What is the best way to deal with this?

Re: Not saving any output

Posted by Pramod N <np...@gmail.com>.
*Sqoop* is often used in this scenario.

You might also want to look at the *MongoDB Hadoop Connector*:
https://github.com/mongodb/mongo-hadoop
More on its streaming support can be found here:
http://api.mongodb.org/hadoop/Hadoop+Streaming+Support.html
Both approaches have their pros and cons; choose what suits you best.
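With the connector's streaming support, a job can read its input from and write its output to MongoDB collections directly, so no HDFS output directory is needed at all. A rough sketch of the invocation (the assembly jar name and option names should be double-checked against the streaming docs linked above; the URIs are placeholders):

hadoop jar mongo-hadoop-streaming-assembly-*.jar \
    -mapper mapper.py \
    -reducer reducer.py \
    -inputURI mongodb://127.0.0.1/mydb.input_lines \
    -outputURI mongodb://127.0.0.1/mydb.word_counts

The mapper and reducer are whatever Python scripts you already use; see the streaming docs for how keys and values are encoded on stdin/stdout.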



Pramod N <http://atmachinelearner.blogspot.in>
Bruce Wayne of web
@machinelearner <https://twitter.com/machinelearner>

--


On Wed, May 29, 2013 at 2:13 AM, Kai Voigt <k...@123.org> wrote:

> You can have your Python streaming script simply not write any key/value
> pairs to stdout, so you'll get an empty job output.
>
> Independently, your script can do anything external, such as connecting
> to a remote database and storing data in it. You probably want to avoid
> having too many tasks do this in parallel.
>
> More common, though, is a regular job that writes its output to HDFS,
> followed by Sqoop to export that data into an RDBMS. But it's your choice.
>
> Kai
>
> Am 28.05.2013 um 20:57 schrieb jamal sasha <ja...@gmail.com>:
>
> > Hi,
> >   I want to process some text files and then save the output in a database.
> > I am using Python (Hadoop streaming).
> > I am using MongoDB as the backend store.
> > Is it possible to run Hadoop streaming jobs without specifying any output?
> > What is the best way to deal with this?
> >
>
> --
> Kai Voigt
> k@123.org
>
>
>
>
>

Re: Not saving any output

Posted by Kai Voigt <k...@123.org>.
You can have your Python streaming script simply not write any key/value pairs to stdout, so you'll get an empty job output.
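For example, a map-only streaming job like the following leaves nothing useful behind in its output directory. Note that -output is still mandatory for streaming jobs, so just point it at a scratch path you can delete afterwards (the streaming jar location varies by distribution, and mongo_mapper.py is a hypothetical mapper, sketched below, that writes to MongoDB itself):

hadoop jar /path/to/hadoop-streaming.jar \
    -D mapred.reduce.tasks=0 \
    -input /data/textfiles \
    -output /tmp/streaming-scratch-out \
    -mapper mongo_mapper.py \
    -file mongo_mapper.py

The -D mapred.reduce.tasks=0 makes the job map-only; on newer clusters the same property is called mapreduce.job.reduces.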

Independently, your script can do anything external, such as connecting to a remote database and storing data in it. You probably want to avoid having too many tasks do this in parallel.
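A minimal sketch of such a mapper, assuming PyMongo is installed on every task node (the host, database, and collection names below are placeholders):

#!/usr/bin/env python
# mongo_mapper.py -- illustrative sketch only, not a drop-in script.
# Reads input lines from stdin, writes documents straight to MongoDB,
# and emits nothing to stdout, so the job's HDFS output stays empty.
import sys

from pymongo import MongoClient  # PyMongo must be installed on every task node

client = MongoClient('mongodb://dbhost:27017')   # placeholder host
coll = client['mydb']['results']                 # placeholder db/collection

batch = []
for line in sys.stdin:
    line = line.rstrip('\n')
    # Whatever per-line processing you need; word counting is just an example.
    batch.append({'line': line, 'words': len(line.split())})
    if len(batch) >= 1000:
        coll.insert_many(batch)  # insert_many needs PyMongo 3+; older versions use insert()
        batch = []

if batch:
    coll.insert_many(batch)

Batching the inserts keeps the number of round trips to MongoDB down, but with many map tasks running at once the database can still become the bottleneck.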

More common, though, is a regular job that writes its output to HDFS, followed by Sqoop to export that data into an RDBMS. But it's your choice.
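If you go that route, the export step might look roughly like this (the JDBC URL, table, and directory are placeholders; credentials and a matching JDBC driver also have to be supplied, see the Sqoop docs for your database):

sqoop export \
    --connect jdbc:mysql://dbhost/mydb \
    --table results \
    --export-dir /user/jamal/job-output \
    --input-fields-terminated-by '\t'

Note that Sqoop targets JDBC databases; for MongoDB specifically, the mongo-hadoop connector mentioned elsewhere in this thread is the more direct fit.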

Kai

Am 28.05.2013 um 20:57 schrieb jamal sasha <ja...@gmail.com>:

> Hi,
>   I want to process some text files and then save the output in a database.
> I am using Python (Hadoop streaming).
> I am using MongoDB as the backend store.
> Is it possible to run Hadoop streaming jobs without specifying any output?
> What is the best way to deal with this?
> 

-- 
Kai Voigt
k@123.org




