Posted to mapreduce-user@hadoop.apache.org by Chengi Liu <ch...@gmail.com> on 2013/05/02 23:03:26 UTC

Saving data in db instead of hdfs

Hi,
I am using the Hadoop streaming API (Python) for some processing.
I want the data to be processed by Hadoop, but I want to pipe the output
to a database instead of HDFS.
How do I do this?
Thanks

Re: Saving data in db instead of hdfs

Posted by Ahmed Radwan <ah...@apache.org>.

You can use the DBOutputFormat to directly write your job output to a
DB, see: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/db/DBOutputFormat.html

I'd also recommend looking into Sqoop (http://sqoop.apache.org/) for
more capabilities.

Re: Saving data in db instead of hdfs

Posted by Mirko Kämpf <mi...@gmail.com>.

Hi,

Just use Sqoop to push the data from HDFS to a database via JDBC.

Intro to Sqoop:
http://blog.cloudera.com/blog/2009/06/introducing-sqoop/
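
As a rough illustration of the Sqoop route, the sketch below builds the
argument list for a `sqoop export` run over a finished job's output
directory. The JDBC URL, table name, HDFS path, user name and password file
are all hypothetical placeholders, and the tab separator assumes the
streaming job wrote its default tab-separated key/value lines.

```python
import shlex


def build_sqoop_export(jdbc_url, table, export_dir, username,
                       password_file, field_sep="\\t"):
    """Return the argv list for a `sqoop export` of an HDFS directory.

    field_sep is passed to Sqoop as the two-character escape `\\t`,
    which Sqoop interprets as a tab.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,            # JDBC URL of the target database
        "--table", table,                 # table must already exist
        "--export-dir", export_dir,       # HDFS directory holding job output
        "--username", username,
        "--password-file", password_file,
        "--input-fields-terminated-by", field_sep,
    ]


def as_shell_command(argv):
    """Quote the argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

The list can be handed to `subprocess.check_call` on a machine where the
`sqoop` client is installed, or printed with `as_shell_command` and run by
hand once the streaming job has finished.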

Alternatively, leave the results in Hadoop and use Hive JDBC to query
them from outside the Hadoop cluster.

You can also create your own OutputFormat (with the Java API) that writes
data directly to the database, but be careful with large result sets or
with a large number of reducers. This could become a scalability issue,
but a small dataset coming out of a single reducer can be handled that way.
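
For a Python streaming job, a related option (not the custom OutputFormat
described above, but a swapped-in alternative) is to have the reducer
script open the database connection itself. The sketch below is a minimal
version of that idea under several assumptions: the reducer input is
tab-separated "key<TAB>count" lines, the table name `word_counts` is
hypothetical, and the standard-library `sqlite3` module stands in for
whatever DB-API driver the real database needs (that driver would have to
be installed on every task node). Rows are written in batches to limit
round trips, and the write replaces existing rows because Hadoop may
re-run a failed or speculative task attempt.

```python
import sqlite3
from itertools import groupby

BATCH_SIZE = 1000  # rows per executemany() call; an arbitrary choice


def parse(lines):
    """Yield (key, count) pairs from tab-separated streaming input."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if not key or not value:
            continue  # skip blank or malformed lines
        yield key, int(value)


def flush(conn, batch):
    """Write one batch and commit it. INSERT OR REPLACE is SQLite syntax."""
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (word, total) VALUES (?, ?)",
        batch,
    )
    conn.commit()


def reduce_to_db(lines, conn, batch_size=BATCH_SIZE):
    """Sum counts per key and store the totals. Returns rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts "
        "(word TEXT PRIMARY KEY, total INTEGER)"
    )
    batch, written = [], 0
    # Streaming hands the reducer its input sorted by key, so groupby
    # sees all values of a key as one consecutive run.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        batch.append((key, sum(count for _, count in group)))
        if len(batch) >= batch_size:
            flush(conn, batch)
            written += len(batch)
            batch = []
    if batch:
        flush(conn, batch)
        written += len(batch)
    return written
```

In an actual reducer script the last lines would open the real connection
and call `reduce_to_db(sys.stdin, conn)`. The scalability warning above
still applies: every reducer task holds its own connection, so the number
of reducers should stay small.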

OutputFormat and Streaming API:
http://blog.aggregateknowledge.com/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/

Best wishes
Mirko


