Posted to common-user@hadoop.apache.org by HY <ja...@163.com> on 2011/08/18 06:14:30 UTC

Questions about Hadoop

Hello.
 I would like to run an ordinary Java application (packaged as a jar) on the Hadoop framework. Can anyone tell me whether this can be done, and which subproject I should use to implement it?


e.g.: I wrote some code that reads data from one database and stores it in another database; the whole process is packaged into an ordinary jar. I want to deploy it on Hadoop and, more importantly, run it on the cluster that Hadoop manages.
Is this feasible?
I read the MapReduce documentation, so I thought I could use that project, but I am not sure about it.
Can anyone give me some advice?
Thanks in advance

Re: Questions about Hadoop

Posted by Harsh J <ha...@cloudera.com>.
Do you need any form of distributed processing at all for your work?
MapReduce works best with plain files rather than databases in most
cases. From your description, I am unclear what your overall goal is
(just a DB-to-DB transfer, or do you want to process the data as
well?). Generally speaking (since I do not know the specifics of what
you use), Hadoop does not scale very well when MapReduce is run
against a typical RDBMS, but it is tremendously good at raw file
processing on HDFS or another distributed filesystem.
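
To make that concrete: Hadoop does ship DBInputFormat/DBOutputFormat
for reading and writing over JDBC from MapReduce, and a map-only copy
built on them would look roughly like the sketch below. Treat it as a
sketch only: every class, table, column and connection string in it is
a made-up placeholder, and note that each map task opens its own JDBC
connection, which is a big part of why pointing a whole cluster at a
single RDBMS does not scale.

// Sketch only: class, table, column and JDBC details below are invented.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbCopyJob {

  // Row type: the DB input/output formats move records via DBWritable.
  public static class Row implements Writable, DBWritable {
    long id;
    String name;

    public void readFields(ResultSet rs) throws SQLException {    // read from source table
      id = rs.getLong(1);
      name = rs.getString(2);
    }
    public void write(PreparedStatement ps) throws SQLException { // write to target table
      ps.setLong(1, id);
      ps.setString(2, name);
    }
    public void readFields(DataInput in) throws IOException {     // Writable plumbing
      id = in.readLong();
      name = in.readUTF();
    }
    public void write(DataOutput out) throws IOException {
      out.writeLong(id);
      out.writeUTF(name);
    }
  }

  // Pass-through mapper; each map task talks to the database over its
  // own JDBC connection.
  public static class CopyMapper
      extends Mapper<LongWritable, Row, Row, NullWritable> {
    protected void map(LongWritable key, Row row, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(row, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder driver/URL/credentials. Note both the input and output
    // format read this one set of connection settings, so copying between
    // two different databases needs more work than this.
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/mydb", "user", "password");

    Job job = new Job(conf, "db-copy-sketch");
    job.setJarByClass(DbCopyJob.class);
    job.setMapperClass(CopyMapper.class);
    job.setNumReduceTasks(0);                      // map-only copy
    job.setInputFormatClass(DBInputFormat.class);
    job.setOutputFormatClass(DBOutputFormat.class);
    job.setOutputKeyClass(Row.class);
    job.setOutputValueClass(NullWritable.class);

    // table name, WHERE conditions, ORDER BY column, then column names
    DBInputFormat.setInput(job, Row.class, "src_table", null, "id", "id", "name");
    DBOutputFormat.setOutput(job, "dst_table", "id", "name");

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}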

If your question is simply about running ad-hoc Java programs (of a
non-MapReduce nature) on a Hadoop cluster as part of your workflow(s),
take a look at Apache Oozie
(http://incubator.apache.org/projects/oozie.html), which is one tool
that lets you do exactly that.
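
For reference, the kind of ad-hoc Java program an Oozie workflow's
java action launches is just an ordinary main class. A minimal sketch
of a plain JDBC copy is below; the URLs, credentials and table/column
names are all made-up placeholders, not something Oozie or Hadoop
provides.

// Sketch only: URLs, credentials and table/column names are invented.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DbToDbCopy {
  public static void main(String[] args) throws Exception {
    // In a real workflow these connection details would come in as
    // program arguments or configuration rather than being hard-coded.
    try (Connection src = DriverManager.getConnection(
             "jdbc:mysql://source-host/source_db", "user", "password");
         Connection dst = DriverManager.getConnection(
             "jdbc:mysql://target-host/target_db", "user", "password");
         Statement read = src.createStatement();
         ResultSet rows = read.executeQuery(
             "SELECT id, name FROM src_table");
         PreparedStatement write = dst.prepareStatement(
             "INSERT INTO dst_table (id, name) VALUES (?, ?)")) {

      int pending = 0;
      while (rows.next()) {
        write.setLong(1, rows.getLong("id"));
        write.setString(2, rows.getString("name"));
        write.addBatch();
        if (++pending % 1000 == 0) {   // batch inserts to cut round trips
          write.executeBatch();
        }
      }
      write.executeBatch();            // flush the remainder
    }
  }
}

The workflow definition that wraps it would point the action's
main-class element at this class and pass the connection details in as
arguments.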



-- 
Harsh J