Posted to common-user@hadoop.apache.org by pgaurav <pg...@gmail.com> on 2012/09/05 10:42:44 UTC

Using hadoop for analytics

Hi Guys,
I'm five days into the Hadoop world and am evaluating it as a long-term
solution for our client.
I have done some R&D on Amazon EC2 / EMR:
	Load the data (text / CSV) to S3
	Write the mapper / reducer / JobClient and upload the jar to S3
	Start a job flow
I tried two sample programs: word count and CSV data processing.
My question: to further analyse the data (reporting / search), what
should be done? Do I need to implement it in the Mapper class itself?
Do I need to dump the data to a database and then write a custom
application? What is the standard way of analysing the data?
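
For reference, the word count job I tried is essentially the standard
example. A minimal sketch follows (the class names are mine, and the
input / output paths come in as arguments, e.g. s3n:// URIs on EMR):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every token in the input line.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts emitted for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    // Job.getInstance(conf, name) replaces this constructor in later versions.
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. s3n:// input
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. s3n:// output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The jar built from this class is what I upload to S3 before starting
the job flow.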

Thanks
Prashant

-- 
View this message in context: http://old.nabble.com/Using-hadoop-for-analytics-tp34391246p34391246.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Using hadoop for analytics

Posted by Bejoy Ks <be...@gmail.com>.
Hi Prashant

Welcome to the Hadoop community. :)

Hadoop is meant for processing large data volumes. That said, for your
custom requirements you should write your own mapper and reducer
containing the business logic for processing the input data. Also have
a look at Hive and Pig, tools built on top of MapReduce that are widely
used for data analysis. Hive supports SQL-like queries; a small example
is sketched below. If your requirements can be satisfied with Hive or
Pig, it is highly recommended to go with those.
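
For example, once your data is exposed as a Hive table, a report is a
plain SQL-like query. Here is a minimal sketch using the Hive JDBC
driver (the host, port, and the page_views table are only placeholders,
and it assumes the HiveServer1-era driver class):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
  public static void main(String[] args) throws Exception {
    // Register the Hive JDBC driver.
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

    // Host, port, and database below are placeholders.
    Connection con = DriverManager.getConnection(
        "jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();

    // A plain SQL-like aggregation over a hypothetical page_views table.
    ResultSet rs = stmt.executeQuery(
        "SELECT page, COUNT(1) FROM page_views GROUP BY page");
    while (rs.next()) {
      System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
    }
    con.close();
  }
}

The same query can of course be run interactively from the hive shell.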




Re: Using hadoop for analytics

Posted by Nitin Pawar <ni...@gmail.com>.
There are a few SQL-database-like products built on Hadoop for
large-scale data. You may want to take a look at HBase or Hive, or a
combination of the two, to see how they suit your needs; a small HBase
sketch follows.

If you want to do more data analytics and machine learning, you can
take a look at Mahout as well.
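
For instance, point reads and writes through the classic HBase client
API look like this (a minimal sketch; the events table and the d column
family are only placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
  public static void main(String[] args) throws Exception {
    // Picks up cluster settings from hbase-site.xml on the classpath.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events"); // placeholder table name

    // Write one cell: row key, column family, qualifier, value.
    Put put = new Put(Bytes.toBytes("user42"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes("1"));
    table.put(put);

    // Read it back by row key.
    Get get = new Get(Bytes.toBytes("user42"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count"));
    System.out.println(Bytes.toString(value));

    table.close();
  }
}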

-- 
Nitin Pawar

Re: Using hadoop for analytics

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi,

It depends mainly on the data volume. Hadoop can be used to refine the
data before inserting it into a traditional architecture (like a
database).

If you want to write jobs, several solutions have emerged:
* the plain mapred / mapreduce APIs (the former is older than the
latter, but both are the plain default Java APIs); see the sketch after
this list
* Python or other languages via Hadoop Streaming
* Cascading / Crunch..., which provide higher-level Java APIs (with
Scalding / Cascalog as Scala / Clojure 'wrappers')
* Pig / Hive if you want a dedicated high-level language (Hive QL is
SQL-ish)
* and then there are commercial products too...
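
To make the mapred vs mapreduce point concrete, here is the same
pass-through mapper against both APIs (a minimal sketch; the class
names are arbitrary, and fully qualified names are used because both
packages define a Mapper type):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Old-style API (org.apache.hadoop.mapred): interface-based, with a
// separate OutputCollector and Reporter.
class OldApiMapper extends org.apache.hadoop.mapred.MapReduceBase
    implements org.apache.hadoop.mapred.Mapper<LongWritable, Text,
        LongWritable, Text> {
  public void map(LongWritable key, Text value,
      org.apache.hadoop.mapred.OutputCollector<LongWritable, Text> output,
      org.apache.hadoop.mapred.Reporter reporter) throws IOException {
    output.collect(key, value); // pass the record through unchanged
  }
}

// Newer API (org.apache.hadoop.mapreduce): abstract-class-based.
class NewApiMapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable,
    Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(key, value); // pass the record through unchanged
  }
}

The most visible difference is that the newer API folds OutputCollector
and Reporter into a single Context object.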

So it really depends on what you want to use it for and what
competencies you (your team, your company) have.

Regards

Bertrand



-- 
Bertrand Dechoux