Posted to general@hadoop.apache.org by Christian Schäfer <sy...@yahoo.de> on 2011/08/19 15:06:10 UTC

Hadoop Case Studies with interactive applications...an antagonism?

Hi Hadoopians,

I'm a noob in Hadoop (what a rhyme) and have some questions about the
white papers posted on cloudera.com, as follows:


  In IQT Quarterly: "Hadoop: Scalable, Flexible Data Storage and Analysis" by
Mike Olson

    I see a contradiction between the case studies and the following pros and
cons of Hadoop:

    pros: Hadoop (M/R) is mostly used for batch operations (jobs that run for
minutes or hours)
    cons: Hadoop (M/R) is not usable for interactive applications

    versus the OpenPDC case study, where Hadoop is used for monitoring in order
to react quickly:
        "Close monitoring and rapid response to changes in the state of the grid 
allow utilities to minimize or prevent blackouts,"

    and another case study from "Ten Common Hadoopable Problems - Real-World Hadoop
Use Cases":
        "Fast detection allows the bank to protect itself from considerable 
losses."

If there is a better non-commercial place to ask these questions, please let me
know.

Background: I intend to set up a system for another domain, where lots of
sensor data need to be stored and queried in order to monitor it and detect
problem situations.

kind regards
Christian

Re: AW: Hadoop Case Studies with interactive applications...an antagonism?

Posted by Robert Evans <ev...@yahoo-inc.com>.
Christian,

I am really not an expert on Hive, but from what I know, Hive translates the SQL into one or more Map/Reduce jobs to execute the query. It applies optimizations to reduce the number of jobs it launches and to speed things up. I also know that Pig has looked into supporting some alternative paradigms, similar to Spark, to speed up the processing of small jobs, but that is still in the discussion stage and I have no idea where it will end up. I can only assume that Hive is looking into similar things.
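To illustrate the general idea (this is not Hive's actual query planner), a simple aggregation such as `SELECT sensor_id, MAX(value) FROM readings GROUP BY sensor_id` fits into a single Map/Reduce job: the map phase emits `(sensor_id, value)` pairs, the framework's shuffle groups them by key, and the reduce phase computes the maximum per key. The sketch below simulates the three phases in plain Python; the `readings` table, its columns, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input rows: (sensor_id, timestamp, value)
Row = Tuple[str, int, float]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, float]]:
    """Emit (key, value) pairs keyed by the GROUP BY column."""
    for sensor_id, _ts, value in rows:
        yield sensor_id, value


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all values that share a key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Apply the aggregate function (MAX) to each group."""
    return {key: max(values) for key, values in groups.items()}


def group_by_max(rows: Iterable[Row]) -> Dict[str, float]:
    """SELECT sensor_id, MAX(value) FROM readings GROUP BY sensor_id"""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    readings = [("s1", 1, 20.5), ("s2", 1, 18.0), ("s1", 2, 22.1), ("s2", 2, 17.4)]
    print(group_by_max(readings))  # {'s1': 22.1, 's2': 18.0}
```

Queries that combine, say, a join with a `GROUP BY` on a different key typically need more than one such job chained together, which is why reducing the number of launched jobs is worth optimizing.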

--
Bobby

On 8/19/11 1:44 PM, "Christian Schäfer" <sy...@yahoo.de> wrote:

Hi Bobby,

thanks for the information provided :)

I'm glad there are some possibilities to use Hadoop + HBase... I was a bit afraid I
would have to discard that mighty tool (in my project).

As I'm still at the beginning of learning Hadoop, I have one basic question:
Is every query I send via Hive to HBase executed in the background as a
Map/Reduce job, or does it work in some other (more efficient) way? (I know RTFM
would be an appropriate answer... but I searched and have not found the
"answer" yet.)

The Mesos and Storm stuff looks interesting. I will take it into account in my
evaluation if possible.

Somehow I think Pig + Hive + Cloudera tools will be used later because of the
proven technology, high-level abstraction, tooling, and the possibility of getting support.

But I will check out Spark and Storm, as they seem to have some interesting
concepts :)

regards
Christian

________________________________
From: Robert Evans <ev...@yahoo-inc.com>
To: "general@hadoop.apache.org" <ge...@hadoop.apache.org>
Sent: Friday, 19 August 2011, 17:35:08
Subject: Re: Hadoop Case Studies with interactive applications...an antagonism?

Christian,

Hadoop is best for batch processing because it is optimized for that use case.
It is not that it cannot handle small jobs. Those jobs just tend to be somewhat
slower than on other systems, and their processing times are not as consistent
as some use cases really need. You can get around this somewhat by
over-provisioning your grid.
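The point about small jobs can be made concrete with a toy cost model: a fixed per-job overhead (scheduling, launching tasks), plus the time to scan the data, plus any time spent waiting for free slots. All numbers below (20 s overhead, 1 GB/s aggregate throughput) are made-up assumptions, not measured Hadoop figures. The model only shows why the fixed part dominates small jobs and is amortized in large ones.

```python
def job_latency_s(data_gb: float,
                  overhead_s: float = 20.0,           # assumed fixed per-job startup cost
                  throughput_gb_per_s: float = 1.0,   # assumed aggregate scan rate
                  queue_wait_s: float = 0.0) -> float:  # time spent waiting for free slots
    """Toy model: end-to-end latency of one batch job, in seconds."""
    return queue_wait_s + overhead_s + data_gb / throughput_gb_per_s


def overhead_fraction(data_gb: float,
                      overhead_s: float = 20.0,
                      throughput_gb_per_s: float = 1.0) -> float:
    """Share of the total latency that is fixed overhead rather than useful work."""
    return overhead_s / job_latency_s(data_gb, overhead_s, throughput_gb_per_s)


if __name__ == "__main__":
    for size_gb in (0.1, 10.0, 1000.0):
        print(f"{size_gb:>7} GB: {job_latency_s(size_gb):7.1f} s, "
              f"overhead {overhead_fraction(size_gb):.1%}")
```

With these assumed numbers, overhead is about 99.5% of a 0.1 GB job but only about 2% of a 1000 GB job. In this model, over-provisioning mainly drives `queue_wait_s` toward zero and makes it more predictable; it does not remove the fixed startup cost.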

If you want to monitor sensor data, Hadoop should be able to handle it,
so long as your SLAs are not extremely tight. This is especially true as the
size of your data grows. You might want to look at HBase. It can be very fast
and interactive, and because it stores the data in HDFS you can process it with
Map/Reduce if you need to. There are also a number of interactive/fast processing
solutions on top of HDFS that are either available now or should be soon,
once MRv2 stabilizes some more. Look at Spark, which is part of the Mesos
project at Berkeley (www.mesosproject.org). Another thing to look at is Hive or
Pig, if you want to be able to query the data with a higher-level language.
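HBase keeps rows sorted by row key, so a common way to make "recent readings for one sensor" fast is to put the sensor id first and a reversed timestamp second, turning the query into one contiguous range scan. The sketch below is an illustration under stated assumptions: an in-memory sorted list stands in for an HBase table (no real HBase client API is used), and the key layout `sensor_id#reversed_timestamp`, the helper names, and the sample values are hypothetical.

```python
import bisect
from typing import List, Tuple

# Assumptions: timestamps are non-negative epoch milliseconds and sensor ids
# never contain "#". 2**63 - 1 mirrors Java's Long.MAX_VALUE.
MAX_TS = 2**63 - 1


def row_key(sensor_id: str, ts_ms: int) -> bytes:
    """Hypothetical layout: sensor id, then a reversed, zero-padded timestamp,
    so that newer readings of the same sensor sort before older ones."""
    return f"{sensor_id}#{MAX_TS - ts_ms:019d}".encode()


def decode_ts(key: bytes) -> int:
    return MAX_TS - int(key.split(b"#", 1)[1])


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: List[float] = []

    def put(self, key: bytes, value: float) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # same row key: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, float]]:
        """Rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


def readings_between(table: SortedTable, sensor_id: str,
                     since_ms: int, until_ms: int) -> List[Tuple[int, float]]:
    """Readings of one sensor with since_ms <= ts <= until_ms, newest first.
    Because timestamps are reversed, the newest key is the scan's start row."""
    start = row_key(sensor_id, until_ms)
    stop = row_key(sensor_id, since_ms - 1)  # exclusive stop, just past since_ms
    return [(decode_ts(k), v) for k, v in table.scan(start, stop)]


if __name__ == "__main__":
    t = SortedTable()
    for ts, temp in [(1_000, 20.5), (2_000, 21.0), (3_000, 35.2)]:
        t.put(row_key("s1", ts), temp)
    t.put(row_key("s10", 2_500), 19.9)  # another sensor whose id starts with "s1"
    print(readings_between(t, "s1", 2_000, 3_000))  # [(3000, 35.2), (2000, 21.0)]
```

The separator matters: `"#"` sorts before any digit, so the keys of sensor `s10` never fall inside the scan range of sensor `s1`.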

Another solution that looks very interesting, once it is released as open source,
is Storm:
http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
It looks like it could be modified a bit to run under YARN (MRv2), and then you
could store your modules' state in HBase. That would complement Hadoop's MapReduce
processing very nicely and do a lot of what you are looking to do in real
time.
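Unlike a Map/Reduce job, which runs over data that has already been stored, a stream processor such as Storm handles each reading as it arrives, which is what makes near-real-time detection possible. The sketch below shows the kind of per-reading logic such a component might run for the sensor-monitoring use case. It is plain Python, not the Storm API; the class name, the count-based window, and the averaging rule are hypothetical choices, and the in-memory `state` stands in for the module state that could be kept in HBase.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


class WindowedThresholdMonitor:
    """Hypothetical per-reading logic: keep the last `window` readings of each
    sensor and flag the sensor when their average exceeds `limit`."""

    def __init__(self, window: int, limit: float) -> None:
        self.window = window
        self.limit = limit
        # Per-sensor state. In the setup described above this could be
        # persisted to HBase; here it lives only in memory.
        self.state: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))

    def process(self, sensor_id: str, value: float) -> bool:
        """Handle one reading as it arrives; return True if it triggers an alert."""
        recent = self.state[sensor_id]
        recent.append(value)  # deque(maxlen=...) drops the oldest reading automatically
        if len(recent) < self.window:
            return False  # not enough history for this sensor yet
        return sum(recent) / len(recent) > self.limit


def alerts(stream: Iterable[Tuple[str, float]],
           monitor: WindowedThresholdMonitor) -> Iterator[Tuple[str, int]]:
    """Yield (sensor_id, position in stream) for every reading that raises an alert."""
    for position, (sensor_id, value) in enumerate(stream):
        if monitor.process(sensor_id, value):
            yield sensor_id, position


if __name__ == "__main__":
    readings = [("a", 9.0), ("a", 10.0), ("a", 12.0), ("b", 50.0), ("a", 13.0)]
    print(list(alerts(readings, WindowedThresholdMonitor(window=3, limit=10.0))))
    # [('a', 2), ('a', 4)]
```

A count-based window keeps the example short; a time-based window would be closer to what a monitoring system usually needs, at the cost of tracking timestamps per reading.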

--Bobby



