Posted to user@hadoop.apache.org by mahout user <ma...@gmail.com> on 2012/08/19 17:44:09 UTC

Hadoop Real time help

Hello folks,


   I am new to Hadoop. I would like to know how the Hadoop framework can be
useful for real-time services. Can anyone explain?

Thanks.

Re: Hadoop Real time help

Posted by Niels Basjes <Ni...@basjes.nl>.

Thanks for the pointers, I have stuff to read now :)

On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux <de...@gmail.com> wrote:
> The terms are
> * ESP : http://en.wikipedia.org/wiki/Event_stream_processing
> * CEP : http://en.wikipedia.org/wiki/Complex_event_processing
>
> By the way, "processing streams in real time" is close to being a pleonasm.
>
> MapReduce follows a batch architecture: you accumulate data until a given
> time, then process everything, and only at the end provide all the results.
> Stream processing has, by definition, a smoother throughput: events are
> processed one at a time, and each one can potentially produce a result.
>
> I don't know of any complete overview of such tools.
> Esper is well known in that space.
> FlumeBase was an attempt to do something similar (as far as I can tell).
> It shows how an ESP engine fits with log collection using a tool such as
> Flume.
>
> There are also other solutions that let you scale, such as Storm.
> A few people have already considered using Storm for scalability and Esper
> to do the actual computation.
>
> Regards
>
> Bertrand
>
>
> On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <ni...@basj.es> wrote:
>>
>> Is there a "complete" overview of the tools that allow processing streams
>> of data in real time?
>>
>> Or even better: what are the terms to google for?
>>
>> --
>> Kind regards,
>> Niels Basjes
>> (Sent from mobile)
>>
>> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <de...@gmail.com> wrote:
>>
>>> That's a good question. More and more people are talking about Hadoop
>>> Real Time.
>>> One key aspect of this question is whether we are talking about MapReduce
>>> or not.
>>>
>>> MapReduce greatly improves the response time of data-intensive jobs,
>>> but it is still a batch framework with noticeable latency.
>>>
>>> There are multiple ways to improve the latency:
>>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>>> * Bigtable clones (like HBase, ...)
>>> * YARN with a non-MapReduce application
>>> * ...
>>>
>>> But it really depends on the context and on the definition of 'real
>>> time'.
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com>
>>> wrote:
>>>>
>>>> Hello folks,
>>>>
>>>>
>>>>    I am new to Hadoop. I would like to know how the Hadoop framework can
>>>> be useful for real-time services. Can anyone explain?
>>>>
>>>> Thanks.
>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>
>
>
>
> --
> Bertrand Dechoux



-- 
Best regards,

Niels Basjes

Re: Hadoop Real time help

Posted by Mohit Anchlia <mo...@gmail.com>.

One of the most common use cases is to run all I/O-intensive batch jobs on
data in HDFS and then load the more structured data, or the output of the
job, into HBase or Solr for quick access. If your dataset is small enough to
fit into memory, you could also cache it in memory. There are various
options depending on your requirements; Bertrand has already highlighted
some of them below.
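
As a rough illustration of that pattern, here is a minimal Python sketch. It
is not HBase or Solr client code: the names `run_batch_job` and
`ServingStore` and the sample records are made up for this example, and a
plain dict stands in for the low-latency store. The point is only that the
expensive aggregation runs once, offline, while each request afterwards is a
cheap key lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def run_batch_job(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Stand-in for a heavy batch job: total the amounts per user."""
    totals: Dict[str, int] = defaultdict(int)
    for user, amount in records:
        totals[user] += amount
    return dict(totals)


class ServingStore:
    """Stand-in for a low-latency store (a dict plays the role of HBase/Solr)."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def bulk_load(self, rows: Dict[str, int]) -> None:
        # Replace the previous batch output in one step.
        self._rows = dict(rows)

    def get(self, key: str, default: int = 0) -> int:
        # Point lookup: no scan over the raw data is needed.
        return self._rows.get(key, default)


if __name__ == "__main__":
    raw = [("ann", 5), ("bob", 2), ("ann", 1)]  # hypothetical input records
    store = ServingStore()
    store.bulk_load(run_batch_job(raw))  # slow path, run periodically
    print(store.get("ann"))              # fast path, run per request
```

The results are only as fresh as the last batch run, which is why this counts
as "quick access" rather than real-time processing.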

On Mon, Aug 20, 2012 at 12:37 AM, Bertrand Dechoux <de...@gmail.com> wrote:

> The terms are
> * ESP : http://en.wikipedia.org/wiki/Event_stream_processing
> * CEP : http://en.wikipedia.org/wiki/Complex_event_processing
>
> By the way, "processing streams in real time" is close to being a pleonasm.
>
> MapReduce follows a batch architecture: you accumulate data until a given
> time, then process everything, and only at the end provide all the results.
> Stream processing has, by definition, a smoother throughput: events are
> processed one at a time, and each one can potentially produce a result.
>
> I don't know of any complete overview of such tools.
> Esper is well known in that space.
> FlumeBase was an attempt to do something similar (as far as I can tell).
> It shows how an ESP engine fits with log collection using a tool such as
> Flume.
>
> There are also other solutions that let you scale, such as Storm.
> A few people have already considered using Storm for scalability and Esper
> to do the actual computation.
>
> Regards
>
> Bertrand
>
>
> On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <ni...@basj.es> wrote:
>
>> Is there a "complete" overview of the tools that allow processing streams
>> of data in real time?
>>
>> Or even better: what are the terms to google for?
>>
>> --
>> Kind regards,
>> Niels Basjes
>> (Sent from mobile)
>> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <de...@gmail.com> wrote:
>>
>>> That's a good question. More and more people are talking about Hadoop
>>> Real Time.
>>> One key aspect of this question is whether we are talking about
>>> MapReduce or not.
>>>
>>> MapReduce greatly improves the response time of data-intensive jobs,
>>> but it is still a batch framework with noticeable latency.
>>>
>>> There are multiple ways to improve the latency:
>>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>>> * Bigtable clones (like HBase, ...)
>>> * YARN with a non-MapReduce application
>>> * ...
>>>
>>> But it really depends on the context and on the definition of 'real
>>> time'.
>>>
>>> Regards
>>>
>>> Bertrand
>>>
>>>
>>>
>>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com> wrote:
>>>
>>>> Hello folks,
>>>>
>>>>
>>>>    I am new to Hadoop. I would like to know how the Hadoop framework can
>>>> be useful for real-time services. Can anyone explain?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>> --
>>> Bertrand Dechoux
>>>
>>
>
>
> --
> Bertrand Dechoux
>

Re: Hadoop Real time help

Posted by Bertrand Dechoux <de...@gmail.com>.

The terms are
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing

By the way, "processing streams in real time" is close to being a pleonasm.

MapReduce follows a batch architecture: you accumulate data until a given
time, then process everything, and only at the end provide all the results.
Stream processing has, by definition, a smoother throughput: events are
processed one at a time, and each one can potentially produce a result.
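
To make the contrast concrete, here is a minimal Python sketch. It is not
Hadoop or Esper code: the function names `batch_count` and `stream_count` and
the sample events are invented for illustration. The batch version produces
nothing until all the data has been collected, while the streaming version
emits an updated result after every single event.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def batch_count(events: List[str]) -> Dict[str, int]:
    """Batch style: every event must be available before a result exists."""
    return dict(Counter(events))


def stream_count(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Stream style: update the state per event and emit a result each time."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)  # a snapshot is available immediately


if __name__ == "__main__":
    sample = ["login", "click", "click", "logout"]  # hypothetical events
    print("batch :", batch_count(sample))
    for snapshot in stream_count(sample):
        print("stream:", snapshot)
```

The final streaming snapshot equals the batch result; the difference lies in
when the intermediate answers become available.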

I don't know of any complete overview of such tools.
Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as
Flume.

There are also other solutions that let you scale, such as Storm.
A few people have already considered using Storm for scalability and Esper
to do the actual computation.
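
For a feel of what the "computation" part of an ESP/CEP engine looks like,
here is a small Python sketch of a sliding-window rule. It uses neither
Esper's nor Storm's API: the function `detect_bursts`, its parameters and the
sample events are assumptions made for this example only. The rule fires
whenever one user produces at least `threshold` events within the last
`window` seconds.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, user)


def detect_bursts(events: Iterable[Event], threshold: int = 3,
                  window: float = 60.0) -> Iterator[Event]:
    """Yield (timestamp, user) each time a user has at least `threshold`
    events within the last `window` seconds. Events must be in time order."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, user in events:
        times = recent[user]
        times.append(ts)
        # Drop timestamps that have fallen out of the window.
        while times and ts - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield ts, user


if __name__ == "__main__":
    # Hypothetical failed-login events.
    feed = [(0.0, "ann"), (10.0, "bob"), (20.0, "ann"),
            (50.0, "ann"), (200.0, "ann")]
    for alert in detect_bursts(feed):
        print("burst detected:", alert)
```

In a distributed setup, a system such as Storm would be responsible for
routing all events of one user to the same worker, so that per-user state
like `recent` can stay local to it.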

Regards

Bertrand

On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <ni...@basj.es> wrote:

> Is there a "complete" overview of the tools that allow processing streams
> of data in real time?
>
> Or even better: what are the terms to google for?
>
> --
> Kind regards,
> Niels Basjes
> (Sent from mobile)
> On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <de...@gmail.com> wrote:
>
>> That's a good question. More and more people are talking about Hadoop Real
>> Time.
>> One key aspect of this question is whether we are talking about MapReduce
>> or not.
>>
>> MapReduce greatly improves the response time of data-intensive jobs,
>> but it is still a batch framework with noticeable latency.
>>
>> There are multiple ways to improve the latency:
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Bigtable clones (like HBase, ...)
>> * YARN with a non-MapReduce application
>> * ...
>>
>> But it really depends on the context and on the definition of 'real
>> time'.
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>>
>>>    I am new to Hadoop. I would like to know how the Hadoop framework can
>>> be useful for real-time services. Can anyone explain?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>

-- 
Bertrand Dechoux

Re: Hadoop Real time help

Posted by Bertrand Dechoux <de...@gmail.com>.

The terms are
* ESP : http://en.wikipedia.org/wiki/Event_stream_processing
* CEP : http://en.wikipedia.org/wiki/Complex_event_processing

By the way, processing streams in real time tends toward being a pleonasm.

MapReduce follows a batch architecture. You keep data until a given time.
You then process everything. And at the end you provide all the results.
Stream processing has by definition a more 'smooth' throughput. Each event
is processed at a time and potentially each processing could lead to a
result.

I don't know any complete overview of such tools.
Esper is well known in that space.
FlumeBase was an attempt to do something similar (as far as I can tell).
It shows how an ESP engine fits with log collection using a tool such as
Flume.

Then you also have other solutions which will allow you to scale such as
Storm.
A few people have already considered using Storm for scalability and Esper
to do the real computation.

Regards

Bertrand

On Sun, Aug 19, 2012 at 9:44 PM, Niels Basjes <ni...@basj.es> wrote:

> Is there a "complete" overview of the tools that allow processing streams
> of data in realtime?
>
> Or even better; what are the terms to google for?
>
> --
> Met vriendelijke groet,
> Niels Basjes
> (Verstuurd vanaf mobiel )
> Op 19 aug. 2012 18:22 schreef "Bertrand Dechoux" <de...@gmail.com> het
> volgende:
>
> That's a good question. More and more people are talking about Hadoop Real
>> Time.
>> One key aspect of this question is whether we are talking about MapReduce
>> or not.
>>
>> MapReduce greatly improves the response time of any data intensive jobs
>> but it is still a batch framework with a noticeable latency.
>>
>> There is multiple ways to improve the latency :
>> * ESP/CEP solutions (like Esper, FlumeBase, ...)
>> * Big Table clones (like HBase ...)
>> * YARN with a non MapReduce application
>> * ...
>>
>> But it will really depend on the context and the definition of 'real
>> time'.
>>
>> Regards
>>
>> Bertrand
>>
>>
>>
>> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>>
>>>    I am new to hadoop, I just want to get information that how hadoop
>>> framework is usefull for real time service.?can any one explain me..?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Bertrand Dechoux
>>
>

-- 
Bertrand Dechoux

Re: Hadoop Real time help

Posted by Niels Basjes <ni...@basj.es>.

Is there a "complete" overview of the tools that allow processing streams
of data in realtime?

Or even better: what are the terms to google for?

-- 
Kind regards,
Niels Basjes
(Sent from mobile)
On 19 Aug 2012 at 18:22, "Bertrand Dechoux" <de...@gmail.com> wrote:

> That's a good question. More and more people are talking about Hadoop Real
> Time.
> One key aspect of this question is whether we are talking about MapReduce
> or not.
>
> MapReduce greatly improves the response time of any data intensive jobs
> but it is still a batch framework with a noticeable latency.
>
> There is multiple ways to improve the latency :
> * ESP/CEP solutions (like Esper, FlumeBase, ...)
> * Big Table clones (like HBase ...)
> * YARN with a non MapReduce application
> * ...
>
> But it will really depend on the context and the definition of 'real time'.
>
> Regards
>
> Bertrand
>
>
>
> On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com> wrote:
>
>> Hello folks,
>>
>>
>>    I am new to hadoop, I just want to get information that how hadoop
>> framework is usefull for real time service.?can any one explain me..?
>>
>> Thanks.
>>
>
>
>
> --
> Bertrand Dechoux
>

Re: Hadoop Real time help

Posted by Bertrand Dechoux <de...@gmail.com>.

That's a good question. More and more people are talking about Hadoop Real
Time.
One key aspect of this question is whether we are talking about MapReduce
or not.

MapReduce greatly improves the response time of data-intensive jobs, but
it is still a batch framework with noticeable latency.

There are multiple ways to improve the latency:
* ESP/CEP solutions (like Esper, FlumeBase, ...)
* Bigtable clones (like HBase, ...)
* YARN with a non-MapReduce application
* ...

But it will really depend on the context and the definition of 'real time'.

Regards

Bertrand

On Sun, Aug 19, 2012 at 5:44 PM, mahout user <ma...@gmail.com> wrote:

> Hello folks,
>
>
>    I am new to hadoop, I just want to get information that how hadoop
> framework is usefull for real time service.?can any one explain me..?
>
> Thanks.
>

-- 
Bertrand Dechoux

Re: Hadoop Real time help

Posted by Bertrand Dechoux <de...@gmail.com>.

Lucene allows you to build a kind of inverted index mapping content to
document identifiers. Solr or ElasticSearch allow you to scale the process.

However, if I am reading you correctly, you are saying that you cannot
precompute a structure (such as an index) before the search?

If that's true and you need to process GBs of data, then you have to
accept some latency, since you cannot have everything in memory before the
search itself.

I can't say anything more precise. It will depend on your context. One
may ask: why can't you index the content of your database and your files?
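
As a rough illustration of what precomputing an index buys you, here is a
toy inverted index in plain Python. It is not Lucene's API, only a sketch of
the idea; build_index, search, and the sample document identifiers are made
up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Precompute term -> set of document ids, once, before any search."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of documents containing every query term (AND search)."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Hypothetical content taken from database rows and CSV lines.
docs = {
    "row-1": "hadoop batch processing",
    "row-2": "lucene search index",
    "file-7": "hadoop search",
}
index = build_index(docs)
print(search(index, "hadoop search"))  # ['file-7']
print(search(index, "hadoop"))         # ['file-7', 'row-1']
```

The expensive pass over the data happens once in build_index; each search
then only touches the entries for its own terms instead of scanning all the
rows and files again.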

Bertrand

On Sun, Aug 19, 2012 at 9:06 PM, mahout user <ma...@gmail.com> wrote:

> Thanks Mohit and  Bertrand,
>
>      I am looking into hadoop for search engine as many others. But in
> case of search engine, I know lucene is there. But in my case i have
> implemented java classes, they are searching from databases as well as from
> csv files. But i cant understand if there are GB's of data is there, then
> how can i get real time search service with hadoop. ?
>
>
> On Sun, Aug 19, 2012 at 10:06 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>
>>
>>
>> On Sun, Aug 19, 2012 at 8:44 AM, mahout user <ma...@gmail.com> wrote:
>>
>>> Hello folks,
>>>
>>>
>>>    I am new to hadoop, I just want to get information that how hadoop
>>> framework is usefull for real time service.?can any one explain me..?
>>>
>>> Thanks.
>>>
>>
>> Can you specify your use case? Each use case calls for different design
>> consideration.
>>
>
>

-- 
Bertrand Dechoux

Re: Hadoop Real time help

Posted by mahout user <ma...@gmail.com>.

Thanks Mohit and Bertrand,

     I am looking into Hadoop for a search engine, as many others are. I
know Lucene exists for search engines, but in my case I have implemented
Java classes that search databases as well as CSV files. What I can't
understand is this: if there are GBs of data, how can I get a real-time
search service with Hadoop?


On Sun, Aug 19, 2012 at 10:06 PM, Mohit Anchlia <mo...@gmail.com> wrote:

>
>
> On Sun, Aug 19, 2012 at 8:44 AM, mahout user <ma...@gmail.com> wrote:
>
>> Hello folks,
>>
>>
>>    I am new to hadoop, I just want to get information that how hadoop
>> framework is usefull for real time service.?can any one explain me..?
>>
>> Thanks.
>>
>
> Can you specify your use case? Each use case calls for different design
> consideration.
>

Re: Hadoop Real time help

Posted by Mohit Anchlia <mo...@gmail.com>.

On Sun, Aug 19, 2012 at 8:44 AM, mahout user <ma...@gmail.com> wrote:

> Hello folks,
>
>
>    I am new to hadoop, I just want to get information that how hadoop
> framework is usefull for real time service.?can any one explain me..?
>
> Thanks.
>

Can you specify your use case? Each use case calls for different design
considerations.
