Posted to common-user@hadoop.apache.org by "Natarajan, Prabakaran 1. (NSN - IN/Bangalore)" <pr...@nsn.com> on 2014/07/31 09:32:29 UTC

Hadoop Realtime Queries

Hi

I want to run real-time queries on HDFS data. I tried Hadoop/YARN/Hive, Shark on Spark, Tez, etc.,
but I still couldn't get subsecond performance on the large data set that I have.
I understand Hadoop is not meant for this, but I still want to get as close to it as possible.

1)      How can we tune the RHEL OS for this?
2)      How can we tune YARN?
3)      Is there any stable framework like Tez that can perform much better?
4)      Is there any caching strategy that we can adopt?
5)      Any articles related to this are welcome.

Thanks in Advance

Prabakaran.N

Re: Hadoop Realtime Queries

Posted by Alex Kamil <al...@gmail.com>.

NP,

We use HBase + Phoenix for "real time" SQL queries in production:
 http://phoenix.apache.org/

By real time I mean milliseconds for small queries, or seconds for
hundreds of millions of rows. The speed mostly depends on how many
nodes/HBase region servers are in the cluster. HBase is great for
parallel scanning of TBs of data, and Phoenix adds standard SQL and the
capability to run JOINs across multiple tables. It uses HBase coprocessors
to optimize aggregate queries. It's a breeze to install (just a standard
JDBC driver) and so far it has been very stable.
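
To illustrate why small keyed queries and large scans fall into such different latency classes, here is a minimal sketch in Python. It is not Phoenix or HBase code: it only assumes that rows are kept sorted by row key (as HBase does) and models the table as a sorted in-memory list. The names `build_table`, `point_lookup`, `range_scan` and `full_scan_sum` are hypothetical helpers made up for this illustration.

```python
import bisect

def build_table(n):
    # Toy stand-in for a table: (row_key, value) pairs kept sorted by row key.
    # Zero-padded keys make string order match numeric order.
    return [(f"user{i:08d}", i % 100) for i in range(n)]

def point_lookup(table, keys, row_key):
    # Binary search on the sorted keys: O(log n), the "small query" case.
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return table[i][1]
    return None

def range_scan(table, keys, start, stop):
    # Only rows with start <= key < stop are touched.
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return table[lo:hi]

def full_scan_sum(table):
    # An aggregate with no usable key predicate has to read every row: O(n).
    return sum(value for _, value in table)

table = build_table(1000)
keys = [key for key, _ in table]

print(point_lookup(table, keys, "user00000042"))
print(len(range_scan(table, keys, "user00000100", "user00000200")))
print(full_scan_sum(table))
```

The point lookup and the range scan touch only the rows they need, while the aggregate reads the whole table. On a real cluster the full scan is split across region servers, which is why the number of nodes matters so much for the large queries.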

Language reference: http://phoenix.apache.org/language/index.html
Performance and comparison with Hive and Impala:
http://phoenix.apache.org/performance.html

Alex


On Thu, Jul 31, 2014 at 12:22 PM, Nitin Pawar <ni...@gmail.com>
wrote:

> Before you read the entire answer, I would advise you to wait for the Hive
> experts to answer.
>
> You are looking at the wrong system, then.
>
> Hive is more batch oriented; it gets closer to a near-real-time scenario with
> the ORC/Parquet file formats along with Tez and Stinger.
>
> You may want to design your system so that it takes advantage of the
> batch-oriented nature and merges it with real stream processing, making
> that data available for reporting.
>
> I am not sure if anyone has done tests at sizes of 50TB.
> What's the size of your cluster? What is the cluster's capacity for running maps
> or reducers in parallel?
>
> I remember processing more than 150TB of data when RCFile was just
> released and Hive was at 0.7 or something like that. My cluster had
> more than 800 nodes and I could run around 1600 maps. But we still needed
> many hours to consume the data, because the queries were complex:
> they did pattern matching and pattern recognition.
>
>
> On Thu, Jul 31, 2014 at 9:37 PM, Natarajan, Prabakaran 1. (NSN -
> IN/Bangalore) <pr...@nsn.com> wrote:
>
>>  Hi Nitin,
>>
>>
>>
>> I want queries to return within a second
>>
>>
>>
>> Hive table DataSize is 50TB – Snappy RC file
>>
>>
>>
>>
>> *Thanks and Regards*
>>
>> Prabakaran.N  aka NP
>>
>> nsn, Bangalore
>>
>> *When "I" is replaced by "We" - even Illness becomes "Wellness"*
>>
>>
>>
>>
>>
>> *From:* ext Nitin Pawar [mailto:nitinpawar432@gmail.com]
>> *Sent:* Thursday, July 31, 2014 6:25 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Hadoop Realtime Queries
>>
>>
>>
>> I want quick response for SQL queries .
>>
>>
>>
>> How quick is quick for you?
>>
>> What's the data size?
>>
>> What kind of queries do you want to run?
>>
>> How often is the same query run on the same dataset?
>>
>>
>>
>>
>>
>> On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN -
>> IN/Bangalore) <pr...@nsn.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Thank you all for the reply.
>>
>>
>>
>> I want quick response for SQL queries .
>>
>>
>>
>> *Thanks and Regards*
>>
>> Prabakaran.N
>>
>>
>>
>> *From:* ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
>> *Sent:* Thursday, July 31, 2014 1:28 PM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Hadoop Realtime Queries
>>
>>
>>
>> It all depends on the context and what is really meant by real time.
>> Impala (and other competing alternatives) are not listed among the tools
>> you have tried.
>>
>> Maybe you should not focus only on batch frameworks for providing
>> real-time access? The results are not surprising.
>>
>>
>>  Bertrand Dechoux
>>
>>
>>
>> On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com>
>> wrote:
>>
>> Hi,
>>
>> As far as I know, real-time queries are only possible using HBase and
>> Cloudera Search. Hive is a batch process; it is not real time. So
>> instead of tuning different parameters, maybe you could look for a
>> different architecture design so that you could use HBase.
>>
>>
>>
>> Regards,
>>
>> Deepak
>>
>>
>>
>> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:
>> prabakaran.1.natarajan@nsn.com]
>> *Sent:* Thursday, July 31, 2014 3:32 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Hadoop Realtime Queries
>>
>>
>>
>> Hi
>>
>>
>>
>> I want to run real-time queries on HDFS data. I tried
>> Hadoop/YARN/Hive, Shark on Spark, Tez, etc.,
>>
>> but I still couldn't get subsecond performance on the large data set that I
>> have.
>>
>> I understand Hadoop is not meant for this, but I still want to get as
>> close to it as possible.
>>
>>
>>
>> 1.       How can we tune the RHEL OS for this?
>>
>> 2.       How can we tune YARN?
>>
>> 3.       Is there any stable framework like Tez that can perform
>> much better?
>>
>> 4.       Is there any caching strategy that we can adopt?
>>
>> 5.       Any articles related to this are welcome.
>>
>>
>>
>> Thanks in Advance
>>
>>
>>
>> Prabakaran.N
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
>
> --
> Nitin Pawar
>

Re: Hadoop Realtime Queries

Posted by Nitin Pawar <ni...@gmail.com>.

Before you read the entire answer, I would advise you to wait for the Hive
experts to answer.

You are looking at the wrong system, then.

Hive is more batch oriented; it gets closer to a near-real-time scenario with
the ORC/Parquet file formats along with Tez and Stinger.

You may want to design your system so that it takes advantage of the
batch-oriented nature and merges it with real stream processing, making
that data available for reporting.
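
The design suggested above (a slow batch layer merged with a fast stream layer at query time) can be sketched roughly as follows. This is a toy Python illustration under stated assumptions, not code from any of the tools named in this thread: `batch_view` stands in for a periodic batch job such as a Hive query, `recent_events` stands in for whatever the stream processor has seen since the last batch run, and both function names are hypothetical.

```python
from collections import Counter

def batch_view(events):
    # Precompute per-key counts in a slow batch job (stand-in for a Hive query).
    return Counter(event["key"] for event in events)

def merged_count(batch, recent_events, key):
    # Reporting query: precomputed batch result plus events that arrived
    # after the batch job last ran.
    recent = sum(1 for event in recent_events if event["key"] == key)
    return batch.get(key, 0) + recent

# Events already covered by the last batch run.
history = [{"key": "a"}, {"key": "a"}, {"key": "b"}]
# Events that arrived after the batch run (the stream side).
recent_events = [{"key": "a"}]

batch = batch_view(history)
print(merged_count(batch, recent_events, "a"))
```

The reporting query never scans the raw history. It reads a small precomputed result and adds a small recent delta, which is what makes a fast answer plausible even when the underlying data is large.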

I am not sure if anyone has done tests at sizes of 50TB.
What's the size of your cluster? What is the cluster's capacity for running maps
or reducers in parallel?

I remember processing more than 150TB of data when RCFile was just
released and Hive was at 0.7 or something like that. My cluster had
more than 800 nodes and I could run around 1600 maps. But we still needed
many hours to consume the data, because the queries were complex:
they did pattern matching and pattern recognition.
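
A back-of-the-envelope calculation shows why a brute-force scan cannot meet a one-second target at these sizes. The sketch below uses the figures from this thread (50TB, 150TB, 1600 parallel maps); the 100 MB/s read rate per disk or per map task is an illustrative assumption, not a number anyone in the thread reported, and the function names are hypothetical.

```python
import math

TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes

# ASSUMPTION: 100 MB/s sustained read per disk / per map task (illustrative only).
ASSUMED_READ_RATE = 100 * MB

def disks_needed(data_bytes, deadline_s, disk_bytes_per_s):
    # Number of disks that must read in parallel to scan everything in time.
    return math.ceil(data_bytes / (deadline_s * disk_bytes_per_s))

def scan_seconds(data_bytes, parallel_readers, reader_bytes_per_s):
    # Lower bound on wall-clock time for one full pass over the data.
    return data_bytes / (parallel_readers * reader_bytes_per_s)

# Scanning 50TB within one second at the assumed rate:
print(disks_needed(50 * TB, 1, ASSUMED_READ_RATE))

# One pass over 150TB with 1600 parallel maps at the assumed rate:
print(scan_seconds(150 * TB, 1600, ASSUMED_READ_RATE))
```

Under that assumption a one-second full scan of 50TB would need on the order of half a million disks reading in parallel, and a single pass over 150TB with 1600 maps takes roughly a quarter of an hour before any query logic runs. Subsecond answers therefore have to come from reading far less data (indexes, key lookups, pre-aggregation), not from tuning the scan.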


On Thu, Jul 31, 2014 at 9:37 PM, Natarajan, Prabakaran 1. (NSN -
IN/Bangalore) <pr...@nsn.com> wrote:

>  Hi Nitin,
>
>
>
> I want queries to return within a second
>
>
>
> Hive table DataSize is 50TB – Snappy RC file
>
>
>
>
> *Thanks and Regards*
>
> Prabakaran.N  aka NP
>
> nsn, Bangalore
>
> *When "I" is replaced by "We" - even Illness becomes "Wellness"*
>
>
>
>
>
> *From:* ext Nitin Pawar [mailto:nitinpawar432@gmail.com]
> *Sent:* Thursday, July 31, 2014 6:25 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Hadoop Realtime Queries
>
>
>
> I want quick response for SQL queries .
>
>
>
> How quick is quick for you?
>
> What's the data size?
>
> What kind of queries do you want to run?
>
> How often is the same query run on the same dataset?
>
>
>
>
>
> On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN -
> IN/Bangalore) <pr...@nsn.com> wrote:
>
> Hi,
>
>
>
> Thank you all for the reply.
>
>
>
> I want quick response for SQL queries .
>
>
>
> *Thanks and Regards*
>
> Prabakaran.N
>
>
>
> *From:* ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
> *Sent:* Thursday, July 31, 2014 1:28 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Hadoop Realtime Queries
>
>
>
> It all depends on the context and what is really meant by real time. Impala
> (and other competing alternatives) are not listed among the tools you have
> tried.
>
> Maybe you should not focus only on batch frameworks for providing
> real-time access? The results are not surprising.
>
>
>  Bertrand Dechoux
>
>
>
> On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com>
> wrote:
>
> Hi,
>
> As far as I know, real-time queries are only possible using HBase and
> Cloudera Search. Hive is a batch process; it is not real time. So
> instead of tuning different parameters, maybe you could look for a
> different architecture design so that you could use HBase.
>
>
>
> Regards,
>
> Deepak
>
>
>
> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:
> prabakaran.1.natarajan@nsn.com]
> *Sent:* Thursday, July 31, 2014 3:32 AM
> *To:* user@hadoop.apache.org
> *Subject:* Hadoop Realtime Queries
>
>
>
> Hi
>
>
>
> I want to run real-time queries on HDFS data. I tried
> Hadoop/YARN/Hive, Shark on Spark, Tez, etc.,
>
> but I still couldn't get subsecond performance on the large data set that I
> have.
>
> I understand Hadoop is not meant for this, but I still want to get as
> close to it as possible.
>
>
>
> 1.       How can we tune the RHEL OS for this?
>
> 2.       How can we tune YARN?
>
> 3.       Is there any stable framework like Tez that can perform much
> better?
>
> 4.       Is there any caching strategy that we can adopt?
>
> 5.       Any articles related to this are welcome.
>
>
>
> Thanks in Advance
>
>
>
> Prabakaran.N
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

Re: Hadoop Realtime Queries

Posted by Nitin Pawar <ni...@gmail.com>.

Before you read the entire answer, I would advise you to wait for the Hive
experts to answer.

You are looking at the wrong system, then.

Hive is batch oriented; it gets closer to a near-real-time scenario with the
ORC/Parquet file formats along with Tez and Stinger.

You may want to design your system so that it takes advantage of the
batch-oriented nature and merges it with real stream processing, making
that data available for reporting.

I am not sure if anyone has done tests at a size of 50TB.
What's the size of your cluster? What is the cluster's capacity for running
maps or reducers in parallel?

I remember processing more than 150TB of data when RCFile was just
released and Hive was at 0.7 or something like that. My cluster had
more than 800 nodes and I could run around 1600 maps. But we still needed
many hours to consume the data, because the queries were complex:
they did pattern matching and pattern recognition.
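
To put rough numbers on why a sub-second full scan of 50TB is out of reach,
here is a back-of-envelope sketch. The per-node scan rate of 100 MB/s is an
assumed figure for illustration only (nobody in this thread measured it), the
function names are made up for this example, and the calculation ignores
compression, scheduling overhead and data skew.

```python
# Back-of-envelope: how long does one full pass over the data take?
# ASSUMPTION: 100 MB/s sustained scan per node is an illustrative figure,
# not a measurement from any cluster discussed in this thread.

TB = 10**12  # bytes (decimal terabyte)
MB = 10**6   # bytes (decimal megabyte)


def full_scan_seconds(data_bytes, nodes, mb_per_sec_per_node=100):
    """Seconds to read data_bytes once, assuming perfect parallelism."""
    aggregate_bytes_per_sec = nodes * mb_per_sec_per_node * MB
    return data_bytes / aggregate_bytes_per_sec


def nodes_needed(data_bytes, target_seconds, mb_per_sec_per_node=100):
    """Nodes required to read data_bytes within target_seconds (rounded up)."""
    bytes_per_node = mb_per_sec_per_node * MB * target_seconds
    return -(-data_bytes // bytes_per_node)  # ceiling division


if __name__ == "__main__":
    data = 50 * TB
    for n in (10, 100, 800):
        print(f"{n:>4} nodes: {full_scan_seconds(data, n):,.0f} s per full scan")
    print("nodes for a 1 s full scan:", nodes_needed(data, 1))
```

Under that assumption an 800-node cluster still needs about 625 seconds for
one pass, and a one-second pass would need on the order of 500,000 nodes. So
the realistic options are the ones that avoid the full scan (indexes,
partition pruning, pre-aggregation), not tuning the scan itself.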


On Thu, Jul 31, 2014 at 9:37 PM, Natarajan, Prabakaran 1. (NSN -
IN/Bangalore) <pr...@nsn.com> wrote:

>  Hi Nitin,
>
>
>
> I want queries to return within a second
>
>
>
> Hive table DataSize is 50TB – Snappy RC file
>
>
>
>
> *Thanks and Regards*
>
> Prabakaran.N  aka NP
>
> nsn, Bangalore
>
> *When "I" is replaced by "We" - even Illness becomes "Wellness"*
>
>
>
>
>
> *From:* ext Nitin Pawar [mailto:nitinpawar432@gmail.com]
> *Sent:* Thursday, July 31, 2014 6:25 PM
>
> *To:* user@hadoop.apache.org
> *Subject:* Re: Hadoop Realtime Queries
>
>
>
> I want quick response for SQL queries .
>
>
>
> how quick is quick for you ?
>
> what's the data size?
>
> what kind of queries you want to run?
>
> what is the frequency of running the query on same dataset again and
> again?
>
>
>
>
>
> On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN -
> IN/Bangalore) <pr...@nsn.com> wrote:
>
> Hi,
>
>
>
> Thank you all for the reply.
>
>
>
> I want quick response for SQL queries .
>
>
>
> *Thanks and Regards*
>
> Prabakaran.N
>
>
>
> *From:* ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
> *Sent:* Thursday, July 31, 2014 1:28 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Hadoop Realtime Queries
>
>
>
> It all depends on the context and what is really meant by realtime. Impala
> (and other concurrent alternatives) are not listed among the tools you have
> tried.
>
> Maybe you should not focus only on batch frameworks for providing a
> realtime access? The results are not surprising.
>
>
>  Bertrand Dechoux
>
>
>
> On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com>
> wrote:
>
> Hi,
>
> As far as I know, real time queries are only possible using HBase &
> cloudera search. Hive would be a batch process, it is not real time. So
> instead of tuning different parameters , may be you could look for
> different architecture design so that you could use HBase.
>
>
>
> Regards,
>
> Deepak
>
>
>
> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:
> prabakaran.1.natarajan@nsn.com]
> *Sent:* Thursday, July 31, 2014 3:32 AM
> *To:* user@hadoop.apache.org
> *Subject:* Hadoop Realtime Queries
>
>
>
> Hi
>
>
>
> I want to perform realtime query on HDFS data.   I tried
> hadoop/yarnt/hive, shark on spark, Tez, etc.,
>
> But still I couldn’t get subsecond performance on the large data that I
> have.
>
> I understand hadoop is not meant for this, but still want to achieve as
> max as possible
>
>
>
> 1.       How can we tune RHEL OS for this?
>
> 2.       How can we tune yarn?
>
> 3.       Is there is any stable framework like Tez which can perform much
> better
>
> 4.       Is there is any caching strategy that we can adopt?
>
> 5.       Any articles related to this are welcome
>
>
>
> Thanks in Advance
>
>
>
> Prabakaran.N
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

RE: Hadoop Realtime Queries

Posted by "Natarajan, Prabakaran 1. (NSN - IN/Bangalore)" <pr...@nsn.com>.

Hi Nitin,

I want queries to return within a second.

The Hive table data size is 50TB, stored as Snappy-compressed RCFile.

Thanks and Regards
Prabakaran.N  aka NP
nsn, Bangalore
When "I" is replaced by "We" - even Illness becomes "Wellness"


From: ext Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Thursday, July 31, 2014 6:25 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries

I want quick response for SQL queries .

how quick is quick for you ?
what's the data size?
what kind of queries you want to run?
what is the frequency of running the query on same dataset again and again?


On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN - IN/Bangalore) <pr...@nsn.com> wrote:
Hi,

Thank you all for the reply.

I want quick response for SQL queries .

Thanks and Regards
Prabakaran.N

From: ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
Sent: Thursday, July 31, 2014 1:28 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries

It all depends on the context and what is really meant by realtime. Impala (and other concurrent alternatives) are not listed among the tools you have tried.
Maybe you should not focus only on batch frameworks for providing a realtime access? The results are not surprising.

Bertrand Dechoux

On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com> wrote:
Hi,
As far as I know, real time queries are only possible using HBase & cloudera search. Hive would be a batch process, it is not real time. So instead of tuning different parameters , may be you could look for different architecture design so that you could use HBase.

Regards,
Deepak

From: Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:prabakaran.1.natarajan@nsn.com]
Sent: Thursday, July 31, 2014 3:32 AM
To: user@hadoop.apache.org
Subject: Hadoop Realtime Queries

Hi

I want to perform realtime query on HDFS data.   I tried hadoop/yarnt/hive, shark on spark, Tez, etc.,
But still I couldn’t get subsecond performance on the large data that I have.
I understand hadoop is not meant for this, but still want to achieve as max as possible

1.       How can we tune RHEL OS for this?
2.       How can we tune yarn?
3.       Is there is any stable framework like Tez which can perform much better
4.       Is there is any caching strategy that we can adopt?
5.       Any articles related to this are welcome

Thanks in Advance

Prabakaran.N







--
Nitin Pawar

Re: Hadoop Realtime Queries

Posted by Nitin Pawar <ni...@gmail.com>.

I want quick response for SQL queries.

How quick is quick for you?
What's the data size?
What kind of queries do you want to run?
How often will the same query be run against the same dataset?
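
The last question matters because a query that is repeated on unchanged data
can be answered from a cache instead of rescanning. A minimal sketch of that
idea, assuming a hypothetical run_query callable that stands in for whatever
engine executes the SQL; the ResultCache class and the TTL value are
illustrative and not part of Hive or any other tool named in this thread.

```python
import time


class ResultCache:
    """Tiny in-memory cache of query results, keyed by the SQL text."""

    def __init__(self, run_query, ttl_seconds=300, clock=time.monotonic):
        self._run_query = run_query  # hypothetical: takes SQL, returns rows
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expiry_time, rows)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql):
        # Collapse runs of whitespace so reformatted text shares one entry.
        return " ".join(sql.split())

    def query(self, sql):
        key = self._key(sql)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: run the real query and remember the result.
        self.misses += 1
        rows = self._run_query(sql)
        self._entries[key] = (now + self._ttl, rows)
        return rows
```

This only helps when the same query text really does recur and slightly stale
answers are acceptable; for ad hoc queries the hit rate would be close to zero.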



On Thu, Jul 31, 2014 at 6:20 PM, Natarajan, Prabakaran 1. (NSN -
IN/Bangalore) <pr...@nsn.com> wrote:

>  Hi,
>
>
>
> Thank you all for the reply.
>
>
>
> I want quick response for SQL queries .
>
>
>
> *Thanks and Regards*
>
> Prabakaran.N
>
>
>
> *From:* ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
> *Sent:* Thursday, July 31, 2014 1:28 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: Hadoop Realtime Queries
>
>
>
> It all depends on the context and what is really meant by realtime. Impala
> (and other concurrent alternatives) are not listed among the tools you have
> tried.
>
> Maybe you should not focus only on batch frameworks for providing a
> realtime access? The results are not surprising.
>
>
>  Bertrand Dechoux
>
>
>
> On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com>
> wrote:
>
> Hi,
>
> As far as I know, real time queries are only possible using HBase &
> cloudera search. Hive would be a batch process, it is not real time. So
> instead of tuning different parameters , may be you could look for
> different architecture design so that you could use HBase.
>
>
>
> Regards,
>
> Deepak
>
>
>
> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:
> prabakaran.1.natarajan@nsn.com]
> *Sent:* Thursday, July 31, 2014 3:32 AM
> *To:* user@hadoop.apache.org
> *Subject:* Hadoop Realtime Queries
>
>
>
> Hi
>
>
>
> I want to perform realtime query on HDFS data.   I tried
> hadoop/yarnt/hive, shark on spark, Tez, etc.,
>
> But still I couldn’t get subsecond performance on the large data that I
> have.
>
> I understand hadoop is not meant for this, but still want to achieve as
> max as possible
>
>
>
> 1.       How can we tune RHEL OS for this?
>
> 2.       How can we tune yarn?
>
> 3.       Is there is any stable framework like Tez which can perform much
> better
>
> 4.       Is there is any caching strategy that we can adopt?
>
> 5.       Any articles related to this are welcome
>
>
>
> Thanks in Advance
>
>
>
> Prabakaran.N
>



-- 
Nitin Pawar

RE: Hadoop Realtime Queries

Posted by "Natarajan, Prabakaran 1. (NSN - IN/Bangalore)" <pr...@nsn.com>.

Hi,

Thank you all for the reply.

I want a quick response for SQL queries.

Thanks and Regards
Prabakaran.N

From: ext Bertrand Dechoux [mailto:dechouxb@gmail.com]
Sent: Thursday, July 31, 2014 1:28 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries

It all depends on the context and what is really meant by realtime. Impala (and other concurrent alternatives) are not listed among the tools you have tried.
Maybe you should not focus only on batch frameworks for providing a realtime access? The results are not surprising.

Bertrand Dechoux

On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com> wrote:
Hi,
As far as I know, real time queries are only possible using HBase & cloudera search. Hive would be a batch process, it is not real time. So instead of tuning different parameters , may be you could look for different architecture design so that you could use HBase.

Regards,
Deepak

From: Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:prabakaran.1.natarajan@nsn.com]
Sent: Thursday, July 31, 2014 3:32 AM
To: user@hadoop.apache.org
Subject: Hadoop Realtime Queries

Hi

I want to perform realtime query on HDFS data.   I tried hadoop/yarnt/hive, shark on spark, Tez, etc.,
But still I couldn’t get subsecond performance on the large data that I have.
I understand hadoop is not meant for this, but still want to achieve as max as possible

1.       How can we tune RHEL OS for this?
2.       How can we tune yarn?
3.       Is there is any stable framework like Tez which can perform much better
4.       Is there is any caching strategy that we can adopt?
5.       Any articles related to this are welcome

Thanks in Advance

Prabakaran.N

Re: Hadoop Realtime Queries

Posted by Bertrand Dechoux <de...@gmail.com>.

It all depends on the context and on what is really meant by realtime. Impala
(and other competing alternatives) is not listed among the tools you have
tried.
Maybe you should not focus only on batch frameworks for providing
realtime access? The results are not surprising.

Bertrand Dechoux


On Thu, Jul 31, 2014 at 9:38 AM, Kumar, Deepak8 <de...@citi.com>
wrote:

>  Hi,
>
> As far as I know, real time queries are only possible using HBase &
> cloudera search. Hive would be a batch process, it is not real time. So
> instead of tuning different parameters , may be you could look for
> different architecture design so that you could use HBase.
>
>
>
> Regards,
>
> Deepak
>
>
>
> *From:* Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:
> prabakaran.1.natarajan@nsn.com]
> *Sent:* Thursday, July 31, 2014 3:32 AM
> *To:* user@hadoop.apache.org
> *Subject:* Hadoop Realtime Queries
>
>
>
> Hi
>
>
>
> I want to run realtime queries on HDFS data. I tried
> Hadoop/YARN/Hive, Shark on Spark, Tez, etc.,
>
> but I still couldn’t get subsecond performance on the large data set that I
> have.
>
> I understand Hadoop is not meant for this, but I still want to get as
> close as possible.
>
>
>
> 1.       How can we tune the RHEL OS for this?
>
> 2.       How can we tune YARN?
>
> 3.       Is there any stable framework like Tez that can perform much
> better?
>
> 4.       Is there any caching strategy that we can adopt?
>
> 5.       Any articles related to this are welcome.
>
>
>
> Thanks in Advance
>
>
>
> Prabakaran.N
>
>
>
>
>
>
>

RE: Hadoop Realtime Queries

Posted by "Kumar, Deepak8 " <de...@citi.com>.

Hi,
As far as I know, real-time queries are only possible using HBase and Cloudera Search. Hive is a batch process; it is not real time. So instead of tuning different parameters, maybe you could look for a different architecture design so that you can use HBase.

Regards,
Deepak

From: Natarajan, Prabakaran 1. (NSN - IN/Bangalore) [mailto:prabakaran.1.natarajan@nsn.com]
Sent: Thursday, July 31, 2014 3:32 AM
To: user@hadoop.apache.org
Subject: Hadoop Realtime Queries

Hi

I want to run realtime queries on HDFS data. I tried Hadoop/YARN/Hive, Shark on Spark, Tez, etc.,
but I still couldn't get subsecond performance on the large data set that I have.
I understand Hadoop is not meant for this, but I still want to get as close as possible.

1.       How can we tune the RHEL OS for this?
2.       How can we tune YARN?
3.       Is there any stable framework like Tez that can perform much better?
4.       Is there any caching strategy that we can adopt?
5.       Any articles related to this are welcome.

Thanks in Advance

Prabakaran.N
