Posted to user@phoenix.apache.org by Prakash Hosalli <pr...@syncoms.com> on 2014/09/09 16:19:52 UTC

Hive or Phoenix

Hi,


Does Phoenix have any security layer of its own, as we have in Hive?

I am unsure whether to go forward with Phoenix or Hive in the production environment at my company.

Thanks  & Regards,
Prakash Hosalli
Syncoms Bangalore India.

Re: Hive or Phoenix

Posted by James Taylor <ja...@apache.org>.

Hi Siddharth,
If your data fits into memory, then I'd recommend using an RDBMS. They work
great when they can meet your scaling requirements.
Thanks,
James


RE: Hive or Phoenix

Posted by Siddharth Ubale <si...@syncoms.com>.

Hi Anil,

Thanks for the concise reply.
I wanted to take the conversation further and understand what benefits Phoenix would offer in a scenario where we can employ an in-memory system like Apache Spark or Impala on top of Hive to reduce latency.
I am asking because security could then be handled better.
Please do share your views.

Thanks,
Siddharth Ubale


Re: Hive or Phoenix

Posted by anil gupta <an...@gmail.com>.

Hi Prakash,

Here is the URL for the performance comparison:
http://phoenix.apache.org/performance.html

Thanks,
Anil Gupta


Re: Hive or Phoenix

Posted by anil gupta <an...@gmail.com>.

Hi Prakash,

Please find my reply inline.

On Tue, Sep 9, 2014 at 11:28 PM, Prakash Hosalli <
prakash.hosalli@syncoms.com> wrote:

> Hi James/Anil,
>
>
>         Regarding the questions you put forward,
>
> 1.      Yes, we will store the data in HBase.
> 2.      Hive will run over HBase.
>
Anil: I don't know your use case well enough to say how much you can do with
the OOTB (out-of-the-box) features of the Hive and HBase integration. But when I
tried to use Hive with HBase, I could not, because Hive does not support
querying a table that has composite rowkeys. In a production environment,
users have composite rowkeys most of the time. Obviously, you can patch the
Hive-HBase integration to make it better. Please keep in mind that Hive is
not designed to support HBase (HBase integration is just a small feature of
Hive). In contrast, Phoenix is designed on top of HBase, so you will get
much better integration and optimization of HBase queries.
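
To make the composite rowkey point concrete, here is a minimal Python sketch. It is not Phoenix or HBase code: the zero-byte separator, the (customer_id, event_date) key layout, and all function names are assumptions made for illustration. It only shows why a query that supplies the leading part of a composite key can be answered with a narrow range scan over sorted keys, while a query on a trailing part has to look at every row.

```python
from bisect import bisect_left

SEP = b"\x00"  # assumed separator between key parts (a simplification)


def make_rowkey(*parts: str) -> bytes:
    """Concatenate key parts, in declared order, into one byte string."""
    return SEP.join(p.encode("utf-8") for p in parts)


def prefix_scan(sorted_keys, prefix: bytes):
    """Range scan: binary-search to the first match, stop at the first miss."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


def full_scan(sorted_keys, suffix: bytes):
    """Filtering on a trailing key part must inspect every key."""
    return [k for k in sorted_keys if k.endswith(suffix)]


# Hypothetical table keyed by (customer_id, event_date); keys are kept sorted,
# as they would be in a store that orders rows by key.
rows = sorted(
    make_rowkey(customer, day)
    for customer, day in [
        ("cust1", "2014-09-09"),
        ("cust1", "2014-09-10"),
        ("cust2", "2014-09-09"),
        ("cust3", "2014-09-10"),
    ]
)

by_customer = prefix_scan(rows, make_rowkey("cust1") + SEP)  # leading key part
by_date = full_scan(rows, SEP + b"2014-09-10")               # trailing key part
```

Both calls return two keys here, but `prefix_scan` stops as soon as it leaves the `cust1` range, whereas `full_scan` walks the whole list.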

> 3.      We will be using a large amount of data (approximately 10 million
> rows to be processed daily).
>
Anil: What kind of processing will you be doing? If you are doing simple
aggregates, that is already supported by Phoenix. You can also have a look
at the Phoenix-Pig integration to leverage the analytical power of Pig (Pig
is a data flow language whereas Hive is declarative, but you get the Pig
integration OOTB).
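
As an illustration of what "simple aggregates" might look like, here is a small sketch. It uses Python's built-in sqlite3 module purely as a local stand-in so the example runs without a cluster; the `events` table, its columns, and the sample rows are hypothetical. The SELECT uses only standard COUNT, SUM and GROUP BY, which is the kind of query meant above. Note that Phoenix writes rows with UPSERT rather than INSERT, so the loading step would differ there.

```python
import sqlite3

# In-memory database used only as a stand-in for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        customer_id TEXT,
        event_date  TEXT,
        amount      INTEGER,
        PRIMARY KEY (customer_id, event_date)
    )
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("cust1", "2014-09-09", 10),
        ("cust1", "2014-09-10", 5),
        ("cust2", "2014-09-09", 7),
    ],
)

# A simple aggregate: row count and total amount per day.
daily_totals = conn.execute(
    """
    SELECT event_date, COUNT(*), SUM(amount)
    FROM events
    GROUP BY event_date
    ORDER BY event_date
    """
).fetchall()
conn.close()
```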

> 4.      Right now we have both options open, but primarily we plan to use
> Hive tables to serve client requests/queries on aggregated data.
>
Anil: People primarily use Hive for SQL querying; the same can be achieved in a
better way with Phoenix (especially when HBase is your storage).

> 5.      We plan to employ all types of queries, and we aim to achieve low
> latency.
>
Anil: Phoenix will give you much better performance on HBase.

>
>         If I understand correctly, Phoenix will just connect to HBase
> securely and rely on the HBase API to answer queries; Phoenix will therefore
> depend on the security mechanisms of the HBase API and will not provide
> any security features by itself.
>
Anil: Yes, that is true. At present, Phoenix does not provide a mechanism to
grant/revoke/create/add users. The same can be done using the HBase shell, and
Phoenix will honor those changes. Phoenix is open source, so a patch is
always appreciated for new features.
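
For readers who have not used HBase access control, the sketch below builds the text of HBase shell `grant` commands of the form `grant 'user', 'permissions', 'table', 'family', 'qualifier'`, where the permission letters are drawn from R, W, X, C and A. The helper name `hbase_grant`, the user names, and the table and column family names are all hypothetical, and the commands are only generated as strings; they would still have to be run in the HBase shell by someone with admin rights on a cluster that has authorization enabled.

```python
VALID_PERMISSIONS = frozenset("RWXCA")  # read, write, exec, create, admin


def hbase_grant(user, permissions, table=None, family=None, qualifier=None):
    """Return the text of an HBase shell grant command (hypothetical helper)."""
    if not permissions or not set(permissions) <= VALID_PERMISSIONS:
        raise ValueError(f"permissions must be drawn from RWXCA, got {permissions!r}")
    # Scopes nest: a column family needs a table, a qualifier needs a family.
    if (family and not table) or (qualifier and not family):
        raise ValueError("scopes nest: table, then column family, then qualifier")
    args = [user, permissions] + [s for s in (table, family, qualifier) if s]
    return "grant " + ", ".join(f"'{a}'" for a in args)


# Hypothetical users and table; the names are for illustration only.
read_only = hbase_grant("analyst", "R", "EVENTS")
family_rw = hbase_grant("etl_user", "RW", "EVENTS", "cf")
```

Here `read_only` is the string `grant 'analyst', 'R', 'EVENTS'`. Because the permissions live in HBase, a Phoenix query issued by that user would be subject to them, which is the behaviour described above.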




-- 
Thanks & Regards,
Anil Gupta

RE: Hive or Phoenix

Posted by Prakash Hosalli <pr...@syncoms.com>.

Hi James/Anil,


Regarding the questions you put forward:

1.  Yes, we will store the data in HBase.
2.  Hive will run over HBase.
3.  We will be using a large amount of data (approximately 10 million rows to be processed daily).
4.  Right now we have both options open, but primarily we plan to use Hive tables to serve client requests/queries on aggregated data.
5.  We plan to employ all types of queries, and we aim to achieve low latency.

If I understand correctly, Phoenix will just connect to HBase securely and rely on the HBase API to answer queries; Phoenix will therefore depend on the security mechanisms of the HBase API and will not provide any security features by itself.

Kindly correct me if my understanding is wrong.


Thanks & Regards,
Prakash Hosalli



Re: Hive or Phoenix

Posted by James Taylor <ja...@apache.org>.

Hi Prakash,
If possible, it'd be helpful if you could describe your use case a bit.

Some questions I'd have for you: is the data over which you'd query
stored in HBase? And if so, would Hive run over the HBase data? Is
the data read-only or does it mutate? How much data are we talking
about (approximately), and what would your typical queries be: point
look-ups, range scans, or full table scans?

As far as security goes, HBase provides some more fine-grained mechanisms
as well, which you could leverage through HBase APIs. Other than the
ability to connect to a secure cluster through the connection URL,
Phoenix doesn't yet provide a SQL wrapper on these HBase APIs. This is
how Intuit is leveraging Phoenix + security in HBase. Anil Gupta can
likely tell you more.
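
To make the connection URL point a bit more concrete, here is a minimal sketch in Python of how such a URL is assembled. It assumes the positional form given in the Phoenix documentation, jdbc:phoenix[:<zookeeper quorum>[:<port>[:<root node>[:<principal>[:<keytab file>]]]]]. The helper name, host, principal and keytab path are all hypothetical, so please check the format against the Phoenix version you run.

```python
def phoenix_jdbc_url(zk_quorum, port=None, root_node=None,
                     principal=None, keytab=None):
    """Assemble a Phoenix JDBC URL from its positional parts.

    Hypothetical helper, not part of Phoenix. It only joins strings and
    does not open a connection.
    """
    optional = [port, root_node, principal, keytab]
    # The parts are positional: drop trailing parts that were not given.
    while optional and optional[-1] is None:
        optional.pop()
    # A later part can only be supplied if all earlier ones are present.
    if any(part is None for part in optional):
        raise ValueError("earlier URL parts must be set before later ones")
    return ":".join(["jdbc:phoenix", zk_quorum] + [str(p) for p in optional])


# Hypothetical values for a Kerberos-secured cluster.
secure_url = phoenix_jdbc_url(
    "zk1.example.com", 2181, "/hbase",
    "appuser@EXAMPLE.COM", "/etc/security/keytabs/appuser.keytab",
)
print(secure_url)
```

Anything finer-grained than that would go through the HBase APIs rather than through SQL, as noted above.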

Thanks,
James

On Tue, Sep 9, 2014 at 9:28 AM, Nicolas Maillard
<nm...@hortonworks.com> wrote:
> Hello Prakash
>
> Framing this as Hive or Phoenix is a little misleading: they serve different
> needs. Let me break it down as best I can.
>
> You mention security:
> Phoenix and Hive both work on a secured Hadoop cluster, but Hive with Hive
> Atz has a more fine-grained authorization model. So from that perspective
> Hive has more features.
>
> Query performance
> On the performance side, Phoenix has random read/write access, whereas Hive
> does full data access: there is no way to read a particular entry without
> reading the whole associated file.
> So Hive is batch or interactive, meaning a couple of tens of seconds to get
> your answer, whereas Phoenix can be sub-second. The response time will depend
> greatly on whether part of the Phoenix key is in your query. If you do a full
> table scan, response time will suffer. Granted, secondary indexes could help
> you there.
>
> SQL Semantics
> Hive currently has richer SQL semantics, with analytics functions,
> complex types, etc.
> Phoenix is also more limited than Hive in joins and UDFs.
>
> So I would use Hive for large data, random analysis and ETL, and pay the
> price of the response time a little.
> Phoenix, on the other hand, is great for large volumes of data where you can
> set up your schema, and especially your keys, according to specific needs and
> query patterns. In this situation you would get great query performance.
>
> To sum up, in all honesty, both are needed.
>
> Hope this helps
>
> On Tue, Sep 9, 2014 at 4:19 PM, Prakash Hosalli
> <pr...@syncoms.com> wrote:
>>
>> Hi,
>>
>> Does Phoenix have any security layer in it, as we have in Hive?
>>
>> I am confused about whether to go forward with Phoenix or Hive in the
>> production environment at my company.
>>
>> Thanks & Regards,
>> Prakash Hosalli
>> Syncoms Bangalore India.
>>

Re: Hive or Phoenix

Posted by Nicolas Maillard <nm...@hortonworks.com>.

Hello Prakash

Framing this as Hive or Phoenix is a little misleading: they serve different
needs. Let me break it down as best I can.

You mention security:
Phoenix and Hive both work on a secured Hadoop cluster, but Hive with Hive
Atz has a more fine-grained authorization model. So from that perspective
Hive has more features.

Query performance
On the performance side, Phoenix has random read/write access, whereas Hive
does full data access: there is no way to read a particular entry without
reading the whole associated file.
So Hive is batch or interactive, meaning a couple of tens of seconds to get
your answer, whereas Phoenix can be sub-second. The response time will depend
greatly on whether part of the Phoenix key is in your query. If you do a full
table scan, response time will suffer. Granted, secondary indexes could help
you there.
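
To illustrate why the key matters so much, here is a toy model in Python (not Phoenix or HBase code). It assumes rows are kept sorted by a composite key, roughly as HBase stores them, and counts how many rows each access path has to touch. The (customer_id, order_id) key, the amount column and all function names are made up for the example.

```python
from bisect import bisect_left

# Toy table: rows stored in sorted order of a composite row key.
# The key layout (customer_id, order_id) is a hypothetical example.
ROWS = [
    ((customer, order), {"amount": customer * 10 + order})
    for customer in range(1000)
    for order in range(5)
]
KEYS = [key for key, _ in ROWS]

# Rough analogue of a secondary index: a second sorted structure
# that maps the indexed column back to the row key.
INDEX = sorted((row["amount"], key) for key, row in ROWS)


def scan_by_key_prefix(customer):
    """Leading key column is in the query: seek, then read a short range."""
    start = bisect_left(KEYS, (customer, -1))
    end = bisect_left(KEYS, (customer + 1, -1))
    return [row for _, row in ROWS[start:end]], end - start


def scan_by_non_key_column(amount):
    """Filter on a non-key column: every row is read and checked."""
    matches = [row for _, row in ROWS if row["amount"] == amount]
    return matches, len(ROWS)


def lookup_via_index(amount):
    """Same filter, but going through the sorted index first."""
    pos = bisect_left(INDEX, (amount,))
    matches = []
    while pos < len(INDEX) and INDEX[pos][0] == amount:
        key = INDEX[pos][1]
        matches.append(ROWS[bisect_left(KEYS, key)][1])
        pos += 1
    return matches, len(matches)


for name, (rows, touched) in [
    ("key prefix", scan_by_key_prefix(42)),
    ("full scan", scan_by_non_key_column(423)),
    ("via index", lookup_via_index(423)),
]:
    print(f"{name}: {len(rows)} row(s) returned, {touched} row(s) touched")
```

The row counts are only meant to show the shape of the difference. Real response times depend on region layout, caching and cluster size.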

SQL Semantics
Hive currently has richer SQL semantics, with analytics functions,
complex types, etc.
Phoenix is also more limited than Hive in joins and UDFs.

So I would use Hive for large data, random analysis and ETL, and pay the
price of the response time a little.
Phoenix, on the other hand, is great for large volumes of data where you can
set up your schema, and especially your keys, according to specific needs and
query patterns. In this situation you would get great query performance.

To sum up, in all honesty, both are needed.

Hope this helps

On Tue, Sep 9, 2014 at 4:19 PM, Prakash Hosalli <prakash.hosalli@syncoms.com> wrote:

>
> Hi,
>
> Does Phoenix have any security layer in it, as we have in Hive?
>
> I am confused about whether to go forward with Phoenix or Hive in the
> production environment at my company.
>
> Thanks & Regards,
> Prakash Hosalli
> Syncoms Bangalore India.
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.