You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/04/17 11:52:58 UTC

Moving Hive metastore to Solid State Disks

Hi,

I have had my Hive metastore database on Oracle 11g supporting concurrency
(with added transactional capability)

Over the past few days I created a new schema on Oracle 12c on Solid State
Disks (SSD) and used databump (exdp, imdp) to migrate Hive database from
Oracle 11g to Oracle 12c on SSD.

Couple of years ago I did some work for OLTP operations (many random access
via index scan plus serial scans) for Oracle 11g and SAP ASE 15.7. I
noticed that for Random Access with Index scans the performance improves by
a factor of 20  because of much faster seek time for SSD

https://www.scribd.com/doc/119707722/IOUG-SELECT-Q312-Final

I have recently seen some contention for access resources in Hive database,
so I think going to SSD will improve the performance of Hive in general.

I will look at AWR reports to see how beneficial this set up is as this
Oracle instance is more and less dedicated to Hive.

HTH

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com

Re: Moving Hive metastore to Solid State Disks

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Jorn,

Sure will do.

What Oracle in-memory  offering does is allow the user to store a *copy* of
selected tables, or partitions, in*columnar* format in-memory within the
Oracle Database memory space. All tables are still present in row format
and all copies on storage are in row format. These columnar copies are not
logged nor are they ever persisted to disk.   The Oracle Database optimizer
is aware of the presence and currency of the in-memory copies and
transparently uses them for any analytical style queries that can benefit
from the vastly faster processing speed. This is all completely transparent
to applications.



The primary use case for this capability is to accelerate the analytics
part of mixed OLTP and Analytical workloads by eliminating the need for
most of the Analytics indexes that are typically found in a database that
supports such a mixed workload. Not only does this speed up the analytical
queries by a huge amount (often 100x or more) but the ability to drop many
of the analytical indexes also has a major benefit for OLTP performance.


Now with regard to Hive database, I am not aware of such mixed load work
case. Additionally one might argue that a better indexing strategy will
benefit Hive database performance compared to in-memory offering or SSD.


I guess I will be doing some investigation on that front as well.


HTH


P.S. Do we have any idea what the largest Hive database (schema)  is in
terms of size? Any published results




Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 April 2016 at 11:29, Jörn Franke <jo...@gmail.com> wrote:

>
> You could also explore the in-memory database of 12c . However, I am not
> sure how beneficial it is for Oltp scenarios.
>
> I am excited to see how the performance will be on hbase as a hive
> metastore.
>
> Nevertheless, your results on Oracle/SSD will be beneficial for the
> community.
>
> On 17 Apr 2016, at 11:52, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
> Hi,
>
> I have had my Hive metastore database on Oracle 11g supporting concurrency
> (with added transactional capability)
>
> Over the past few days I created a new schema on Oracle 12c on Solid State
> Disks (SSD) and used databump (exdp, imdp) to migrate Hive database from
> Oracle 11g to Oracle 12c on SSD.
>
> Couple of years ago I did some work for OLTP operations (many random
> access via index scan plus serial scans) for Oracle 11g and SAP ASE 15.7. I
> noticed that for Random Access with Index scans the performance improves by
> a factor of 20  because of much faster seek time for SSD
>
> https://www.scribd.com/doc/119707722/IOUG-SELECT-Q312-Final
>
> I have recently seen some contention for access resources in Hive
> database, so I think going to SSD will improve the performance of Hive in
> general.
>
> I will look at AWR reports to see how beneficial this set up is as this
> Oracle instance is more and less dedicated to Hive.
>
> HTH
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>

Re: Moving Hive metastore to Solid State Disks

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Jorn,

Sure will do.

What Oracle in-memory  offering does is allow the user to store a *copy* of
selected tables, or partitions, in*columnar* format in-memory within the
Oracle Database memory space. All tables are still present in row format
and all copies on storage are in row format. These columnar copies are not
logged nor are they ever persisted to disk.   The Oracle Database optimizer
is aware of the presence and currency of the in-memory copies and
transparently uses them for any analytical style queries that can benefit
from the vastly faster processing speed. This is all completely transparent
to applications.



The primary use case for this capability is to accelerate the analytics
part of mixed OLTP and Analytical workloads by eliminating the need for
most of the Analytics indexes that are typically found in a database that
supports such a mixed workload. Not only does this speed up the analytical
queries by a huge amount (often 100x or more) but the ability to drop many
of the analytical indexes also has a major benefit for OLTP performance.


Now with regard to Hive database, I am not aware of such mixed load work
case. Additionally one might argue that a better indexing strategy will
benefit Hive database performance compared to in-memory offering or SSD.


I guess I will be doing some investigation on that front as well.


HTH


P.S. Do we have any idea what the largest Hive database (schema)  is in
terms of size? Any published results




Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 April 2016 at 11:29, Jörn Franke <jo...@gmail.com> wrote:

>
> You could also explore the in-memory database of 12c . However, I am not
> sure how beneficial it is for Oltp scenarios.
>
> I am excited to see how the performance will be on hbase as a hive
> metastore.
>
> Nevertheless, your results on Oracle/SSD will be beneficial for the
> community.
>
> On 17 Apr 2016, at 11:52, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
> Hi,
>
> I have had my Hive metastore database on Oracle 11g supporting concurrency
> (with added transactional capability)
>
> Over the past few days I created a new schema on Oracle 12c on Solid State
> Disks (SSD) and used databump (exdp, imdp) to migrate Hive database from
> Oracle 11g to Oracle 12c on SSD.
>
> Couple of years ago I did some work for OLTP operations (many random
> access via index scan plus serial scans) for Oracle 11g and SAP ASE 15.7. I
> noticed that for Random Access with Index scans the performance improves by
> a factor of 20  because of much faster seek time for SSD
>
> https://www.scribd.com/doc/119707722/IOUG-SELECT-Q312-Final
>
> I have recently seen some contention for access resources in Hive
> database, so I think going to SSD will improve the performance of Hive in
> general.
>
> I will look at AWR reports to see how beneficial this set up is as this
> Oracle instance is more and less dedicated to Hive.
>
> HTH
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
>

Re: Moving Hive metastore to Solid State Disks

Posted by Jörn Franke <jo...@gmail.com>.
You could also explore the in-memory database of 12c . However, I am not sure how beneficial it is for Oltp scenarios.

I am excited to see how the performance will be on hbase as a hive metastore.

Nevertheless, your results on Oracle/SSD will be beneficial for the community.

> On 17 Apr 2016, at 11:52, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Hi,
> 
> I have had my Hive metastore database on Oracle 11g supporting concurrency (with added transactional capability)
> 
> Over the past few days I created a new schema on Oracle 12c on Solid State Disks (SSD) and used databump (exdp, imdp) to migrate Hive database from Oracle 11g to Oracle 12c on SSD.
> 
> Couple of years ago I did some work for OLTP operations (many random access via index scan plus serial scans) for Oracle 11g and SAP ASE 15.7. I noticed that for Random Access with Index scans the performance improves by a factor of 20  because of much faster seek time for SSD
> 
> https://www.scribd.com/doc/119707722/IOUG-SELECT-Q312-Final
> 
> I have recently seen some contention for access resources in Hive database, so I think going to SSD will improve the performance of Hive in general.
> 
> I will look at AWR reports to see how beneficial this set up is as this Oracle instance is more and less dedicated to Hive.
> 
> HTH
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  

Re: Moving Hive metastore to Solid State Disks

Posted by Jörn Franke <jo...@gmail.com>.
You could also explore the in-memory database of 12c . However, I am not sure how beneficial it is for Oltp scenarios.

I am excited to see how the performance will be on hbase as a hive metastore.

Nevertheless, your results on Oracle/SSD will be beneficial for the community.

> On 17 Apr 2016, at 11:52, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Hi,
> 
> I have had my Hive metastore database on Oracle 11g supporting concurrency (with added transactional capability)
> 
> Over the past few days I created a new schema on Oracle 12c on Solid State Disks (SSD) and used databump (exdp, imdp) to migrate Hive database from Oracle 11g to Oracle 12c on SSD.
> 
> Couple of years ago I did some work for OLTP operations (many random access via index scan plus serial scans) for Oracle 11g and SAP ASE 15.7. I noticed that for Random Access with Index scans the performance improves by a factor of 20  because of much faster seek time for SSD
> 
> https://www.scribd.com/doc/119707722/IOUG-SELECT-Q312-Final
> 
> I have recently seen some contention for access resources in Hive database, so I think going to SSD will improve the performance of Hive in general.
> 
> I will look at AWR reports to see how beneficial this set up is as this Oracle instance is more and less dedicated to Hive.
> 
> HTH
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>