Posted to user@hive.apache.org by "Alexandre \"TAZ\" dos Santos Andrade" <al...@gmail.com> on 2011/05/04 22:44:48 UTC

Re: Planning a migration from PostgreSQL to Hadoop/Hive

Hi Marcos,

I'm doing exactly the same migration. First of all, remember that Hive
runs a MapReduce job for every query whose result you don't write to a
table. Second, migrating the data is a little annoying: there's no direct
connector, so I used a simple dump, stripped the header and footer, and
loaded it into the Hive structure.
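
As a rough illustration of the dump-and-strip approach (not the script actually used in this thread), the sketch below assumes a plain-text pg_dump file, where each table's rows sit between a "COPY ... FROM stdin;" header line and a "\." footer line. The function name extract_copy_rows, the table name sales, and the sample data are all hypothetical.

```python
import io


def extract_copy_rows(dump_lines, table):
    """Yield the tab-separated data rows of one table from a plain-text pg_dump.

    Assumption: the dump uses COPY blocks (the pg_dump default), and the
    table name is unquoted, optionally prefixed with "public.".
    """
    in_block = False
    for line in dump_lines:
        line = line.rstrip("\n")
        if not in_block:
            # Header of a data block: COPY <table> (<cols>) FROM stdin;
            if line.startswith("COPY ") and line.endswith("FROM stdin;"):
                name = line.split()[1]
                in_block = name in (table, "public." + table)
            continue
        if line == "\\.":
            # Footer of the data block: end-of-data marker.
            in_block = False
            continue
        yield line


# Hypothetical miniature dump; \N is PostgreSQL's NULL marker in COPY data.
SAMPLE_DUMP = """\
SET client_encoding = 'UTF8';
COPY sales (id, region, amount) FROM stdin;
1\teast\t10.50
2\t\\N\t7.25
\\.
ALTER TABLE ONLY sales ADD CONSTRAINT sales_pkey PRIMARY KEY (id);
"""

rows = list(extract_copy_rows(io.StringIO(SAMPLE_DUMP), "sales"))

# The rows could then be written to a file and loaded with HiveQL such as
# (table definition and path are illustrative only):
#   CREATE TABLE sales (id INT, region STRING, amount DOUBLE)
#     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
#   LOAD DATA LOCAL INPATH '/tmp/sales.tsv' INTO TABLE sales;
```

The Hive table has to declare tab as its field delimiter, because Hive's default text delimiter is Ctrl-A rather than tab. Escaped characters inside COPY data (embedded tabs or newlines) are not handled by this sketch.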

I hope this helps.

Alexandre dos Santos Andrade

2011/5/4 Marcos Ortiz <ml...@uci.cu>

> We are planning a migration from a large PostgreSQL-based DWH to
> Hadoop/Hive. The main reason for this migration is the massive growth
> of the data to analyze (5.6 TB and growing): PostgreSQL, as an
> MVCC-based RDBMS, has its pitfalls with heavy updates and with query
> execution over large quantities of data. (We have done a lot of query
> tuning and server optimization, with only a minor effect on query
> latency.)
>
> So we have looked at Hadoop and run some tests combined with Hive
> and HBase, and the performance we obtained is awesome.
>
> Can you give us some advice on developing a good plan for this?
>
> Environment:
> - OS: CentOS 5.5, 64-bit
> - Java version: 1.6 Update 20
> - Hardware: 8 nodes - AMD Opteron QuadCore 4130, 8 GB RAM, 1 TB HDD
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer (Large-Scaled Distributed Systems)
>  University of Information Sciences,
>  La Habana, Cuba
>  Linux User # 418229
>  http://about.me/marcosortiz
>
>



Re: Planning a migration from PostgreSQL to Hadoop/Hive

Posted by "Alexandre \"TAZ\" dos Santos Andrade" <al...@gmail.com>.
I wrote a shell script to do that.

2011/5/4 Marcos Ortiz <ml...@uci.cu>

> Thanks a lot, Alexandre.
> Did you use Sqoop to load the data from PostgreSQL to Hive?

Re: Planning a migration from PostgreSQL to Hadoop/Hive

Posted by Marcos Ortiz <ml...@uci.cu>.
Thanks a lot, Alexandre.
Did you use Sqoop to load the data from PostgreSQL to Hive?



-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz