Posted to user@hive.apache.org by Ashok Kumar <as...@yahoo.com> on 2016/01/31 14:06:53 UTC
Importing Oracle data into Hive
Hi,
What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows.
I am running Hive 1.2.1
regards
Re: Importing Oracle data into Hive
Posted by Jörn Franke <jo...@gmail.com>.
Well, you can create an empty Hive table in ORC format and use --hive-overwrite in Sqoop.
Alternatively you can use --hive-import and set hive.default.fileformat.
I recommend defining the schema explicitly on the command line, because Sqoop's type detection is based on JDBC (Java) types, which is not optimal. For example, a decimal(20,2) in Oracle should be a decimal(20,2) in Hive, not a double.
This may also be a good opportunity to review the database model in general. For keys especially, one should use a numeric type rather than varchar: it saves a lot of space in the column and can be looked up much faster.
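The explicit type mapping described above can be passed via Sqoop's --map-column-hive option. A minimal sketch follows; the connection string, table, and column names are illustrative placeholders, not taken from the thread:

```shell
# Force specific Hive column types instead of relying on Sqoop's
# JDBC-based type detection. Names below are illustrative only.
# Note: depending on the Sqoop version, a comma inside a type such as
# DECIMAL(20,2) may need to be URL-encoded as %2C, since Sqoop splits
# the mapping list on commas.
sqoop import \
  --connect "jdbc:oracle:thin:@oradb:1521:mydb" \
  --username scott -P \
  --table SCOTT.ORDERS \
  --hive-import --hive-table default.orders \
  --map-column-hive 'AMOUNT=DECIMAL(20%2C2),ORDER_ID=BIGINT'
```

With this, the generated Hive table carries the intended decimal precision rather than a lossy double.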
> On 31 Jan 2016, at 14:15, Ashok Kumar <as...@yahoo.com> wrote:
>
> Thanks,
>
> Can sqoop create this table as ORC in Hive?
>
>
> On Sunday, 31 January 2016, 13:11, Nitin Pawar <ni...@gmail.com> wrote:
>
>
> check sqoop
>
> On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <as...@yahoo.com> wrote:
> Hi,
>
> What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows.
>
> I am running Hive 1.2.1
>
> regards
>
> --
> Nitin Pawar
Re: Importing Oracle data into Hive
Posted by Ashok Kumar <as...@yahoo.com>.
Thank you Mich and Jörn for your help. Very useful indeed.
RE: Importing Oracle data into Hive
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
You will need to have Oracle Database 11g JDBC Driver ojdbc6.jar installed in $SQOOP_HOME/lib. You can download it from here <http://www.oracle.com/technetwork/apps-tech/jdbc-112010-090769.html>
The approach I prefer is to let Sqoop import it as a text file to a staging table and then insert/select into an ORC table from the staging table.
sqoop import --connect "jdbc:oracle:thin:@rhes564:1521:mydb" --username scratchpad -P \
--query "select * from scratchpad.dummy where \$CONDITIONS" \
--split-by id \
--hive-import --hive-table "test.dummy_staging" --target-dir "/a/b/c/dummy_staging" --create-hive-table
Once the staging table is created you can insert/select into an ORC table of your own definition, making sure the schema is defined exactly as you wish. For example, you have to cater for date fields, or for columns that should be varchar as opposed to string.
Case in point
The source table schema in Oracle is
CREATE TABLE "SCRATCHPAD"."DUMMY"
( "ID" NUMBER,
"CLUSTERED" NUMBER,
"SCATTERED" NUMBER,
"RANDOMISED" NUMBER,
"RANDOM_STRING" VARCHAR2(50 BYTE),
"SMALL_VC" VARCHAR2(10 BYTE),
"PADDING" VARCHAR2(10 BYTE),
CONSTRAINT "DUMMY_PK" PRIMARY KEY ("ID")
)
The staging table dummy_staging is generated by Sqoop:
desc dummy_staging;
+----------------+------------+----------+--+
| col_name | data_type | comment |
+----------------+------------+----------+--+
| id | double | |
| clustered | double | |
| scattered | double | |
| randomised | double | |
| random_string | string | |
| small_vc | string | |
| padding | string | |
+----------------+------------+----------+--+
Your ORC table may look like:
desc dummy;
+----------------+--------------+----------+--+
| col_name | data_type | comment |
+----------------+--------------+----------+--+
| id | int | |
| clustered | int | |
| scattered | int | |
| randomised | int | |
| random_string | varchar(50) | |
| small_vc | varchar(10) | |
| padding | varchar(10) | |
+----------------+--------------+----------+--+
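The insert/select step from the staging table into the ORC table is not shown above; a minimal HiveQL sketch (table and column names taken from the desc output, the CAST targets being one plausible choice) might look like:

```sql
-- Create the target ORC table with explicit types (a sketch;
-- adjust the types to your own model).
CREATE TABLE test.dummy (
  id            INT,
  clustered     INT,
  scattered     INT,
  randomised    INT,
  random_string VARCHAR(50),
  small_vc      VARCHAR(10),
  padding       VARCHAR(10)
)
STORED AS ORC;

-- Populate it from the Sqoop-generated staging table, casting the
-- JDBC-derived doubles and strings to the intended Hive types.
INSERT INTO TABLE test.dummy
SELECT
  CAST(id            AS INT),
  CAST(clustered     AS INT),
  CAST(scattered     AS INT),
  CAST(randomised    AS INT),
  CAST(random_string AS VARCHAR(50)),
  CAST(small_vc      AS VARCHAR(10)),
  CAST(padding       AS VARCHAR(10))
FROM test.dummy_staging;
```

The staging table can then be dropped or truncated ready for the next weekly run.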
This also follows the Extract, Load, Transform (ELT) methodology, which I prefer.
HTH
Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.
From: Ashok Kumar [mailto:ashok34668@yahoo.com]
Sent: 31 January 2016 13:15
To: User <us...@hive.apache.org>
Subject: Re: Importing Oracle data into Hive
Thanks,
Can sqoop create this table as ORC in Hive?
On Sunday, 31 January 2016, 13:13, Ashok Kumar <ashok34668@yahoo.com> wrote:
Thanks.
Can sqoop create this table as ORC in Hive?
On Sunday, 31 January 2016, 13:11, Nitin Pawar <nitinpawar432@gmail.com> wrote:
check sqoop
On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:
Hi,
What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows.
I am running Hive 1.2.1
regards
--
Nitin Pawar
Re: Importing Oracle data into Hive
Posted by Ashok Kumar <as...@yahoo.com>.
Thanks,
Can sqoop create this table as ORC in Hive?
On Sunday, 31 January 2016, 13:11, Nitin Pawar <ni...@gmail.com> wrote:
check sqoop
On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <as...@yahoo.com> wrote:
Hi,
What is the easiest method of importing data from an Oracle 11g table to Hive please? This will be a weekly periodic job. The source table has 20 million rows.
I am running Hive 1.2.1
regards
--
Nitin Pawar
Re: Importing Oracle data into Hive
Posted by Nitin Pawar <ni...@gmail.com>.
check sqoop
On Sun, Jan 31, 2016 at 6:36 PM, Ashok Kumar <as...@yahoo.com> wrote:
> Hi,
>
> What is the easiest method of importing data from an Oracle 11g table to
> Hive please? This will be a weekly periodic job. The source table has 20
> million rows.
>
> I am running Hive 1.2.1
>
> regards
>
>
>
--
Nitin Pawar