You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Dmitry Zagorulkin <za...@me.com> on 2016/11/22 11:16:02 UTC

HIVE-1555 discussion

Hello!

I've implemented simple solution with some hard code by now.
It's tested with oracle database.

{code:sql}
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:
Connected to: Apache Hive (version 2.2.0-SNAPSHOT)
Driver: Hive JDBC (version 2.2.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> SET hive.metastore.warehouse.dir=${env:HOME}/Documents/hive-warehouse;
No rows affected (0.158 seconds)
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE books3 (
. . . . . . . . . . . . . . . .>  book_id            INT,
. . . . . . . . . . . . . . . .>  book_name          STRING,
. . . . . . . . . . . . . . . .>  author_name        STRING,
. . . . . . . . . . . . . . . .>  book_isbn          STRING
. . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . .> STORED BY "org.apache.hive.storagehandler.JDBCStorageHandler"
. . . . . . . . . . . . . . . .> TBLPROPERTIES (
. . . . . . . . . . . . . . . .>  "mapred.jdbc.driver.class" = "oracle.jdbc.OracleDriver",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.url" = "jdbc:oracle:thin:@//localhost:49161/XE",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.username" = "*",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.password" = "*",
. . . . . . . . . . . . . . . .>  "hive.jdbc.update.on.duplicate" = "true",
. . . . . . . . . . . . . . . .>  "mapreduce.jdbc.input.table.name" = "books"
. . . . . . . . . . . . . . . .> );
No rows affected (2.297 seconds)
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000>
0: jdbc:hive2://localhost:10000> select * from books3;
+-----------------+-------------------+---------------------+-------------------+
| books3.book_id  | books3.book_name  | books3.author_name  | books3.book_isbn  |
+-----------------+-------------------+---------------------+-------------------+
| 124123          | name              | author              | 132321adsaf31     |
| 13              | name2             | author2             | asd213fadsf       |
| 2345236         | name3             | author3             | asdfds1234123     |
+-----------------+-------------------+---------------------+-------------------+
3 rows selected (2.146 seconds)
0: jdbc:hive2://localhost:10000> explain select * from books3;
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage                          |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Processor Tree:                              |
|         TableScan                                  |
|           alias: books3                            |
|           Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE |
|           Select Operator                          |
|             expressions: book_id (type: string), book_name (type: string), author_name (type: string), book_isbn (type: string) |
|             outputColumnNames: _col0, _col1, _col2, _col3 |
|             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE |
|             ListSink                               |
|                                                    |
+----------------------------------------------------+
17 rows selected (0.508 seconds)
{code}

This solution works with two steps:
	1. First grab all meta info from external table 
	2. Configure DBInputFormat, DBOutputFormat with table meta

What do you think about to ask user specify all needed information about columns and types inside serde properties section?

Smth like that:

0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE books3 (
. . . . . . . . . . . . . . . .>  book_id            INT,
. . . . . . . . . . . . . . . .>  book_name          STRING,
. . . . . . . . . . . . . . . .>  author_name        STRING,
. . . . . . . . . . . . . . . .>  book_isbn          STRING
. . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . .> STORED BY “org.apache.hive.storagehandler.JDBCStorageHandler"
		WITH SERDEPROPERTIES (
					"hive.jdbc.columns.mapping" = “book_id:int(32), book_name:varchar(20), author_name:varchar(20), book_isbn:varchar(20)")		
. . . . . . . . . . . . . . . .> TBLPROPERTIES (
. . . . . . . . . . . . . . . .>  "mapred.jdbc.driver.class" = "oracle.jdbc.OracleDriver",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.url" = "jdbc:oracle:thin:@//localhost:49161/XE",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.username" = "*",
. . . . . . . . . . . . . . . .>  "mapred.jdbc.password" = "*",
. . . . . . . . . . . . . . . .>  "hive.jdbc.update.on.duplicate" = "true",
. . . . . . . . . . . . . . . .>  "mapreduce.jdbc.input.table.name" = "books"
. . . . . . . . . . . . . . . .> );