Posted to common-dev@hadoop.apache.org by "YoungWoo Kim (JIRA)" <ji...@apache.org> on 2008/09/07 17:42:44 UTC

[jira] Created: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Support JDBC connections for interoperability between Hive and RDBMS
--------------------------------------------------------------------

                 Key: HADOOP-4101
                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
             Project: Hadoop Core
          Issue Type: Improvement
          Components: contrib/hive
            Reporter: YoungWoo Kim
            Priority: Minor


In many DW and BI systems, data are currently stored in an RDBMS such as Oracle, MySQL, or PostgreSQL for reporting, charting, etc.
It would be useful to be able to import data from an RDBMS and export data to an RDBMS using JDBC connections.
If Hive supports JDBC connections, it will be much easier to use third-party DW/BI tools.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642860#action_12642860 ] 

Raghotham Murthy commented on HADOOP-4101:
------------------------------------------

I had a preliminary set of classes, but I didn't get a chance to finish working on them. Michi has since taken those classes, and I believe he has something working now. I'll let him post a patch.

[jira] Updated: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated HADOOP-4101:
------------------------------------

    Release Note: JDBC Driver
          Status: Patch Available  (was: Open)

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642983#action_12642983 ] 

Namit Jain commented on HADOOP-4101:
------------------------------------

Michi, did you consider having a client-server approach for the JDBC server? There is nothing wrong with this approach; in fact, this way the server does not become a single point of failure.
The client does become thicker, which may be acceptable. I just wanted to know whether you considered the pros and cons of that approach.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646348#action_12646348 ] 

Namit Jain commented on HADOOP-4101:
------------------------------------

7pm is fine with me 

[jira] Updated: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated HADOOP-4101:
------------------------------------

    Attachment: hadoop-4101.1.patch

Added a JDBC driver for Hive. See src/contrib/hive/ql/src/test/org/apache/hadoop/hive/ql/jdbc/TestHiveDriver.java for an example.

Next steps:
   - provide a hive standalone server 
   - integrate with hive metastore (e.g. support different types)
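
For readers who want a feel for what a BI tool would do with such a driver, here is a minimal client sketch that uses only the standard java.sql API. It is not taken from TestHiveDriver.java. The URL scheme jdbc:hive-example, the table name example_table, and the class name JdbcQuerySketch are placeholders made up for illustration, and the real driver's URL format may well differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class JdbcQuerySketch {
    // Runs one query through the standard java.sql API and returns the
    // first column of every row as a string.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1)); // JDBC column indexes start at 1
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical URL: the scheme a Hive driver would register is an assumption.
        String url = args.length > 0 ? args[0] : "jdbc:hive-example://localhost/default";
        try {
            for (String value : firstColumn(url, "SELECT * FROM example_table")) {
                System.out.println(value);
            }
        } catch (SQLException e) {
            // Reached, for example, when no driver on the classpath accepts the URL.
            System.out.println("Query failed: " + e.getMessage());
        }
    }
}
```

If no driver on the classpath accepts the URL, DriverManager.getConnection throws an SQLException, which main reports instead of crashing.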

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644812#action_12644812 ] 

Namit Jain commented on HADOOP-4101:
------------------------------------

Hi Michi, any updates on this? If you want to discuss it in more detail, we can also meet.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643014#action_12643014 ] 

Joydeep Sen Sarma commented on HADOOP-4101:
-------------------------------------------

OK, Ashish just walked us through a couple of scenarios:

- The BI tool has a server side. In this case the approach in this patch might work, but running Hadoop code (which sets up classpaths, classloaders, and so on) in the same JVM as the BI server is suspect. At a minimum, this raises significant integration issues for each BI server.

- The BI tool does not have a server side, only a client. I think this is a very common scenario and one we should try to cover (since the whole premise of Hadoop/Hive is to avoid spending a lot of money, which is what BI tools with a server side will require). In this case the approach in this patch will be hard to make work because of the firewalling issues that I mentioned in the previous post (even if all the technical issues, like Hive's treatment of Windows paths, are resolved).

Hopefully this captures the issues more accurately.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643036#action_12643036 ] 

Prasad Chakka commented on HADOOP-4101:
---------------------------------------

There is already a MetaStore server (HiveMetaStore.java). It is a Thrift service, so I am not sure it would fit the requirements for a JDBC server. If it does, we should add JDBC functionality to this server.


[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642869#action_12642869 ] 

Jeff Hammerbacher commented on HADOOP-4101:
-------------------------------------------

Nice, Michi! Will poke at this tomorrow.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629267#action_12629267 ] 

Ashish Thusoo commented on HADOOP-4101:
---------------------------------------

Completely agree on this. With a JDBC driver, front-end integration would be much easier.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645110#action_12645110 ] 

Ashish Thusoo commented on HADOOP-4101:
---------------------------------------

How are you planning to implement the metadata calls? There is a lot of inheritance in the JDBC metadata calls and, from what I understand, Thrift does not support inheritance.

Also, if you do go the Thrift route, it may be better to share the server container code between the metastore and the JDBC driver. I think the APIs should be independent and kept separate. While reorganizing the code, it may be worthwhile to put the server portion in common and then share it between the metastore and the service.

[jira] Updated: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HADOOP-4101:
-------------------------------------

    Attachment: hadoop-4101.2.patch

I am not sure I understand what Ashish meant by 'inheritance in JDBC metadata calls'. The plan is to include metastore.thrift in hive_service.thrift, and then hive_service will just forward metadata calls to the metastore code. I guess with inheritance we wouldn't have to implement the forwarding functions. Is this what you mean, Ashish?

And yes, we should have a single implementation of the Thrift server container for HiveServer and Metastore. JDBC would then be a wrapper on top of the Thrift Hive client.
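
To make the "forwarding functions" concrete, here is a rough plain-Java sketch of the delegation pattern being described. It does not use Thrift or any generated code. MetastoreApi, InMemoryMetastore, ForwardingHiveService, and the getTables and execute signatures are all hypothetical names invented for the illustration, not the actual metastore or hive_service interfaces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the interface the metastore would expose.
interface MetastoreApi {
    List<String> getTables(String database);
}

// Hypothetical stand-in for the metastore implementation.
class InMemoryMetastore implements MetastoreApi {
    @Override
    public List<String> getTables(String database) {
        return Arrays.asList(database + ".page_views", database + ".users");
    }
}

// The service exposes the metastore interface plus its own query call and
// forwards every metadata call to the metastore it wraps.
class ForwardingHiveService implements MetastoreApi {
    private final MetastoreApi metastore;

    ForwardingHiveService(MetastoreApi metastore) {
        this.metastore = metastore;
    }

    @Override
    public List<String> getTables(String database) {
        return metastore.getTables(database); // pure forwarding, no extra logic
    }

    // Placeholder for the query path; a real server would pass the query
    // on to the query driver instead of echoing it.
    String execute(String query) {
        return "accepted: " + query;
    }
}

class ForwardingDemo {
    public static void main(String[] args) {
        ForwardingHiveService service = new ForwardingHiveService(new InMemoryMetastore());
        System.out.println(service.getTables("default"));
        System.out.println(service.execute("SELECT 1"));
    }
}
```

Each metadata method needs one such forwarding stub, which is the boilerplate that inheritance would presumably avoid.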

Update: step 4a above has been completed; we can now issue queries via HiveClient and retrieve results. HiveServer, a Thrift server, actually runs the queries via ql/Driver. I am attaching the patch with the code for the Thrift server/client.

We should meet up to figure out what the plan is for the JDBC client.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629280#action_12629280 ] 

Raghotham Murthy commented on HADOOP-4101:
------------------------------------------

I had already added it to the roadmap. Regarding the simple jdbc driver, I will submit a patch next week.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643000#action_12643000 ] 

Joydeep Sen Sarma commented on HADOOP-4101:
-------------------------------------------

How are we planning on picking up the Hadoop and Hive configuration files? (The CLI picks them up through the classpath.) The same concern applies to jar files (there's configuration in the CLI shell script to set it up to include jars in auxlib).

We will need a client-server model. The CLI does not, for example, run on Cygwin/Windows, and there are all manner of pathing issues that we would need to fix to make that work. Within Facebook, we won't even be able to access HDFS directly from Windows agents that are outside the secure zone (only HTTP ports are available, I believe). I verified with Dhruba that this is the case at Yahoo as well. So we just can't run queries directly from Windows machines without a server side that is within the secure zone.

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646323#action_12646323 ] 

Ashish Thusoo commented on HADOOP-4101:
---------------------------------------

Type 4 should work I guess.

I guess if you use that then you can sidestep the inheritance stuff that I was alluding to. Basically my concern was that if the server APIs mimicked the javax.sql APIs then inheritance would be a problem.

Can you guys come over tomorrow sometime in the afternoon?


[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644835#action_12644835 ] 

Prasad Chakka commented on HADOOP-4101:
---------------------------------------

Regarding the unused files in the metastore: these are files that were carried over from the Hive prototype, which used a file-based metastore. We left them there in case someone wants to use a file-based metastore. So in a sense they are useful, and there are tests.

I think we should combine the servers now. It will be difficult and time-consuming to merge them later. Advanced users can still have two installations of the same server but direct metadata calls to one server and data calls to the other. But in the default case there will be only one server, which is easier to maintain.

The only issue I see is that the metastore code is independent of the ql/cli code. So it might be better to build the JDBC server on top of the metastore server (i.e., extend the metastore server) and import the metastore Thrift IDL into the service Thrift IDL. The JDBC service would then be a superset of the metastore functionality.

What do you guys think?

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644829#action_12644829 ] 

Raghotham Murthy commented on HADOOP-4101:
------------------------------------------

Michi and I were discussing this over the weekend. Here's our current thinking about the design. Michi, please confirm.

1. Implement a Thrift client/server for Hive. For now, the interface consists only of execute and fetch_row. We were able to set up the framework with a Thrift server and a Java client that talks to the server. The next step is to get the server to run the queries.
Notes: we looked at the metastore code and thought it might be simpler to first implement a separate Thrift client/server before merging it with the metastore. Some installations might want to have separate instances of the metastore and the Hive server, and it's easier to test a smaller interface where we understand the code. Also, the metastore code seems to have classes that aren't being used at all, and the scripts to start/stop the metastore don't really work in non-Facebook installations (we need to file separate JIRAs for those).

2. Build a JDBC interface that makes calls to the generated Java Thrift client. We could also have Python and Perl DBI interfaces that make calls to the generated Thrift client code in those languages. So the Thrift interface is a generic interface that is not specific to any particular standard (JDBC, DBI, etc.).

3. The directory structure under src/contrib/hive would be as follows. It follows a model similar to the metastore's.

service/if/hive_service.thrift
service/include/<headers from thrift>
service/fb303/<scripts for service_ctrl to manage server>
service/src/gen-javabean/<generated java code>
service/src/gen-php/<generated php>
service/src/gen-py/<generated python>
service/src/gen-perl/<generated perl>
service/src/scripts/<ctrl scripts for server>
service/src/java/org/apache/hadoop/hive/service/HiveServer.java
service/src/java/org/apache/hadoop/hive/service/HiveClient.java
jdbc/src/java/org/apache/hadoop/hive/jdbc/<whatever is in current jdbc patch>
dbi/<perl dbi interface calling service/src/gen-perl>
cli/<changed to use HiveClient or HiveJdbc>

4. Next steps
a. Get the server to run queries and return results to the client.
b. Move ql/Driver.java to service, since actually running the query is not really part of the query language.
c. Change the CLI to use the service.
d. Verify which parts of the metastore interface are needed by JDBC and move/copy those parts over to hive_service. I don't think it makes sense to do it the other way around, i.e. to put the Hive service into the metastore, since the metastore is not the right abstraction for actually running queries.
e. There is common Thrift code in the metastore and the service. We should either move it to a separate thrift directory or make the metastore use the code from service.

It will be good to meet up to discuss these in more detail. I'll let Michi provide a patch for the Hive server/client and JDBC wrappers for the Hive client.
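
As a rough illustration of points 1 and 2, the sketch below shows how a two-call execute/fetch_row interface could sit underneath a JDBC-style cursor. It is plain Java with no Thrift dependency and is not the code from either patch. HiveClientSketch, FakeHiveClient, and RowCursor are hypothetical names, and two details are assumptions made only for the example: that fetching past the last row returns null, and that each row arrives as one tab-delimited string.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical Java view of the two-call interface: execute a query,
// then fetch its rows one at a time.
interface HiveClientSketch {
    void execute(String query);
    String fetchRow(); // assumption: returns null once no rows remain
}

// In-memory fake standing in for a generated Thrift client.
class FakeHiveClient implements HiveClientSketch {
    private final Deque<String> rows = new ArrayDeque<>();

    @Override
    public void execute(String query) {
        rows.clear();
        // Canned result; a real client would send the query to the server.
        rows.addAll(Arrays.asList("1\talice", "2\tbob"));
    }

    @Override
    public String fetchRow() {
        return rows.poll();
    }
}

// ResultSet-style cursor layered on the client: next() advances to the
// following row, getString() reads a column of the current row.
class RowCursor {
    private final HiveClientSketch client;
    private String[] current;

    RowCursor(HiveClientSketch client, String query) {
        this.client = client;
        client.execute(query);
    }

    boolean next() {
        String row = client.fetchRow();
        current = (row == null) ? null : row.split("\t", -1);
        return current != null;
    }

    String getString(int columnIndex) {
        if (current == null) {
            throw new IllegalStateException("no current row");
        }
        return current[columnIndex - 1]; // JDBC column indexes start at 1
    }
}

class RowCursorDemo {
    public static void main(String[] args) {
        RowCursor cursor = new RowCursor(new FakeHiveClient(), "SELECT id, name FROM users");
        while (cursor.next()) {
            System.out.println(cursor.getString(1) + " -> " + cursor.getString(2));
        }
    }
}
```

A full driver would implement java.sql.ResultSet rather than a custom cursor class; the point here is only that next() and getString() map naturally onto repeated fetch calls, and the same thin layering would apply to a Perl or Python DBI wrapper.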

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629269#action_12629269 ] 

Ashish Thusoo commented on HADOOP-4101:
---------------------------------------

Also I wanted to add that we have tried to structure the Driver code in such a way that we follow the execute/fetch paradigm that is followed by JDBC drivers - though admittedly the metadata part of jdbc is harder than the data part.

Also Raghu was looking into creating a simple jdbc driver for hive. We should add that to the hive roadmap wiki.



[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645292#action_12645292 ] 

Michi Mutsuzaki commented on HADOOP-4101:
-----------------------------------------

I was thinking the JDBC driver will be of type 4:

http://en.wikipedia.org/wiki/JDBC_driver#Type_4_Driver_-_Native-Protocol_Driver

which means there is a server <--> client API that is independent of JDBC, and the JDBC driver uses the client API.
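
To make the layering concrete, here is a rough sketch of a driver-side wrapper sitting on top of a JDBC-independent client API. Everything named here (NativeClient, SketchStatement, SketchResultSet, fakeClient) is hypothetical and is not taken from the patch. The sketch assumes the client exposes an execute call plus a row-at-a-time fetch, and that an empty row signals the end of the results. A real driver would implement java.sql.Statement and java.sql.ResultSet; this only mirrors the shape of next() and getString(int), including the 1-based column index.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical protocol-level client API. It knows nothing about JDBC.
interface NativeClient {
    void execute(String query);
    // Assumption: an empty list means there are no more rows.
    List<String> fetchRow();
}

// Driver-side cursor that mirrors next()/getString(int) of java.sql.ResultSet.
class SketchResultSet {
    private final NativeClient client;
    private List<String> current = List.of();

    SketchResultSet(NativeClient client) {
        this.client = client;
    }

    // Pulls the next row from the client; returns false once rows run out.
    boolean next() {
        current = client.fetchRow();
        return !current.isEmpty();
    }

    // Column indexes are 1-based, as in JDBC.
    String getString(int columnIndex) {
        return current.get(columnIndex - 1);
    }
}

// Driver-side statement: delegates the query to the client API.
class SketchStatement {
    private final NativeClient client;

    SketchStatement(NativeClient client) {
        this.client = client;
    }

    SketchResultSet executeQuery(String sql) {
        client.execute(sql);
        return new SketchResultSet(client);
    }
}

class Type4LayeringDemo {
    // In-memory stand-in for a real server connection; ignores the query text.
    static NativeClient fakeClient(List<List<String>> rows) {
        Deque<List<String>> pending = new ArrayDeque<>();
        return new NativeClient() {
            @Override
            public void execute(String query) {
                pending.clear();
                pending.addAll(rows);
            }

            @Override
            public List<String> fetchRow() {
                List<String> row = pending.poll();
                return row == null ? List.of() : row;
            }
        };
    }

    public static void main(String[] args) {
        NativeClient client = fakeClient(List.of(List.of("1", "alice"), List.of("2", "bob")));
        SketchResultSet rs = new SketchStatement(client).executeQuery("SELECT id, name FROM users");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
    }
}
```

The point of the split is that the command line interface or a Perl binding could talk to the client API directly, while the JDBC classes stay a thin adapter.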

We should meet up to make sure we are all on the same page. Ragho, can you set up a meeting?

--Michi

> Support JDBC connections for interoperability between Hive and RDBMS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: YoungWoo Kim
>            Priority: Minor
>         Attachments: hadoop-4101.1.patch

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642994#action_12642994 ] 

Namit Jain commented on HADOOP-4101:
------------------------------------

The Driver API has changed - it is now integrated with the serde and returns a vector<string> instead of vector<vector<string>> wrongly.
That needs to be changed also.

> Support JDBC connections for interoperability between Hive and RDBMS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: YoungWoo Kim
>            Priority: Minor
>         Attachments: hadoop-4101.1.patch

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642581#action_12642581 ] 

Jeff Hammerbacher commented on HADOOP-4101:
-------------------------------------------

Raghu, any progress on the JDBC driver?


[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646326#action_12646326 ] 

Michi Mutsuzaki commented on HADOOP-4101:
-----------------------------------------

7pm?

--Michi

> Support JDBC connections for interoperability between Hive and RDBMS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: YoungWoo Kim
>            Priority: Minor
>         Attachments: hadoop-4101.1.patch, hadoop-4101.2.patch

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643400#action_12643400 ] 

Michi Mutsuzaki commented on HADOOP-4101:
-----------------------------------------

I talked about this with Ragho.

- The next step is to separate client from the server.
- I'll check if we can use thrift to implement JDBC server/client.

--Michi

> Support JDBC connections for interoperability between Hive and RDBMS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: YoungWoo Kim
>            Priority: Minor
>         Attachments: hadoop-4101.1.patch

[jira] Commented: (HADOOP-4101) Support JDBC connections for interoperability between Hive and RDBMS

Posted by "Michi Mutsuzaki (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644837#action_12644837 ] 

Michi Mutsuzaki commented on HADOOP-4101:
-----------------------------------------

Ragho: I confirm.

- I should be able to finish implementing HiveServer.java/HiveClient.java by the end of this week (maybe by Sunday). As Ragho said, right now we have only 2 methods: void execute(String query) and list<String> fetch_row().
- After that, I will modify the JDBC driver to use HiveClient. 
- Command line interface can use either HiveClient or JDBC driver. 
- I'm usually available after 7 on tue-fri.
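
For illustration, the two-method interface mentioned above could look roughly like this on the Java side. This is only a sketch: HiveClientSketch, CannedHiveClient and fetchAll are made-up names, the in-memory client ignores the query and serves canned rows, and two things are assumptions rather than anything stated in this issue: that each element of the returned list is one column value, and that an empty list marks the end of the results.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical Java view of the two service methods: execute and fetch_row.
interface HiveClientSketch {
    void execute(String query);

    // Assumption: one call returns the column values of one row,
    // and an empty list means there are no more rows.
    List<String> fetchRow();
}

// In-memory stand-in for the real client: ignores the query, serves canned rows.
class CannedHiveClient implements HiveClientSketch {
    private final List<List<String>> canned;
    private final Deque<List<String>> pending = new ArrayDeque<>();

    CannedHiveClient(List<List<String>> canned) {
        this.canned = canned;
    }

    @Override
    public void execute(String query) {
        // Each execute restarts the result stream.
        pending.clear();
        pending.addAll(canned);
    }

    @Override
    public List<String> fetchRow() {
        List<String> row = pending.poll();
        return row == null ? List.of() : row;
    }
}

class ExecuteFetchDemo {
    // Runs a query and drains the result one row at a time.
    static List<List<String>> fetchAll(HiveClientSketch client, String query) {
        client.execute(query);
        List<List<String>> rows = new ArrayList<>();
        for (List<String> row = client.fetchRow(); !row.isEmpty(); row = client.fetchRow()) {
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        HiveClientSketch client = new CannedHiveClient(
                List.of(List.of("1", "alice"), List.of("2", "bob")));
        for (List<String> row : fetchAll(client, "SELECT id, name FROM users")) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

One limitation of the empty-list convention is that a genuinely empty row cannot be told apart from the end of the results, so a real interface would probably need an explicit end-of-data signal.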

--Michi

> Support JDBC connections for interoperability between Hive and RDBMS
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4101
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4101
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hive
>            Reporter: YoungWoo Kim
>            Priority: Minor
>         Attachments: hadoop-4101.1.patch