
Posted to dev@hive.apache.org by "Bill Graham (JIRA)" <ji...@apache.org> on 2009/06/12 20:26:07 UTC

[jira] Created: (HIVE-559) Support JDBC ResultSetMetata

Support JDBC ResultSetMetata
----------------------------

                 Key: HIVE-559
                 URL: https://issues.apache.org/jira/browse/HIVE-559
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Clients
            Reporter: Bill Graham


Support ResultSetMetaData for JDBC ResultSets. I'd expect the getColumn* methods to be particularly useful:
http://java.sun.com/javase/6/docs/api/java/sql/ResultSetMetaData.html
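For illustration, here is a minimal sketch of how client code might use the getColumn* methods once they are supported. It is not taken from the Hive driver. The class name ColumnMetadataDemo, the helper names, and the sample columns "key" and "value" are assumptions made up for this example, and the JDK's RowSetMetaDataImpl stands in for the metadata a real ResultSet would return, so the sketch runs without a Hive server.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import javax.sql.rowset.RowSetMetaDataImpl;

public class ColumnMetadataDemo {

    // Lists each column as "name typeName" using only standard
    // ResultSetMetaData getters.
    static List<String> describeColumns(ResultSetMetaData meta) throws SQLException {
        List<String> out = new ArrayList<>();
        // JDBC column indexes are 1-based.
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            out.add(meta.getColumnName(i) + " " + meta.getColumnTypeName(i));
        }
        return out;
    }

    // Stand-in metadata with hypothetical columns; a real client would call
    // resultSet.getMetaData() instead.
    static ResultSetMetaData sampleMetadata() throws SQLException {
        RowSetMetaDataImpl meta = new RowSetMetaDataImpl();
        meta.setColumnCount(2);
        meta.setColumnName(1, "key");
        meta.setColumnType(1, Types.INTEGER);
        meta.setColumnTypeName(1, "int");
        meta.setColumnName(2, "value");
        meta.setColumnType(2, Types.VARCHAR);
        meta.setColumnTypeName(2, "string");
        return meta;
    }

    public static void main(String[] args) throws SQLException {
        for (String line : describeColumns(sampleMetadata())) {
            System.out.println(line);
        }
    }
}
```

With a working driver, describeColumns could be passed the result of ResultSet.getMetaData() directly.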

The challenge, as I see it, is that the JDBC client only has access to the raw query string and the result data when running in standalone mode. Therefore, it will need to get the column metadata in one of two ways:

1. By parsing the query to determine the tables/columns involved and then making a request to the metastore to get the metadata for the columns. This certainly feels like duplicate work, since the query of course gets properly parsed on the server.

2. By returning the column metadata from the server. My Thrift knowledge is limited, but I suspect adding this to the response would present other challenges.

Any thoughts or suggestions? Option #1 feels clunkier, yet safer.
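
As a rough sketch of what option 2 might look like on the client side, suppose the server could return a schema description alongside the result data. The "name:type" comma-separated format, the class name ServerSchemaSketch, and the small type mapping below are all assumptions invented for this example; they do not describe Hive's actual Thrift interface. The sketch only shows that, given such a description, building a ResultSetMetaData needs no query parsing on the client.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Locale;
import javax.sql.rowset.RowSetMetaDataImpl;

public class ServerSchemaSketch {

    // Maps a type name to a java.sql.Types constant. Only a few names are
    // handled here (an assumption); anything else falls back to Types.OTHER.
    static int toSqlType(String typeName) {
        switch (typeName.toLowerCase(Locale.ROOT)) {
            case "int":    return Types.INTEGER;
            case "bigint": return Types.BIGINT;
            case "double": return Types.DOUBLE;
            case "string": return Types.VARCHAR;
            default:       return Types.OTHER;
        }
    }

    // Builds metadata from a hypothetical schema string such as
    // "key:int,value:string" sent back by the server with the results.
    static ResultSetMetaData fromSchemaString(String schema) throws SQLException {
        String[] fields = schema.split(",");
        RowSetMetaDataImpl meta = new RowSetMetaDataImpl();
        meta.setColumnCount(fields.length);
        for (int i = 0; i < fields.length; i++) {
            String[] nameAndType = fields[i].trim().split(":", 2);
            if (nameAndType.length != 2) {
                throw new SQLException("Malformed schema field: " + fields[i]);
            }
            int column = i + 1; // JDBC column indexes are 1-based
            String typeName = nameAndType[1].trim();
            meta.setColumnName(column, nameAndType[0].trim());
            meta.setColumnTypeName(column, typeName);
            meta.setColumnType(column, toSqlType(typeName));
        }
        return meta;
    }

    public static void main(String[] args) throws SQLException {
        ResultSetMetaData meta = fromSchemaString("key:int,value:string");
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            System.out.println(meta.getColumnName(i) + " -> " + meta.getColumnTypeName(i));
        }
    }
}
```

The trade-off is the one described above: the client stays simple, but the server response has to carry the extra schema information.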

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-559:
--------------------------

    Issue Type: Sub-task  (was: New Feature)
        Parent: HIVE-576

> Support JDBC ResultSetMetadata
> ------------------------------
>
>                 Key: HIVE-559
>                 URL: https://issues.apache.org/jira/browse/HIVE-559
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Clients
>            Reporter: Bill Graham
>            Assignee: Min Zhou


[jira] Assigned: (HIVE-559) Support JDBC ResultSetMetadata

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou reassigned HIVE-559:
-----------------------------

    Assignee: Min Zhou

> Support JDBC ResultSetMetadata
> ------------------------------
>
>                 Key: HIVE-559
>                 URL: https://issues.apache.org/jira/browse/HIVE-559
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Clients
>            Reporter: Bill Graham
>            Assignee: Min Zhou


[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated HIVE-559:
-----------------------------

    Summary: Support JDBC ResultSetMetadata  (was: Support JDBC ResultSetMetata)

> Support JDBC ResultSetMetadata
> ------------------------------
>
>                 Key: HIVE-559
>                 URL: https://issues.apache.org/jira/browse/HIVE-559
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Clients
>            Reporter: Bill Graham


RE: [jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

Posted by wuxy <wu...@huawei.com>.

I found the following section at the end of chapter 6 of the book "Hadoop: The
Definitive Guide":
--------------------
'Task side-effect files'
"Care needs to be taken to ensure that multiple instances of the same task
don't try to write to the same file. There are two problems to avoid: if a
task failed and was retried, then the old partial output would still be
present when the second task ran, and it would have to delete the old file
first. Second, with speculative execution enabled, two instances of the same
task could try to write to the same file simultaneously."
-----------------------
The description says that "two instances of the same task could try to write
to the same file simultaneously" is a case that should be avoided.
Can anyone confirm this for me and, if possible, tell me the reason behind it?
Thanks.

Steven. Wu

[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-559:
--------------------------------

    Affects Version/s: 0.5.0

> Support JDBC ResultSetMetadata
> ------------------------------
>
>                 Key: HIVE-559
>                 URL: https://issues.apache.org/jira/browse/HIVE-559
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Drivers
>    Affects Versions: 0.5.0
>            Reporter: Bill Graham
>            Assignee: Min Zhou


[jira] Updated: (HIVE-559) Support JDBC ResultSetMetadata

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-559:
--------------------------------

    Component/s: Drivers
                     (was: Clients)

> Support JDBC ResultSetMetadata
> ------------------------------
>
>                 Key: HIVE-559
>                 URL: https://issues.apache.org/jira/browse/HIVE-559
>             Project: Hadoop Hive
>          Issue Type: Sub-task
>          Components: Drivers
>    Affects Versions: 0.5.0
>            Reporter: Bill Graham
>            Assignee: Min Zhou
