Posted to user@hive.apache.org by Something Something <ma...@gmail.com> on 2012/09/17 07:38:46 UTC

Questions about Hive

Note:  I am a newbie to Hive.

Can someone please answer the following questions?

1)  Does Hive provide APIs (like HBase does) that can be used to retrieve
data from the tables in Hive from a Java program?  I heard somewhere that
the data can be accessed with JDBC (style) APIs.  True?

2)  I don't see how I can add indexes on the tables, so does that mean a
query such as the following will trigger a MR job that will search files on
HDFS sequentially?

hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';


3)  Has anyone compared the performance of Hive against other NoSQL databases
such as HBase or MongoDB?  I understand it's not exactly an apples-to-apples
comparison, but still...

Thanks.

RE: Questions about Hive

Posted by "Balaraman, Anand" <An...@SYNTELINC.COM>.

Regarding the use of APIs to work with Hive, here is a tip:

Try using a JDBC connector (like 'hive-jdbc-0.7.1-cdh3u1.jar') as a
plugin in any querying tool such as DbVisualizer.

I connect to Hive using the above setup, and also through the SQL
Explorer plugin in Eclipse.

 

Regards

Anand B

 


Re: Questions about Hive

Posted by Tim Robertson <ti...@gmail.com>.

I don't think Hive is intended for web request scoped operations... that
would be a rather unusual case from my understanding.

HBase sounds more like the Hadoop equivalent that you might be looking for,
but you need to look at your search patterns to see if HBase is a good fit
(you need to manage your own indexes again).
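
As a rough sketch of what "managing your own indexes" can mean in practice: the application writes to a second lookup structure every time it writes a row, and reads by the indexed column go through that structure. The code below is an in-memory stand-in and does not use the HBase client API; the class name, the method names and the `ds` column are assumptions made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for a key-value store; this is NOT the HBase client API.
final class ManualIndexSketch {
    // "Table": row key -> value of a hypothetical ds column.
    private final Map<String, String> table = new TreeMap<>();
    // "Index": ds value -> row keys holding it. The application keeps it in sync.
    private final Map<String, Set<String>> indexByDs = new HashMap<>();

    void put(String rowKey, String ds) {
        String old = table.put(rowKey, ds);
        if (old != null) {
            // Drop the stale index entry left behind by the previous value.
            Set<String> keys = indexByDs.get(old);
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                indexByDs.remove(old);
            }
        }
        indexByDs.computeIfAbsent(ds, k -> new TreeSet<>()).add(rowKey);
    }

    // A lookup by ds consults the index instead of scanning every row.
    Set<String> rowKeysForDs(String ds) {
        return Collections.unmodifiableSet(
                indexByDs.getOrDefault(ds, Collections.emptySet()));
    }

    public static void main(String[] args) {
        ManualIndexSketch store = new ManualIndexSketch();
        store.put("row1", "2008-08-15");
        store.put("row2", "2008-08-08");
        store.put("row3", "2008-08-15");
        store.put("row1", "2008-08-08"); // an update: the index has to follow
        System.out.println(store.rowKeysForDs("2008-08-15"));
        System.out.println(store.rowKeysForDs("2008-08-08"));
    }
}
```

The point of the sketch is the bookkeeping in `put`: every write touches two structures, and forgetting the cleanup step leaves the index pointing at rows that no longer match.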

Cheers,
Tim


On Mon, Sep 17, 2012 at 8:07 AM, Something Something <
mailinglists19@gmail.com> wrote:

> Thank you both for the answers.  We are trying to find out if Hive can be
> used as a replacement of Netezza, but if there are no indexes then I don't
> see how it will beat Netezza in terms of performance.  Sounds like it
> certainly can't be used to do a quick lookup from a webapp - like Netezza
> can.
>
> If performance isn't a concern, then I guess it could be a useful tool.
> Will try it out & see how it works out.  Thanks.

Re: Questions about Hive

Posted by Something Something <ma...@gmail.com>.

Thank you both for the answers.  We are trying to find out if Hive can be
used as a replacement of Netezza, but if there are no indexes then I don't
see how it will beat Netezza in terms of performance.  Sounds like it
certainly can't be used to do a quick lookup from a webapp - like Netezza
can.

If performance isn't a concern, then I guess it could be a useful tool.
Will try it out & see how it works out.  Thanks.



Re: Questions about Hive

Posted by Tim Robertson <ti...@gmail.com>.

>
> Note:  I am a newbie to Hive.
>
> Can someone please answer the following questions?
>
> 1)  Does Hive provide APIs (like HBase does) that can be used to retrieve
> data from the tables in Hive from a Java program?  I heard somewhere that
> the data can be accessed with JDBC (style) APIs.  True?
>

True.
https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC


> 2)  I don't see how I can add indexes on the tables, so does that mean a
> query such as the following will trigger a MR job that will search files on
> HDFS sequentially?
>
> hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
>
>
There are some index implementations in Hive, but indexing is not as simple as
in a traditional database.
E.g. search Jira to see some of the work:
https://issues.apache.org/jira/browse/HIVE-417

You are correct that the above would do a full table scan.
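
To make "full table scan" concrete, here is a minimal in-memory sketch of how the query above is answered when nothing narrows the search: every row is read and tested, even though only a few match. This is an illustration, not Hive code; the `Row` and `ScanResult` classes, the sample data and the string type of `foo` are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory stand-in for the "invites" table; this is not Hive code.
final class FullScanSketch {

    // One row of the hypothetical table.
    static final class Row {
        final String foo;
        final String ds;
        Row(String foo, String ds) { this.foo = foo; this.ds = ds; }
    }

    // Matching foo values plus the number of rows that had to be read.
    static final class ScanResult {
        final List<String> matches;
        final int rowsRead;
        ScanResult(List<String> matches, int rowsRead) {
            this.matches = matches;
            this.rowsRead = rowsRead;
        }
    }

    // Rough equivalent of: SELECT a.foo FROM invites a WHERE a.ds = <ds>
    // With no index, the loop visits every row in the table.
    static ScanResult selectFoo(List<Row> invites, String ds) {
        List<String> matches = new ArrayList<>();
        int rowsRead = 0;
        for (Row row : invites) {
            rowsRead++;
            if (row.ds.equals(ds)) {
                matches.add(row.foo);
            }
        }
        return new ScanResult(matches, rowsRead);
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("a", "2008-08-15"),
                new Row("b", "2008-08-08"),
                new Row("c", "2008-08-15"),
                new Row("d", "2008-08-22"));
    }

    public static void main(String[] args) {
        ScanResult r = selectFoo(sampleRows(), "2008-08-15");
        System.out.println("matches: " + r.matches + ", rows read: " + r.rowsRead);
    }
}
```

With the four sample rows, two match but all four are read. On a real cluster the same work is split across map tasks, which spreads the cost out but does not remove it.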

> 3)  Has anyone compared performance of Hive against other NOSQL databases
> such as HBase, MongoDB.  I understand it's not exactly apples to apples
> comparison, but still...
>

I think you misunderstand what Hive is.  It is basically a SQL-to-MR
translation engine with adapters for the input source.  By default it
uses simple files on HDFS, but there are (e.g.) HBase adapters, so you
can use it to run SQL on HBase tables, for example (which works great).
Regarding performance on HBase scans: the operation is the same as
running a normal HBase MR scan, so the performance is the same.
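
A small sketch of the "SQL to MR" idea, assuming a hypothetical query `SELECT ds, COUNT(*) FROM invites GROUP BY ds` (this query is not from the thread). The map step turns each row into a (key, 1) pair and the reduce step sums the pairs per key. Everything runs in memory here; it only imitates the shape of a MapReduce job and says nothing about how Hive actually plans queries.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: imitates map and reduce phases in memory, not Hive's planner.
final class SqlToMapReduceSketch {

    // Map phase: each input row (represented by its ds value) becomes a (ds, 1) pair.
    static List<Map.Entry<String, Integer>> map(List<String> dsColumn) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String ds : dsColumn) {
            pairs.add(new AbstractMap.SimpleEntry<>(ds, 1));
        }
        return pairs;
    }

    // Reduce phase: pairs that share a key are summed into one count per key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> dsColumn = Arrays.asList("2008-08-15", "2008-08-08", "2008-08-15");
        System.out.println(reduce(map(dsColumn)));
    }
}
```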


>
> Thanks.

Re: Questions about Hive

Posted by Ricky Saltzer <ri...@cloudera.com>.

Yes, Hive is meant for batch processing on very large data sets. It has
high latency compared to other "databases" such as MySQL, but excels
where other databases falter. For example, running analysis on several
terabytes of data is not unusual in Hive.

HBase was suggested above. Be sure to understand that it is a
"NoSQL" database, so you will need to rethink a lot of application
logic if it relied on SQL beforehand.

Ricky

Re: Questions about Hive

Posted by MiaoMiao <li...@gmail.com>.

I believe Hive is not for web users, since a single query takes several
minutes or even hours. But I managed to provide a web service via
Thrift and PHP:
http://nousefor.net/55/2011/12/php/hbase-and-hive-thrift-php-client/

RE: Questions about Hive

Posted by "Hamilton, Robert (Austin)" <ro...@hp.com>.

Hello, something :)
Regarding JDBC style: I understand this approach has some limitations, but here is an example.
You will need to make sure the Hive service is running: https://cwiki.apache.org/Hive/hiveserver.html
Here is some sample code that I've used for testing. It is not the best Java in the world, but it gets the job done.
You will need to make sure the Hive and Hadoop jars are on the classpath. Note that you will have to edit the connectionString.


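import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

public class RunSQL {
    // JDBC driver class for the Hive service linked above.
    private static final String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
    // Edit this to point at your own Hive server.
    private static final String connectionString = "jdbc:hive://myserver.hp.com:10000/default";

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("Usage: RunSQL <query>");
            System.exit(1);
        }
        String sqlToRun = args[0];

        // Load the driver; this fails if the Hive jars are not on the classpath.
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // try-with-resources closes the result set, statement and connection.
        try (Connection con = DriverManager.getConnection(connectionString);
             Statement stmt = con.createStatement()) {
            System.out.println("Connected.");
            System.out.println("Running: " + sqlToRun);

            try (ResultSet res = stmt.executeQuery(sqlToRun)) {
                ResultSetMetaData meta = res.getMetaData();
                int numberOfColumns = meta.getColumnCount();

                System.out.println("Result:");
                while (res.next()) {
                    // Print every column of the row, each prefixed with a tab.
                    for (int i = 1; i <= numberOfColumns; i++) {
                        System.out.print("\t" + res.getString(i));
                    }
                    System.out.println();
                }
            }
        }
    }
}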
