Posted to user@hive.apache.org by Amit Bajpai <Am...@flextronics.com> on 2016/07/15 05:22:18 UTC

Yarn Application ID for Hive query

Hi,

I am using the Python program below to run a Hive query. How can I get the YARN application ID for the query execution from within the Python program?

import pyhs2

with pyhs2.connect(host='abc.sac.com',
                   port=10000,
                   authMechanism="PLAIN",
                   user='amit',
                   password='amit',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Execute query
        cur.execute("SELECT COMP_ID, COUNT(1) FROM tableA GROUP BY COMP_ID")

        # Fetch table results, one row at a time
        for i in cur.fetch():
            print(i)

Thanks
Amit


Legal Disclaimer:
The information contained in this message may be privileged and confidential. It is intended to be read only by the individual or entity to whom it is addressed or by their designee. If the reader of this message is not the intended recipient, you are on notice that any distribution of this message, in any form, is strictly prohibited. If you have received this message in error, please immediately notify the sender and delete or destroy any copy of this message!

RE: Yarn Application ID for Hive query

Posted by Amit Bajpai <Am...@flextronics.com>.
I am running Hive on Tez. I am able to get the YARN application ID for the Hive query by submitting the query through Hive JDBC and reading the query log from HiveStatement.

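import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import org.apache.hive.jdbc.HiveStatement;

Connection con = DriverManager.getConnection("jdbc:hive2://abc:10000/default", "xyz", "");
HiveStatement stmt = (HiveStatement) con.createStatement();
String sql = "SELECT COMP_ID, COUNT(1) FROM tableA GROUP BY COMP_ID";
ResultSet res = stmt.executeQuery(sql);
String yarn_app_id = "";

// Scan the query log for the line that reports the YARN application ID.
for (String log : stmt.getQueryLog()) {
    if (log.contains("App id")) {
        // Skip past "App id " and drop the last character of the line.
        yarn_app_id = log.substring(log.indexOf("App id") + 7, log.length() - 1);
    }
}

System.out.println("YARN Application ID: " + yarn_app_id);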
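Since the original question was about Python, the same log-scanning idea might be sketched as follows. This is only a sketch under assumptions: it assumes the log lines look like the "App id" line the Java code above searches for, and that the ID has the usual application_<timestamp>_<sequence> form. The function name extract_yarn_app_id and the sample log lines are hypothetical. How the log lines are obtained from a Python client is left open; pyhs2 does not appear to expose the query log, and whether another client (for example PyHive's cursor.fetch_logs()) returns these same lines has not been verified here.

```python
import re

# Assumption: the YARN application ID appears in the log line in the usual
# "application_<clusterTimestamp>_<sequence>" form.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def extract_yarn_app_id(log_lines):
    """Return the last YARN application ID found in log_lines, or None."""
    app_id = None
    for line in log_lines:
        # Same marker string the Java snippet looks for.
        if "App id" in line:
            match = APP_ID_PATTERN.search(line)
            if match:
                app_id = match.group(0)
    return app_id


# Hypothetical log lines, shaped like the "App id" line in the query log.
sample_log = [
    "INFO  : Tez session hasn't been created yet. Opening session",
    "INFO  : Status: Running (Executing on YARN cluster with App id "
    "application_1468500000000_0042)",
]

print("YARN Application ID:", extract_yarn_app_id(sample_log))
```

Matching the ID with a regular expression, rather than with fixed substring offsets, avoids depending on exactly which characters follow the ID on the line.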

Now I am trying to find the Tez DAG ID for the query.



Re: Yarn Application ID for Hive query

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> be nice to have access to a command or API call in HiveServer2 similar
> to MySQL's "SHOW PROCESSLIST" (and equivalent commands in most other
> databases).
 

There is one: if you have the HiveServer2 UI (in 2.0), it can be seen there.

It would take a 10-15 line JSP script to export that as a JSON API.

The reason that's not very interesting is that single-machine information
like the MySQL one is useless in a properly configured HA environment for
Hive.

Cheers,
Gopal



RE: Yarn Application ID for Hive query

Posted by "Gerber, Bryan W" <Br...@pnnl.gov>.
Making Hive look like a normal SQL database is the goal of libraries like this, so it makes sense that the abstraction wouldn't leak a concept like an application ID, especially because not all Hive queries generate a YARN application.

That said, we went through this with JDBC access to Hive a while back, to allow our user interface to cancel a query. The only relevant discussion I found was here: http://grokbase.com/t/cloudera/hue-user/1373c258xg/how-hue-beeswax-is-able-to-read-the-hadoop-job-id-that-gets-generated-by-hiveserver2

We are using this method, plus a background task that polls the YARN ResourceManager API to find the job with the corresponding hive.session.id. It is a lot of work for something that seems very simple. It would be nice to have access to a command or API call in HiveServer2 similar to MySQL's "SHOW PROCESSLIST" (and equivalent commands in most other databases).
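The polling approach described above might look roughly like the sketch below. It is not the actual background task. It assumes the ResourceManager REST endpoint /ws/v1/cluster/apps is reachable, and that the application name contains the Hive session ID (which may hold for Hive on Tez sessions but is not guaranteed for every engine or configuration). The function names, the ResourceManager URL passed in, and the sample response are all hypothetical.

```python
import json
from urllib.request import urlopen


def find_apps_for_session(apps_response, session_id):
    """Return the apps whose name contains session_id.

    apps_response is the parsed JSON from /ws/v1/cluster/apps.
    """
    # The ResourceManager returns {"apps": null} when nothing matches.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [app for app in apps if session_id in app.get("name", "")]


def fetch_running_apps(rm_url):
    """Fetch running applications from the ResourceManager REST API.

    rm_url is a base URL such as "http://resourcemanager:8088" (hypothetical).
    """
    with urlopen(rm_url + "/ws/v1/cluster/apps?states=RUNNING") as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "apps": {
        "app": [
            {"id": "application_1468500000000_0041", "name": "other-job"},
            {"id": "application_1468500000000_0042",
             "name": "HIVE-1f2e3d4c-0000-4000-8000-abcdef012345"},
        ]
    }
}

matches = find_apps_for_session(
    sample_response, "1f2e3d4c-0000-4000-8000-abcdef012345")
print([app["id"] for app in matches])
```

In a real poller, fetch_running_apps would be called on a timer and its result passed to find_apps_for_session; the filtering is kept separate so it can be exercised without a cluster.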
