Posted to issues@hive.apache.org by "Hive QA (JIRA)" <ji...@apache.org> on 2018/08/09 13:24:00 UTC

[jira] [Commented] (HIVE-19587) HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster

    [ https://issues.apache.org/jira/browse/HIVE-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574831#comment-16574831 ] 

Hive QA commented on HIVE-19587:
--------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m 27s{color} | {color:blue} ql in master has 2305 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 14s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 25m 26s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-13124/dev-support/hive-personality.sh |
| git revision | master / 873d31f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-13124/yetus.txt |
| Powered by | Apache Yetus    http://yetus.apache.org |


This message was automatically generated.



> HeartBeat thread uses cancelled delegation token while connecting to meta on KERBEROS cluster
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19587
>                 URL: https://issues.apache.org/jira/browse/HIVE-19587
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0, 2.3.2, 3.1.0
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Blocker
>         Attachments: HIVE-19587.1.patch
>
>
> *STEP 1. Create test data*
> {code}
> create table t1 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ",";
> create table t2 (id int) ROW FORMAT DELIMITED FIELDS TERMINATED BY  ",";
> {code}
> Generate 10,000,000 lines of random data:
> {code}
> package com.test.app;
> import java.io.FileNotFoundException;
> import java.io.PrintWriter;
> import java.util.concurrent.ThreadLocalRandom;
> public class App {
>   public static void main(String[] args) throws FileNotFoundException {
>     try (PrintWriter out = new PrintWriter("table.data")) {
>       int min = 0;
>       int max = 10_000;
>       int numRows = 10_000_000;
>       for (int i = 0; i < numRows; i++) {
>         int randomNum = ThreadLocalRandom.current().nextInt(min, max + 1);
>         out.println(randomNum);
>       }
>     }
>   }
> }
> {code}
> Load the data into the Hive tables:
> {code}
> load data local inpath '/home/myuser/table.data' into table t1;
> load data local inpath '/home/myuser/table.data' into table t2;
> {code}
> *STEP 2. Configure transactions in hive-site.xml*
> {code}
> <!-- Transactions -->
>    <property>
>       <name>hive.exec.dynamic.partition.mode</name>
>       <value>nonstrict</value>
>    </property>
>    <property>
>       <name>hive.support.concurrency</name>
>       <value>true</value>
>    </property>
>    <property>
>       <name>hive.enforce.bucketing</name>
>       <value>true</value>
>    </property>
>    <property>
>       <name>hive.txn.manager</name>
>       <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
>    </property>
>    <property>
>       <name>hive.compactor.initiator.on</name>
>       <value>true</value>
>    </property>
>    <property>
>       <name>hive.compactor.worker.threads</name>
>       <value>1</value>
>    </property>
> {code}
> *STEP 3. Configure hive.txn.timeout in hive-site.xml*
> {code}
> <!-- Delegation token issue -->
>    <property>
>       <name>hive.txn.timeout</name>
>       <value>10s</value>
>    </property>
> {code}
> *STEP 4. Connect via beeline to HS2 with KERBEROS*
> {code}
> !connect jdbc:hive2://node8.cluster:10000/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore
> {code}
> {code}
> select count(*) from t1;
> {code}
> *STEP 5. Close connection and reconnect*
> {code}
> !close
> {code}
> {code}
> !connect jdbc:hive2://node8.cluster:10000/default;principal=myuser/node8.cluster@NODE8;ssl=true;sslTrustStore=/opt/myuser/conf/ssl_truststore
> {code}
> *STEP 6. Perform a long-running query*
> This query takes about 600s:
> {code}
> select count(*) from t1 join t2 on t1.id = t2.id;
> {code}
> *EXPECTED RESULT*
> The query finishes successfully.
> *ACTUAL RESULT*
> {code}
> 2018-05-17T13:54:54,921 ERROR [pool-7-thread-10] transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) 
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539) 
>         at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>         at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) 
>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660) 
>         at java.security.AccessController.doPrivileged(Native Method) 
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) 
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, maxDate=1527170029297, sequenceNumber=1, masterKeyId=1
>         at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:104) 
>         at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:472)
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:503) 
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>         ... 15 more
> 2018-05-17T13:54:54,966 ERROR [pool-7-thread-10] server.TThreadPoolServer: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password
>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660)
>         at java.security.AccessController.doPrivileged(Native Method) 
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) 
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748) 
> Caused by: org.apache.thrift.transport.TTransportException: DIGEST-MD5: IO error acquiring password
>         at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232) 
>         at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316) 
>         at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) 
>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) 
>         ... 10 more
> {code}
> *ROOT-CAUSE*
> HS2 uses a cancelled delegation token to connect to the metastore. From the logs:
> 1. Metastore creates a delegation token (issueDate=1526565229297):
> {code}
> 2018-05-17T13:53:49,297  INFO [pool-7-thread-3] delegation.AbstractDelegationTokenSecretManager: Creating password for identifier: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, m$
> 2018-05-17T13:53:50,789  INFO [Thread-18] txn.TxnHandler: 'Cleaner' locked by 'node8.cluster'
> 2018-05-17T13:53:50,793  INFO [Thread-18] txn.TxnHandler: 'Cleaner' unlocked by 'node8.cluster'
> 2018-05-17T13:53:54,840  INFO [pool-7-thread-4] metastore.HiveMetaStore: 3: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
> {code}
> 2. Metastore cancels the delegation token (issueDate=1526565229297):
> {code}
> 2018-05-17T13:54:32,380  INFO [pool-7-thread-4] thrift.TokenStoreDelegationTokenSecretManager: Token cancelation requested for identifier: owner=mapr, renewer=mapr, realUser=, issueDate=1526565229297, maxDate=1527170029297,$
> 2018-05-17T13:54:32,416  INFO [pool-7-thread-4] metastore.HiveMetaStore: 3: Cleaning up thread local RawStore...
> {code}
> 3. HS2 uses the cancelled delegation token and fails (issueDate=1526565229297):
> {code}
> 2018-05-17T13:54:54,921 ERROR [pool-7-thread-10] transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) 
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539) 
>         at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283)
>         at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) 
>         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:663) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:660) 
>         at java.security.AccessController.doPrivileged(Native Method) 
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1613) 
>         at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:660) 
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: owner=myuser, renewer=myuser, realUser=, issueDate=1526565229297, maxDate=1527170029297, sequenceNumber=1, masterKeyId=1
>         at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:104)
> {code}
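The issueDate values in the log lines above are Unix epoch milliseconds. A quick helper (hypothetical, not part of Hive) to decode them when matching a token to log timestamps; note that `Instant` renders in UTC, so the result lines up with the 13:53:49,297 creation log line only if the server logs in UTC:

```java
import java.time.Instant;

public class TokenIssueDate {
  public static void main(String[] args) {
    // issueDate copied from the logs above, in epoch milliseconds
    long issueDate = 1526565229297L;
    // Instant.toString() is ISO-8601 in UTC
    System.out.println(Instant.ofEpochMilli(issueDate)); // 2018-05-17T13:53:49.297Z
  }
}
```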
> *WORKAROUND*
> From Hive sources:
> {code}
> heartBeatInterval = HIVE_TXN_TIMEOUT / 2
> {code}
> So, to ensure no heartbeat fires while a job is running, one needs to set
> {code}
> HIVE_TXN_TIMEOUT = (max_job_duration * 2) * 1.25
> {code}
> Here I added a 1.25 safety factor to be sure no heartbeat occurs. In my case max_job_duration = ~600s, so
> {code}
> HIVE_TXN_TIMEOUT = 600s * 2 * 1.25 = 1500s
> {code}
> When I set {{hive.txn.timeout}} to 1500s, the issue was gone in my local environment.
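The workaround arithmetic above can be sketched as follows (the helper name is hypothetical; the formula comes from the description, not from any Hive API):

```java
public class TxnTimeoutSketch {
  // heartBeatInterval = HIVE_TXN_TIMEOUT / 2 (per the description above),
  // so no heartbeat fires during a job when HIVE_TXN_TIMEOUT > 2 * max_job_duration;
  // the 1.25 factor is the safety margin from the workaround.
  static long recommendedTxnTimeoutSeconds(long maxJobDurationSeconds) {
    return (long) (maxJobDurationSeconds * 2 * 1.25);
  }

  public static void main(String[] args) {
    System.out.println(recommendedTxnTimeoutSeconds(600)); // 1500
  }
}
```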
> FYI [~isnogood], [~ashutoshc]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)