Posted to dev@flume.apache.org by GitBox <gi...@apache.org> on 2019/03/05 06:54:24 UTC

[GitHub] [flume] NiXuebing edited a comment on issue #206: FLUME-2956 - hive sink not sending heartbeat correctly

URL: https://github.com/apache/flume/pull/206#issuecomment-469562180
 
 
   > > From what I can see, `setupHeartBeatTimer()` in HiveSink.java does nothing more than set `timeToSendHeartBeat` to `true`; the heartbeat is only actually sent by the flush that runs when events arrive. If no events come in for a long time, the transaction still gets dropped automatically.
   > > My workaround is to call `writer.heartBeat()` directly inside `HiveSink.setupHeartBeatTimer()`. I'm not sure whether that causes any problems. @hejiang2000
   > 
   > Better not to do that. setupHeartBeatTimer() would then be sending the heartbeat (via txnBatch.heartbeat()) asynchronously on a timer over TCP, so it could collide with the data calls we issue on the same TCP connection (txnBatch.commit() and so on) and cause problems. @NiXuebing
   
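   To make the mechanism in the quoted exchange concrete, here is a rough, self-contained sketch of the two approaches; class and helper names are illustrative, not the actual HiveSink/HiveWriter source:
   
   ```java
   // Sketch only: names approximate the Flume hive sink behaviour described above.
   import java.util.Timer;
   import java.util.TimerTask;
   import java.util.concurrent.atomic.AtomicBoolean;
   
   class HeartbeatSketch {
       private final Timer heartBeatTimer = new Timer("hive-sink-heartbeat", true);
       private final AtomicBoolean timeToSendHeartBeat = new AtomicBoolean(false);
       private final long heartBeatIntervalMs = 240_000L; // Flume default is 240 s
   
       /** Stand-in for org.apache.flume.sink.hive.HiveWriter. */
       interface HiveWriterLike {
           void heartBeat() throws Exception;
       }
   
       // Behaviour as described above: the timer only raises a flag; the heartbeat
       // itself is sent later, during the batch flush. If no events arrive, the
       // flag is never acted on and the transaction can time out.
       void setupHeartBeatTimer() {
           heartBeatTimer.schedule(new TimerTask() {
               @Override public void run() {
                   timeToSendHeartBeat.set(true);
                   setupHeartBeatTimer(); // re-arm for the next interval
               }
           }, heartBeatIntervalMs);
       }
   
       // The proposed workaround: send the heartbeat directly from the timer thread.
       // As warned above, this can race with commit/write calls issued on the same
       // connection from the sink thread unless both go through the same
       // single-threaded executor.
       void setupHeartBeatTimerWithDirectHeartbeat(final HiveWriterLike writer) {
           heartBeatTimer.schedule(new TimerTask() {
               @Override public void run() {
                   try {
                       writer.heartBeat();
                   } catch (Exception e) {
                       // swallow and retry on the next interval
                   } finally {
                       setupHeartBeatTimerWithDirectHeartbeat(writer);
                   }
               }
           }, heartBeatIntervalMs);
       }
   }
   ```
   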
   I found that my real problem is that the loop in `drainOneBatch(Channel channel)` that takes and writes `batchSize` events runs for too long. With `sink.batchSize` = 10000, each batch takes around 300 seconds to process, so by the time `flush` runs the transaction has already been dropped.
   
   ```
   2019-03-05 14:43:20,543 INFO org.apache.flume.sink.hive.HiveSink: batch event write = 277142
   
   2019-03-05 14:43:20,543 INFO org.apache.flume.sink.hive.HiveWriter: Committing Txn 205240 on EndPoint: {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05,  2019-03-05-14] }
   
   2019-03-05 14:48:36,292 INFO org.apache.flume.sink.hive.HiveSink: batch event write = 315524
   
   2019-03-05 14:48:36,293 INFO org.apache.flume.sink.hive.HiveWriter: Sending heartbeat on batch TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05,  2019-03-05-14] }
   
   2019-03-05 14:48:37,124 WARN org.apache.flume.sink.hive.HiveWriter: Unable to send heartbeat on Txn Batch TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05,  2019-03-05-14] }
   org.apache.hive.hcatalog.streaming.HeartBeatFailure: Heart beat error. InvalidTxns: [205243, 205242, 205247, 205246, 205245, 205244, 205251, 205250, 205249, 205248, 205255, 205254, 205253, 205252, 205259, 205258, 205257, 205256, 205263, 205262, 205261, 205260, 205267, 205266, 205265, 205264, 205271, 205270, 205269, 205268, 205275, 205274, 205273, 205272, 205279, 205278, 205277, 205276, 205283, 205282, 205281, 205280, 205287, 205286, 205285, 205284, 205291, 205290, 205289, 205288, 205295, 205294, 205293, 205292, 205299, 205298, 205297, 205296]. AbortedTxns: [205241]
   	at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.heartbeat(HiveEndPoint.java:953)
   	at org.apache.flume.sink.hive.HiveWriter$2.call(HiveWriter.java:240)
   	at org.apache.flume.sink.hive.HiveWriter$2.call(HiveWriter.java:236)
   	at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:431)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   2019-03-05 14:48:37,133 INFO org.apache.flume.sink.hive.HiveWriter: Committing Txn 205241 on EndPoint: {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05,  2019-03-05-14] }
   
   2019-03-05 14:48:37,209 ERROR org.apache.hive.hcatalog.streaming.HiveEndPoint: Fatal error on TxnIds=[205240...205299] on endPoint = {metaStoreUri='thrift://hdfs-master01:9083', database='prod_ad_rds', table='startup_shutdown', partitionVals=[2019-03-05,  2019-03-05-14] }; cause Unable to abort invalid transaction id : 205241: No such transaction txnid:205241
   org.apache.hive.hcatalog.streaming.TransactionError: Unable to abort invalid transaction id : 205241: No such transaction txnid:205241
   	at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.abortImpl(HiveEndPoint.java:936)
   	at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.abort(HiveEndPoint.java:894)
   	at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.markDead(HiveEndPoint.java:753)
   	at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.commit(HiveEndPoint.java:853)
   	at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:346)
   	at org.apache.flume.sink.hive.HiveWriter$6.call(HiveWriter.java:343)
   	at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:431)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: NoSuchTxnException(message:No such transaction txnid:205241)
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result$abort_txn_resultStandardScheme.read(ThriftHiveMetastore.java)
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result$abort_txn_resultStandardScheme.read(ThriftHiveMetastore.java)
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$abort_txn_result.read(ThriftHiveMetastore.java)
   	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_abort_txn(ThriftHiveMetastore.java:4484)
   	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.abort_txn(ThriftHiveMetastore.java:4471)
   ```
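   
   For reference, the timeline above lines up with Hive's default transaction timeout (`hive.txn.timeout` is 300 s out of the box): a 10000-event batch that takes roughly 300 seconds to drain leaves no room for the flush-time heartbeat. One way around it, using the standard hive sink properties, is to shrink the batch so it drains well inside the timeout; agent and sink names below are placeholders:
   
   ```properties
   # Illustrative Flume agent config; agent/sink names are placeholders.
   agent.sinks.k1.type = hive
   agent.sinks.k1.hive.metastore = thrift://hdfs-master01:9083
   agent.sinks.k1.hive.database = prod_ad_rds
   agent.sinks.k1.hive.table = startup_shutdown
   # Smaller batches drain well under hive.txn.timeout (300 s by default),
   # so the flush-time heartbeat fires before the transaction expires.
   agent.sinks.k1.batchSize = 1000
   # Flag the heartbeat more often than the default 240 s.
   agent.sinks.k1.heartBeatInterval = 120
   ```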
   
   
