Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/03/24 06:32:52 UTC

[GitHub] [incubator-doris] weizuo93 opened a new issue #5562: Message "failed to call frontend service" is returned when stream load

weizuo93 opened a new issue #5562:
URL: https://github.com/apache/incubator-doris/issues/5562


   Description:
   There are 3 BE and 1 FE in doris cluster. The number of replica is 3. If one BE is stoped,  the stream load will return the following result:
   ```
   {
       "TxnId": 34149,
       "Label": "1616552269",
       "Status": "Fail",
       "Message": "failed to call frontend service",
       "NumberTotalRows": 200,
       "NumberLoadedRows": 200,
       "NumberFilteredRows": 0,
       "NumberUnselectedRows": 0,
       "LoadBytes": 8935,
       "LoadTimeMs": 23154,
       "BeginTxnTimeMs": 0,
       "StreamLoadPutTimeMs": 3,
       "ReadDataTimeMs": 0,
       "WriteDataTimeMs": 149,
       "CommitAndPublishTimeMs": 0
   }
   
   ```
   
   The FE log is as follows:
   ```
   2021-03-24 10:17:49,818 INFO (thrift-server-pool-5|133) [DatabaseTransactionMgr.beginTransaction():295] begin transaction: txn id 34149 with label 1616552269 from coordinator BE: 10.38.167.158, listner id: -1
   2021-03-24 10:17:49,984 INFO (thrift-server-pool-5|133) [DatabaseTransactionMgr.commitTransaction():559] transaction:[TransactionState. transaction id: 34149, label: 1616552269, db id: 11001, table id list: 11012, callback id: -1, coordinator: BE: 10.38.167.158, transaction status: COMMITTED, error replicas num: 100, replica ids: 11265,11269,11273,11017,11277, prepare time: 1616552269818, commit time: 1616552269980, finish time: -1, reason: ] successfully committed
   2021-03-24 10:17:49,986 INFO (PUBLISH_VERSION|19) [PublishVersionDaemon.publishVersion():131] send publish tasks for transaction: 34149
   2021-03-24 10:18:13,071 WARN (thrift-server-pool-5|133) [FrontendServiceImpl.loadTxnRollback():848] failed to rollback txn 34149: errCode = 2, detailMessage = transaction's state is already COMMITTED, could not abort
   2021-03-24 10:18:19,995 INFO (PUBLISH_VERSION|19) [DatabaseTransactionMgr.finishTransaction():826] finish transaction TransactionState. transaction id: 34149, label: 1616552269, db id: 11001, table id list: 11012, callback id: -1, coordinator: BE: 10.38.167.158, transaction status: VISIBLE, error replicas num: 100, replica ids: 11265,11269,11273,11017,11277, prepare time: 1616552269818, commit time: 1616552269980, finish time: 1616552299994, reason:  successfully
   2021-03-24 11:18:20,995 INFO (txnCleaner|56) [DatabaseTransactionMgr.removeExpiredTxns():1087] transaction [34149] is expired, remove it from transaction manager
   ```
   
   The coordinator BE log is as follows:
   ```
   I0324 10:17:49.820899 155649 stream_load_executor.cpp:50] begin to execute job. label=1616552269, txn_id=34149, query_id=a941b0d30a337020-c6406f0699945b82
   I0324 10:17:49.820930 155649 plan_fragment_executor.cpp:76] Prepare(): query_id=a941b0d30a337020-c6406f0699945b82 fragment_instance_id=a941b0d30a337020-c6406f0699945b83 backend_num=0
   I0324 10:17:49.820993 155649 plan_fragment_executor.cpp:138] Using query memory limit: 2.00 GB
   W0324 10:18:00.050282 155649 thrift_rpc_helper.cpp:66] retrying call frontend service after 1000 ms, address=TNetworkAddress(hostname=10.38.163.97, port=19020), reason=THRIFT_EAGAIN (timed out)
   W0324 10:18:11.050664 155649 thrift_rpc_helper.cpp:79] call frontend service failed, address=TNetworkAddress(hostname=10.38.163.97, port=19020), reason=THRIFT_EAGAIN (timed out)
   W0324 10:18:13.051308 155649 stream_load.cpp:116] handle streaming load failed, id=a941b0d30a337020-c6406f0699945b82, errmsg=failed to call frontend service
   ```
   
   According to the logs, the BE called `commit txn rpc`, and the FE received it and committed the transaction (at 10:17:49,984), but the BE never received the response: the BE log shows the call timing out at 10:18:00, being retried after 1000 ms, and timing out again at 10:18:11. From the BE's point of view, `commit txn rpc` failed, so the BE then called `rollback txn rpc`. The FE cannot roll back a transaction that is already COMMITTED, so the rollback was rejected as well, and the message `failed to call frontend service` was returned to the client.
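   The sequence described above can be reproduced with a small in-memory model. This is a hypothetical sketch in Python, not Doris code: `FakeFrontend`, `coordinator_stream_load`, `TxnStatus`, and `RpcTimeout` are invented names. It assumes that the FE applies a commit before its reply is sent, that every commit reply is lost (as in the BE log), and that the FE only allows aborting a transaction before it is COMMITTED (as the `loadTxnRollback` warning suggests). It shows the resulting symptoms but does not explain why the reply was lost.

```python
from enum import Enum
from typing import Optional, Tuple


class TxnStatus(Enum):
    PREPARE = "PREPARE"
    COMMITTED = "COMMITTED"
    VISIBLE = "VISIBLE"
    ABORTED = "ABORTED"


class RpcTimeout(Exception):
    """Stands in for the BE-side THRIFT_EAGAIN (timed out) error."""


class FakeFrontend:
    """Hypothetical FE that applies commits but may lose every reply."""

    def __init__(self, responses_lost: bool):
        self.responses_lost = responses_lost
        self.txns = {}

    def begin(self, txn_id: int) -> None:
        self.txns[txn_id] = TxnStatus.PREPARE

    def commit(self, txn_id: int) -> str:
        # The state change happens on the FE before the reply is sent back.
        if self.txns[txn_id] is TxnStatus.PREPARE:
            self.txns[txn_id] = TxnStatus.COMMITTED
        if self.responses_lost:
            # The commit above has already taken effect; only the reply is lost.
            raise RpcTimeout("THRIFT_EAGAIN (timed out)")
        return "OK"

    def rollback(self, txn_id: int) -> str:
        status = self.txns[txn_id]
        if status in (TxnStatus.COMMITTED, TxnStatus.VISIBLE):
            # Mirrors the wording of the FE warning in loadTxnRollback().
            return f"transaction's state is already {status.value}, could not abort"
        self.txns[txn_id] = TxnStatus.ABORTED
        return "OK"


def coordinator_stream_load(
    fe: FakeFrontend, txn_id: int, max_retries: int = 1
) -> Tuple[dict, Optional[str]]:
    """Hypothetical BE-side flow: commit with retries, roll back if no reply arrives."""
    fe.begin(txn_id)
    for _ in range(max_retries + 1):
        try:
            fe.commit(txn_id)
            return {"Status": "Success", "Message": "OK"}, None
        except RpcTimeout:
            # The real BE waited 1000 ms between attempts (see the BE log).
            continue
    # From the BE's point of view the commit failed, so it tries to roll back.
    rollback_result = fe.rollback(txn_id)
    return {"Status": "Fail", "Message": "failed to call frontend service"}, rollback_result


if __name__ == "__main__":
    fe = FakeFrontend(responses_lost=True)
    response, rollback_result = coordinator_stream_load(fe, 34149)
    print(response["Status"], fe.txns[34149].value)  # Fail COMMITTED
    print(rollback_result)
```

   The model ends in the same inconsistent state as the real cluster: the client sees `Fail`, while the FE holds the transaction as COMMITTED (and, per the FE log, later publishes it as VISIBLE).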
   
   Could anyone explain why the coordinator BE does not receive the response to `commit txn rpc` when one BE is stopped?

