Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/02/19 04:05:09 UTC

[GitHub] [incubator-doris] WingsGo opened a new issue #2942: [Alter]Schema Change in a big table will cause FE timeout and cancel this job

WingsGo opened a new issue #2942: [Alter]Schema Change in a big table will cause FE timeout and cancel this job
URL: https://github.com/apache/incubator-doris/issues/2942
 
 
   **Describe the bug**
   When I run a schema change job on a table that has 147 partitions, each with 175 tablets, an error occurs in the pending stage. The error message is `Create replicas failed. Error: Error replicas: 15072977=32666105, 15072977=32672253,15072977=32653793`. I looked at the FE source code and found the following snippet. It shows that this error is raised when the BE takes longer than the max timeout (1 min) to create the tablets. So I searched `be.info` for more information, to find out whether replica creation actually failed or simply timed out.
   
   ```
   if (!FeConstants.runningUnitTest) {
       // send all tasks and wait them finished
       AgentTaskQueue.addBatchTask(batchTask);
       AgentTaskExecutor.submit(batchTask);
       // max timeout is 1 min
       long timeout = Math.min(Config.tablet_create_timeout_second * 1000L * totalReplicaNum, 60000);
       boolean ok = false;
       try {
           ok = countDownLatch.await(timeout, TimeUnit.MILLISECONDS);
       } catch (InterruptedException e) {
           LOG.warn("InterruptedException: ", e);
           ok = false;
       }
       if (!ok) {
           // create replicas failed. just cancel the job
           // clear tasks and show the failed replicas to user
           AgentTaskQueue.removeBatchTask(batchTask, TTaskType.CREATE);
           String errMsg = null;
           if (!countDownLatch.getStatus().ok()) {
               errMsg = countDownLatch.getStatus().getErrorMsg();
           } else {
               List<Entry<Long, Long>> unfinishedMarks = countDownLatch.getLeftMarks();
               // only show at most 3 results
               List<Entry<Long, Long>> subList = unfinishedMarks.subList(0, Math.min(unfinishedMarks.size(), 3));
               errMsg = "Error replicas:" + Joiner.on(", ").join(subList);
           }
           LOG.warn("failed to create replicas for job: {}, {}", jobId, errMsg);
           throw new AlterCancelException("Create replicas failed. Error: " + errMsg);
       }
   }
   ```
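
   For illustration, here is a small worked sketch of the timeout formula in the snippet above, applied to a table of this size. The value of `tablet_create_timeout_second` (1) and the replica count (3 per tablet) are assumptions made for the arithmetic, not values taken from this cluster, and the class name is hypothetical.

   ```java
   // Worked example of the FE timeout formula quoted above.
   // ASSUMPTIONS: 1 second per tablet and 3 replicas per tablet are
   // illustrative values, not taken from the reporter's configuration.
   class CreateTimeoutDemo {
       // Hard-coded 1 min cap from the quoted snippet.
       static final long MAX_TIMEOUT_MS = 60_000L;

       static long timeoutMs(long tabletCreateTimeoutSecond, long totalReplicaNum) {
           return Math.min(tabletCreateTimeoutSecond * 1000L * totalReplicaNum, MAX_TIMEOUT_MS);
       }

       public static void main(String[] args) {
           long tablets = 147L * 175L;   // partitions * tablets per partition
           long replicas = tablets * 3L; // assumed replication factor of 3
           System.out.println("tablets=" + tablets + ", replicas=" + replicas);
           System.out.println("timeoutMs=" + timeoutMs(1, replicas));
       }
   }
   ```

   Under these assumptions the uncapped product is far larger than 60000 ms, so the whole batch of create tasks shares a single 1 min budget no matter how many replicas it contains.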
   
   As shown in the picture, the task starts at `00:00:31` and ends at `00:01:32`. I then checked the BE machine; its last log lines are shown below. They show that the `create replicas task` finished and was erased from the queue at `00:01:33.88`, so replica creation succeeded but completed shortly after the 1 min timeout. I think we should make the max timeout configurable to avoid this case, and I will add a PR later.
   
   ```
   I0218 00:01:33.876698 37320 tablet_manager.cpp:277] begin to process create tablet. tablet=32677985, schema_hash=1328100050
   I0218 00:01:33.876894 37319 task_worker_pool.cpp:328] finish task success. result:1
   I0218 00:01:33.876904 37319 task_worker_pool.cpp:286] type: CREATE, signature: 32677937, has been erased, queue size: 1
   I0218 00:01:33.878705 37320 tablet_manager.cpp:1329] next_unique_id:267
   I0218 00:01:33.879608 37320 tablet.cpp:333] no rowset for version:0-1, tablet: 32677985.1328100050.e24345ecc852edf0-4814d0d8717e04b9
   I0218 00:01:33.879624 37320 tablet_manager.cpp:379] this request is for alter tablet request v2, so that not add alter task to tablet
   I0218 00:01:33.880174 37320 tablet_meta_manager.cpp:115] save tablet meta , key:tabletmeta_32677985_1328100050 meta_size=11029
   I0218 00:01:33.880295 37320 tablet_manager.cpp:437] finish to process create tablet. res=0
   I0218 00:01:33.880302 37320 tablet_manager.cpp:324] finish to process create tablet. res=0
   I0218 00:01:33.880513 37320 task_worker_pool.cpp:328] finish task success. result:1
   I0218 00:01:33.880523 37320 task_worker_pool.cpp:286] type: CREATE, signature: 32677985, has been erased, queue size: 0
   ```
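
   A minimal sketch of what a configurable cap could look like, assuming a new config item. The name `maxCreateTabletTimeoutSecond` and its value are hypothetical and are not an existing Doris setting; the class and helper methods are also illustrative only, not the FE implementation.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.TimeUnit;

   class ConfigurableCreateTimeout {
       // HYPOTHETICAL config item replacing the hard-coded 60000 ms cap.
       static long maxCreateTabletTimeoutSecond = 3600;
       // Assumed per-tablet timeout, standing in for Config.tablet_create_timeout_second.
       static long tabletCreateTimeoutSecond = 1;

       // Same formula as the FE snippet, but the cap comes from configuration.
       static long timeoutMs(long totalReplicaNum) {
           return Math.min(tabletCreateTimeoutSecond * 1000L * totalReplicaNum,
                   maxCreateTabletTimeoutSecond * 1000L);
       }

       // Waits for all create tasks; returns false on timeout or interruption.
       static boolean awaitAll(CountDownLatch latch, long totalReplicaNum) {
           try {
               return latch.await(timeoutMs(totalReplicaNum), TimeUnit.MILLISECONDS);
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt(); // preserve the interrupt flag
               return false;
           }
       }

       public static void main(String[] args) {
           CountDownLatch latch = new CountDownLatch(1);
           latch.countDown(); // simulate the single task finishing
           System.out.println("timeoutMs=" + timeoutMs(77_175));
           System.out.println("ok=" + awaitAll(latch, 77_175));
       }
   }
   ```

   Keeping the `Math.min` form means small jobs still time out quickly, while large jobs can be given a bigger budget by raising the cap.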
   
   **Screenshots**
   
   
   
