Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/01/03 10:49:12 UTC

[GitHub] [druid] qianmoQ edited a comment on issue #8605: Failed to publish segments because of [java.lang.RuntimeException: Aborting transaction!].

URL: https://github.com/apache/druid/issues/8605#issuecomment-570538079
 
 
   > [coordinator-overlord.log](https://github.com/apache/druid/files/4018386/coordinator-overlord.log)
   > I encountered this problem, too. As a newcomer to Druid, I have no idea how to solve it.
   > The data in Kafka may be ingested again when the next task runs. I can query the old records before the segment fails, but after the failure those records are gone.
   
   The cause of this problem is that the node running the task is short of memory or CPU: Druid fails to release completed tasks, whose processes stay resident in memory, so resources run out. The only remedy is to release these resources manually. You can use the following script to monitor task processes and release them:
   
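   ```sh
   #!/bin/sh
   # Kill Druid peon (task) processes that have been running longer than MAX_AGE_SECONDS.
   # Peons are forked by the MiddleManager; in the `ps f` forest view they are the child
   # lines prefixed with "\_ ", which distinguishes them from the MiddleManager itself.
   MAX_AGE_SECONDS=3600

   DRUID_RUNNING_TASKS_PIDS=$(ps -ef f | grep -F '\_ java -cp conf/druid/_common:conf/druid/middleManager:lib' | grep -v grep | awk '{print $2}')

   CURRENT_TIMESTAMP=$(date +%s)

   for pid in $DRUID_RUNNING_TASKS_PIDS
   do
       # Process start time without the header line; GNU `date -d` converts it to epoch seconds.
       START_TIME=$(ps -p "$pid" -o lstart=)
       [ -n "$START_TIME" ] || continue    # the process has already exited
       START_TIMESTAMP=$(date -d "$START_TIME" +%s)
       TIME_DIFF=$((CURRENT_TIMESTAMP - START_TIMESTAMP))
       if [ "$TIME_DIFF" -gt "$MAX_AGE_SECONDS" ]; then
           echo "Killing PID $pid, start time $START_TIME, timestamp $START_TIMESTAMP"
           kill -9 "$pid"
       fi
   done
   ```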
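
   If you would rather not parse the formatted date from `ps -o lstart` with GNU `date -d`, a process's age can also be read from `/proc` on Linux. The sketch below is an alternative to those two lines, not part of the original script: `proc_age_seconds` is a hypothetical name, and it assumes a Linux `/proc` where field 22 of `/proc/PID/stat` is the start time in clock ticks since boot and `/proc/uptime` gives seconds since boot.

   ```sh
   # Hypothetical helper: print how many seconds ago process $1 started (Linux /proc only).
   proc_age_seconds() {
       pid=$1
       [ -r "/proc/$pid/stat" ] || return 1          # process gone or not visible
       stat=$(cat "/proc/$pid/stat") || return 1
       # Drop "pid (comm) " first: comm may itself contain spaces or parentheses.
       rest=${stat##*") "}
       # rest now starts at field 3, so field 22 (starttime) is the 20th word.
       set -- $rest
       shift 19
       start_ticks=$1
       hz=$(getconf CLK_TCK)                          # clock ticks per second
       read -r uptime _ < /proc/uptime                # seconds since boot, e.g. "12345.67"
       uptime=${uptime%.*}
       echo $(( uptime - start_ticks / hz ))
   }
   ```

   Inside the loop, `TIME_DIFF=$(proc_age_seconds "$pid") || continue` would take the place of the start-time and `date -d` lines, and also skips processes that exit before they are inspected.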
   
   Change the 3600 (seconds) in the script to a value longer than the task run time (for Kafka indexing tasks, the supervisor's `taskDuration`). To avoid interrupting the data service, set it to roughly twice the task duration. Then add the script to the system crontab, for example:
   
   ```sh
   */5 * * * * /bin/sh /hadoop/data1/druid-0.12.3/druid-task-monitor.sh
   ```
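
   If the entry is installed from a setup script rather than with `crontab -e`, running that setup twice would add the line twice. A minimal sketch that avoids this, assuming the standard `crontab -l` / `crontab -` interface: the hypothetical filter `add_cron_entry` reads the current crontab on stdin and prints it back with the entry appended only if an identical line is not already present.

   ```sh
   # Hypothetical filter: copy the crontab read from stdin to stdout and append
   # ENTRY ($1) unless an identical line is already present.
   add_cron_entry() {
       entry=$1
       existing=$(cat)
       if [ -n "$existing" ]; then
           printf '%s\n' "$existing"
       fi
       # -x: whole-line match, -F: fixed string, so "*" and "/" are not special.
       if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
           printf '%s\n' "$entry"
       fi
   }
   ```

   Usage: `crontab -l 2>/dev/null | add_cron_entry '*/5 * * * * /bin/sh /hadoop/data1/druid-0.12.3/druid-task-monitor.sh' | crontab -`. The `2>/dev/null` hides the "no crontab" message on a user that has none yet.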
   
   I run this check every 5 minutes to free up resources.
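
   For Kafka ingestion the task duration is written as an ISO-8601 period in the supervisor spec (for example `PT1H`), while the script's threshold is in seconds. As a minimal sketch, assuming only the `PT<n>H<n>M<n>S` form is used, the hypothetical helper `iso_duration_to_seconds` below converts such a period so the threshold can be derived as roughly twice the task duration, as suggested above.

   ```sh
   # Hypothetical helper: convert an ISO-8601 period of the form PT[nH][nM][nS]
   # (e.g. PT1H, PT1H30M) to seconds. Day/month/year parts are not supported.
   iso_duration_to_seconds() {
       rest=${1#PT}
       if [ "$rest" = "$1" ] || [ -z "$rest" ]; then
           echo "unsupported duration: $1" >&2
           return 1
       fi
       total=0
       while [ -n "$rest" ]; do
           num=${rest%%[HMS]*}              # digits before the next unit letter
           unit=${rest#"$num"}
           unit=${unit%"${unit#?}"}         # keep only the unit letter itself
           rest=${rest#"$num$unit"}
           case $num in
               ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
           esac
           # Strip leading zeros so shell arithmetic does not read "08" as octal.
           while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do num=${num#0}; done
           case $unit in
               H) total=$((total + num * 3600)) ;;
               M) total=$((total + num * 60)) ;;
               S) total=$((total + num)) ;;
               *) echo "unsupported duration: $1" >&2; return 1 ;;
           esac
       done
       echo "$total"
   }
   ```

   For example, `THRESHOLD=$(( $(iso_duration_to_seconds PT1H) * 2 ))` yields 7200 seconds for a one-hour task, the value to use in place of 3600.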

