You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/10/08 11:49:26 UTC

[jira] [Resolved] (TAJO-1875) Resource leak after a query failure

     [ https://issues.apache.org/jira/browse/TAJO-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihoon Son resolved TAJO-1875.
------------------------------
       Resolution: Fixed
    Fix Version/s: 0.11.0

Committed to master and 0.11.

> Resource leak after a query failure
> -----------------------------------
>
>                 Key: TAJO-1875
>                 URL: https://issues.apache.org/jira/browse/TAJO-1875
>             Project: Tajo
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> I'm testing the join on hdfs and JDBC storage, and found that the worker resource is not collected properly after a query failure.
> Here is the log.
> {noformat}
> tpch> select
>     l_orderkey,
>     sum(l_extendedprice*(1-l_discount)) as revenue,
>     o_orderdate,
>     o_shippriority
> from
>     pgsql_db.customer as c
>     join pgsql_db.orders as o
>         on c.c_mktsegment = 'BUILDING' and c.c_custkey = o.o_custkey
>     join lineitem as l
>         on l.l_orderkey = o.o_orderkey
> where
>     o_orderdate < '1995-03-15'::date
>     and l_shipdate > '1995-03-15'::date
> group by
>     l_orderkey, o_orderdate, o_shippriority
> order by
>     revenue desc, o_orderdate;
> ...
> 2015-09-23 01:22:36,776 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000002_000000_02
> 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000002_000000_02 is initialized
> 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
> 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/2/0_2
> 2015-09-23 01:22:36,777 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,778 INFO org.apache.tajo.storage.jdbc.JdbcScanner: Generated SQL: SELECT o_orderkey,o_custkey,o_orderdate,o_shippriority FROM orders WHERE o.o_orderdate < DATE '1995-03-15'
> 2015-09-23 01:22:36,783 ERROR org.apache.tajo.worker.TaskImpl: internal error: ERROR: missing FROM-clause entry for table "o"
>   Position: 74
> org.apache.tajo.exception.TajoInternalError: internal error: ERROR: missing FROM-clause entry for table "o"
>   Position: 74
>         at org.apache.tajo.storage.jdbc.JdbcScanner.executeQueryAndGetIter(JdbcScanner.java:263)
>         at org.apache.tajo.storage.jdbc.JdbcScanner.next(JdbcScanner.java:116)
>         at org.apache.tajo.engine.planner.physical.FullScanIterator.hasNext(FullScanIterator.java:39)
>         at org.apache.tajo.engine.planner.physical.SeqScanExec.next(SeqScanExec.java:292)
>         at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.next(HashShuffleFileWriteExec.java:102)
>         at org.apache.tajo.worker.TaskImpl.run(TaskImpl.java:405)
>         at org.apache.tajo.worker.TaskContainer.run(TaskContainer.java:65)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-09-23 01:22:36,783 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000002_000000_02 completed. Worker's task counter - total:3, succeeded: 0, killed: 0, failed: 3
> 2015-09-23 01:22:36,784 ERROR org.apache.tajo.querymaster.TaskAttempt: ta_1442938691159_0002_000002_000000_02 FROM jihoon-desktop >> internal error: ERROR: missing FROM-clause entry for table "o"
>   Position: 74
> 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ====================================================================================
> 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: >>> Task Failed: ta_1442938691159_0002_000002_000000_02, retry:false, attempts:3 <<<
> 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.Task: ====================================================================================
> 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: TaskScheduler schedulingThread stopped
> 2015-09-23 01:22:36,784 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Stage: Stage completed - eb_1442938691159_0002_000002 (total=1, success=0, killed=0)
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: FAILED, 1/2.
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 1
> 2015-09-23 01:22:36,785 INFO org.apache.tajo.storage.HashShuffleAppenderManager: Close HashShuffleAppender:eb_1442938691159_0002_000002, not a hash shuffle
> 2015-09-23 01:22:36,786 ERROR org.apache.tajo.querymaster.Stage: Can't handle this event at current state, eventType:SQ_SHUFFLE_REPORT, oldState:FAILED, nextState:FAILED
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: SQ_SHUFFLE_REPORT at FAILED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tajo.querymaster.Stage.handle(Stage.java:797)
>         at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:210)
>         at org.apache.tajo.querymaster.QueryMasterTask$StageEventDispatcher.handle(QueryMasterTask.java:204)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-09-23 01:22:36,786 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Task Scheduler stopped
> 2015-09-23 01:22:36,786 INFO org.apache.tajo.worker.TaskManager: Stopped execution block:eb_1442938691159_0002_000002
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type STAGE_COMPLETED
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Complete Stage[eb_1442938691159_0002_000002], State: ERROR, 2/2.
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: Processing q_1442938691159_0002 of type QUERY_COMPLETED
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.Query: q_1442938691159_0002 Query Transitioned from QUERY_RUNNING to QUERY_ERROR
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.QueryMasterTask: Query completion notified from q_1442938691159_0002 final state: QUERY_ERROR
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 2
> 2015-09-23 01:22:36,787 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000002_00
> 2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,788 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopping QueryMasterTask:q_1442938691159_0002
> 2015-09-23 01:22:36,788 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000002_00 is initialized
> 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
> 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/2_0
> 2015-09-23 01:22:36,789 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Cleanup resources of all workers. Query: q_1442938691159_0002, workers: 1
> 2015-09-23 01:22:36,789 INFO org.apache.tajo.querymaster.QueryMasterTask: Stopped QueryMasterTask:q_1442938691159_0002
> 2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Task basedir is created (file:/disk2/tajo-tmp/q_1442938691159_0002/output/1)
> 2015-09-23 01:22:36,790 INFO org.apache.tajo.worker.TaskImpl: Initializing: ta_1442938691159_0002_000001_000003_00
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 3
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Stage ta_1442938691159_0002_000001_000003_00 is initialized
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * InterQuery: true, Use HASH_SHUFFLE shuffle, Fragments (num: 1), Fetches (total:0) :
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: * Local task dir: file:/disk2/tajo-tmp/q_1442938691159_0002/output/1/3_0
> 2015-09-23 01:22:36,792 INFO org.apache.tajo.worker.TaskImpl: ==================================
> 2015-09-23 01:22:36,795 INFO org.apache.tajo.worker.TaskManager: QueryMaster Address:jihoon-desktop:28093
> 2015-09-23 01:22:36,797 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 4
> 2015-09-23 01:22:36,798 INFO org.apache.tajo.worker.TaskImpl: ta_1442938691159_0002_000001_000002_00 completed. Worker's task counter - total:3, succeeded: 2, killed: 0, failed: 1
> 2015-09-23 01:22:36,798 WARN org.apache.tajo.rpc.NettyClientBase: java.nio.channels.ClosedChannelException Try to reconnect :jihoon-desktop/127.0.1.1:28093
> 2015-09-23 01:22:36,801 INFO org.apache.tajo.querymaster.DefaultTaskScheduler: Assigned host : jihoon-desktop, Unknown Volume : -1, Concurrency : 5
> 2015-09-23 01:22:36,992 INFO org.apache.tajo.util.history.HistoryWriter: Saving query summary: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/query.hist
> 2015-09-23 01:22:37,026 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000001.hist
> 2015-09-23 01:22:37,042 INFO org.apache.tajo.util.history.HistoryWriter: Saving query unit: hdfs://localhost:7020/tmp/tajo-jihoon/staging/history/20150923/query-detail/q_1442938691159_0002/eb_1442938691159_0002_000002.hist
> 2015-09-23 01:22:37,559 INFO org.apache.tajo.util.history.HistoryWriter: Closing task history file: file:/tmp/tajo-jihoon/history/19700101/tasks/jihoon-desktop_28091/jihoon-desktop_28091_09_0.hist
> 2015-09-23 01:22:37,803 WARN org.apache.tajo.querymaster.QueryMasterManagerService: No QueryMasterTask: ta_1442938691159_0002_000001_000002_00
> 2015-09-23 01:22:46,797 FATAL org.apache.tajo.worker.TaskManager:
> java.util.concurrent.TimeoutException
>         at org.apache.tajo.rpc.CallFuture.get(CallFuture.java:79)
>         at org.apache.tajo.worker.TaskManager.createExecutionBlock(TaskManager.java:126)
>         at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:178)
>         at org.apache.tajo.worker.TaskManager.handle(TaskManager.java:54)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)