Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/02/08 14:44:19 UTC

[jira] [Commented] (TAJO-587) Query is hanging when OutOfMemoryError occurs in the query master

    [ https://issues.apache.org/jira/browse/TAJO-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895569#comment-13895569 ] 

Hyunsik Choi commented on TAJO-587:
-----------------------------------

There may be much room for improvement in the method _scheduleRangeShuffledFetches()_. First of all, instead of a URI, we should keep just a hostname and several integers indicating the subquery id, task id, and attempt id. That should significantly reduce main memory usage.
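
To illustrate the idea, here is a rough sketch of what such a compact fetch descriptor could look like. It is only an assumption for discussion: the class name CompactFetch, the port field, and the URI layout with the sid/tid/aid parameters are all hypothetical and do not reflect the current Repartitioner code or the actual fetch URL format.

```java
import java.net.URI;

// Hypothetical sketch: the class name, the port field, and the URI layout
// below are assumptions, not Tajo's actual code or fetch URL format.
final class CompactFetch {
    private final String host;    // one String instance per worker can be shared by all of its fetches
    private final int port;       // assumed: the worker port is needed to rebuild the URI
    private final int subQueryId;
    private final int taskId;
    private final int attemptId;

    CompactFetch(String host, int port, int subQueryId, int taskId, int attemptId) {
        this.host = host;
        this.port = port;
        this.subQueryId = subQueryId;
        this.taskId = taskId;
        this.attemptId = attemptId;
    }

    // Build the URI only when the fetch is actually handed to a worker,
    // so the query master never holds one URI object per fetch in memory.
    URI toUri() {
        return URI.create("http://" + host + ":" + port
            + "/?sid=" + subQueryId + "&tid=" + taskId + "&aid=" + attemptId);
    }
}
```

With something like this, the query master would keep four ints and a shared hostname reference per fetch, and would call toUri() only at the point where the fetch is dispatched.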

As a temporary workaround, you can also assign more memory via TAJO_WORKER_HEAPSIZE. Depending on your environment, that may help.

> Query is hanging when OutOfMemoryError occurs in the query master
> -----------------------------------------------------------------
>
>                 Key: TAJO-587
>                 URL: https://issues.apache.org/jira/browse/TAJO-587
>             Project: Tajo
>          Issue Type: Bug
>          Components: tajo master
>            Reporter: Jihoon Son
>             Fix For: 0.8-incubating
>
>
> See the title. When I run a simple sort query against a 1TB table, the query hangs and never finishes.
> {noformat}
> tajo> select l_orderkey from lineitem order by l_orderkey
> 2014-02-05 17:20:52,339 FATAL master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:dispatch(143)) - Error in dispatcher thread:SUBQUERY_COMPLETED
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.net.URI.create(URI.java:857)
>         at org.apache.tajo.master.querymaster.Repartitioner.scheduleRangeShuffledFetches(Repartitioner.java:342)
>         at org.apache.tajo.master.querymaster.Repartitioner.scheduleFragmentsForNonLeafTasks(Repartitioner.java:261)
>         at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.schedule(SubQuery.java:680)
>         at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:523)
>         at org.apache.tajo.master.querymaster.SubQuery$InitAndRequestContainer.transition(SubQuery.java:504)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tajo.master.querymaster.SubQuery.handle(SubQuery.java:481)
>         at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.executeNextBlock(Query.java:311)
>         at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:357)
>         at org.apache.tajo.master.querymaster.Query$SubQueryCompletedTransition.transition(Query.java:297)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.tajo.master.querymaster.Query.handle(Query.java:584)
>         at org.apache.tajo.master.querymaster.Query.handle(Query.java:58)
>         at org.apache.tajo.master.TajoAsyncDispatcher.dispatch(TajoAsyncDispatcher.java:137)
>         at org.apache.tajo.master.TajoAsyncDispatcher$1.run(TajoAsyncDispatcher.java:79)
>         at java.lang.Thread.run(Thread.java:701)
> 2014-02-05 17:20:52,339 WARN  querymaster.QueryMaster (QueryMaster.java:run(459)) - Query q_1391587770871_0001 stopped cause query sesstion timeout: 384113 ms
> 2014-02-05 17:20:52,339 INFO  querymaster.QueryMasterTask (QueryMasterTask.java:stop(168)) - Stopping QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:20:52,346 INFO  master.TajoAsyncDispatcher (TajoAsyncDispatcher.java:stop(122)) - AsyncDispatcher stopped:q_1391587770871_0001
> 2014-02-05 17:20:52,351 INFO  querymaster.QueryMasterTask (QueryMasterTask.java:stop(198)) - Stopped QueryMasterTask:q_1391587770871_0001
> 2014-02-05 17:23:28,614 ERROR worker.TajoWorker (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
> {noformat}


