Posted to dev@hive.apache.org by "Bennie Schut (JIRA)" <ji...@apache.org> on 2010/09/27 12:04:35 UTC
[jira] Created: (HIVE-1671) multithreading on Context.pathToCS
multithreading on Context.pathToCS
----------------------------------
Key: HIVE-1671
URL: https://issues.apache.org/jira/browse/HIVE-1671
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Bennie Schut
Fix For: 0.7.0
We have 2 threads running at 100% CPU,
with a stack trace like this:
"Thread-16725" prio=10 tid=0x00007ff410662000 nid=0x497d runnable [0x00000000442eb000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.get(HashMap.java:303)
at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
at org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
at org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
at org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440 ]
Namit Jain edited comment on HIVE-1671 at 9/27/10 3:22 PM:
-----------------------------------------------------------
Are you using HiveServer?
>> we having 2 threads running at 100%
What do you mean by the above? Are you setting hive.exec.parallel to true? In that case, I can see the problem happening.
was (Author: namit):
Are your using HiveServer ?
.bq we having 2 threads running at 100%
What do you mean by the above ? Are you setting hive.exec.parallel to true, in which case, I can see the problem happening ?
[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915468#action_12915468 ]
Bennie Schut commented on HIVE-1671:
------------------------------------
Sorry, I was a bit short on the description. I'm running the HiveServer with hive.exec.parallel set to true. It runs many jobs each day, and about a week after startup I noticed 2 threads that had been stuck at 100% CPU for about 3 days. I used jstack to look at both threads and they showed the same stack trace.
[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915262#action_12915262 ]
Bennie Schut commented on HIVE-1671:
------------------------------------
Perhaps, because we now have multiple subqueries running in Hive for the same overall query, we can have concurrent use of this map?
We could simply fix this by using a ConcurrentHashMap:
private Map<String, ContentSummary> pathToCS = new ConcurrentHashMap<String, ContentSummary>();
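For illustration only, a minimal sketch of how such a cache could look once the field is a ConcurrentHashMap. The class name SummaryCache and the addCS method are assumptions made for this example (only getCS appears in the stack trace above), and a plain Long stands in for Hadoop's ContentSummary so that the snippet compiles without Hadoop on the classpath. It is not the actual Context.java code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the cache held by org.apache.hadoop.hive.ql.Context.
// Only the path-to-summary map is modeled here.
class SummaryCache {
    // Long is a placeholder for org.apache.hadoop.fs.ContentSummary (assumption).
    // ConcurrentHashMap allows concurrent get/put without external locking.
    private final Map<String, Long> pathToCS = new ConcurrentHashMap<String, Long>();

    // Records the summary for a path; safe to call from several task threads.
    public void addCS(String path, long summary) {
        pathToCS.put(path, summary);
    }

    // Returns the cached summary, or null if the path has not been recorded.
    public Long getCS(String path) {
        return pathToCS.get(path);
    }
}
```

One difference to keep in mind with this swap: unlike HashMap, ConcurrentHashMap rejects null keys and null values with a NullPointerException, so callers must not try to cache a null summary.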
[jira] Updated: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bennie Schut updated HIVE-1671:
-------------------------------
Attachment: HIVE-1671-1.patch
No tests are added. It would be difficult to time a test so that it reproduces this, and it would only show that a normal HashMap isn't thread safe, which we already know.
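As a rough sketch of what a sanity check on the replacement map could look like (not part of the patch, and not a reproduction of the original hang), the following hammers a ConcurrentHashMap from several threads and checks that all workers finish and that no entries are lost. The class name, method name, key format, and timeout are all made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stress helper; names and parameters are illustrative only.
class ConcurrentMapStress {
    // Starts `threads` workers that each put and then read `perThread` distinct keys.
    // Returns the final map size, or -1 if the workers did not finish in time
    // or the calling thread was interrupted.
    static int stress(final int threads, final int perThread, long timeoutSeconds) {
        final Map<String, Long> map = new ConcurrentHashMap<String, Long>();
        final CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        start.await(); // release all workers at the same moment
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    for (int i = 0; i < perThread; i++) {
                        String key = "/path/" + id + "/" + i;
                        map.put(key, (long) i);
                        map.get(key);
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // workers still running: treat as a failure
                return -1;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return -1;
        }
        return map.size();
    }
}
```

Because every worker writes its own distinct keys, the expected size is threads * perThread. Running the same loop against a plain HashMap may or may not fail on any given run, which is the timing problem described above.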
[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440 ]
Namit Jain commented on HIVE-1671:
----------------------------------
Are you using HiveServer?
>> we having 2 threads running at 100%
What do you mean by the above? Are you setting hive.exec.parallel to true? In that case, I can see the problem happening.
[jira] Resolved: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain resolved HIVE-1671.
------------------------------
Hadoop Flags: [Reviewed]
Resolution: Fixed
Committed. Thanks Bennie
[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915508#action_12915508 ]
Namit Jain commented on HIVE-1671:
----------------------------------
OK, I can now see the problem.
+1
[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS
Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915275#action_12915275 ]
HBase Review Board commented on HIVE-1671:
------------------------------------------
Message from: "Bennie Schut" <be...@schut.cc>
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/909/
-----------------------------------------------------------
Review request for Hive Developers.
Summary
-------
Simple change of HashMap into ConcurrentHashMap.
This addresses bug HIVE-1671.
http://issues.apache.org/jira/browse/HIVE-1671
Diffs
-----
trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1001658
Diff: http://review.cloudera.org/r/909/diff
Testing
-------
Thanks,
Bennie
[jira] Assigned: (HIVE-1671) multithreading on Context.pathToCS
Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bennie Schut reassigned HIVE-1671:
----------------------------------
Assignee: Bennie Schut