Posted to dev@hive.apache.org by "Bennie Schut (JIRA)" <ji...@apache.org> on 2010/09/27 12:04:35 UTC

[jira] Created: (HIVE-1671) multithreading on Context.pathToCS

multithreading on Context.pathToCS
----------------------------------

                 Key: HIVE-1671
                 URL: https://issues.apache.org/jira/browse/HIVE-1671
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Bennie Schut
             Fix For: 0.7.0


We have 2 threads running at 100% CPU.

With a stack trace like this:

"Thread-16725" prio=10 tid=0x00007ff410662000 nid=0x497d runnable [0x00000000442eb000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap.get(HashMap.java:303)
        at org.apache.hadoop.hive.ql.Context.getCS(Context.java:524)
        at org.apache.hadoop.hive.ql.exec.Utilities.getInputSummary(Utilities.java:1369)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.estimateNumberOfReducers(MapRedTask.java:329)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.setNumberOfReducers(MapRedTask.java:297)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:84)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440 ] 

Namit Jain edited comment on HIVE-1671 at 9/27/10 3:22 PM:
-----------------------------------------------------------

Are you using HiveServer?

>> we having 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true? In that case, I can see the problem happening.



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915468#action_12915468 ] 

Bennie Schut commented on HIVE-1671:
------------------------------------

Sorry, I was a bit short on the description. I'm running HiveServer with hive.exec.parallel set to true. I had been running many jobs each day for about a week after startup when I noticed 2 threads that had been stuck at 100% CPU for about 3 days. I used jstack to look at both threads and they showed the same stack trace.



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915262#action_12915262 ] 

Bennie Schut commented on HIVE-1671:
------------------------------------

Perhaps because we now have multiple subqueries running in Hive for the same overall query, we can have concurrent use of this map?
We could fix this simply by using a ConcurrentHashMap:


  private Map<String, ContentSummary> pathToCS = new ConcurrentHashMap<String, ContentSummary>();
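
To make the proposed change concrete, here is a minimal, self-contained sketch. It is not Hive's actual code: PathSummaryCache is a hypothetical stand-in for the relevant part of Context, it stores a Long instead of Hadoop's ContentSummary so that it compiles without Hadoop on the classpath, and addCS and fillConcurrently are illustrative names. It assumes that several tasks of one query share the same cache, as can happen with hive.exec.parallel set to true.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the part of Context that caches a summary per path.
class PathSummaryCache {
  // ConcurrentHashMap tolerates concurrent get/put; a plain HashMap does not.
  private final Map<String, Long> pathToCS = new ConcurrentHashMap<String, Long>();

  public void addCS(String path, Long summary) { pathToCS.put(path, summary); }
  public Long getCS(String path) { return pathToCS.get(path); }
  public int size() { return pathToCS.size(); }

  // Simulates several parallel tasks that all read and write one shared cache.
  public static PathSummaryCache fillConcurrently(int tasks, final int pathsPerTask) {
    final PathSummaryCache cache = new PathSummaryCache();
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    for (int t = 0; t < tasks; t++) {
      final int task = t;
      pool.execute(new Runnable() {
        public void run() {
          for (int i = 0; i < pathsPerTask; i++) {
            String path = "/warehouse/t" + task + "/part" + i; // made-up paths
            cache.addCS(path, Long.valueOf(i));
            cache.getCS(path);
          }
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("tasks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for tasks", e);
    }
    return cache;
  }

  public static void main(String[] args) {
    PathSummaryCache cache = fillConcurrently(4, 10000);
    System.out.println("cached paths: " + cache.size());
  }
}
```

Because the field is declared as Map, callers of the accessors do not need to change; only the constructor call differs from the HashMap version.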




[jira] Updated: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1671:
-------------------------------

    Attachment: HIVE-1671-1.patch

No tests are added. It would be difficult to time a test that reproduces this, and it would only show that a plain HashMap isn't thread-safe, which we already know.
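
For readers who want to see what such a test might look like, the following is a rough sketch of a stress harness. It is not part of the attached patch; MapStress and hammer are hypothetical names. It only checks the deterministic half of the claim: with a thread-safe map, every distinct key inserted by concurrent writers is still present at the end.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical stress harness, not part of HIVE-1671-1.patch.
class MapStress {
  // Starts `threads` workers at the same moment; each inserts `keysPerThread`
  // distinct keys and reads each one back. Returns the map's final size.
  public static int hammer(final Map<String, Long> map, int threads, final int keysPerThread) {
    final CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      final int id = t;
      workers[t] = new Thread(new Runnable() {
        public void run() {
          try {
            start.await(); // line all workers up to increase contention
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          for (int i = 0; i < keysPerThread; i++) {
            String key = "path-" + id + "-" + i;
            map.put(key, Long.valueOf(i));
            map.get(key);
          }
        }
      });
      workers[t].start();
    }
    start.countDown(); // release all workers at once
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for workers", e);
      }
    }
    return map.size();
  }

  public static void main(String[] args) {
    int size = hammer(new ConcurrentHashMap<String, Long>(), 8, 50000);
    // With a thread-safe map, no insert may be lost.
    System.out.println(size == 8 * 50000 ? "consistent" : "entries lost: " + size);
  }
}
```

Passing a plain HashMap to hammer instead is unsafe: depending on timing and JVM version it may lose entries, throw, spin in HashMap.get as in the stack trace above, or appear to work. That nondeterminism is why a reliable regression test is hard to write.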



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915440#action_12915440 ] 

Namit Jain commented on HIVE-1671:
----------------------------------

Are you using HiveServer?

>> we having 2 threads running at 100%

What do you mean by the above? Are you setting hive.exec.parallel to true? In that case, I can see the problem happening.



[jira] Resolved: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-1671.
------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks, Bennie.



[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915508#action_12915508 ] 

Namit Jain commented on HIVE-1671:
----------------------------------

OK, I can now see the problem.

+1




[jira] Commented: (HIVE-1671) multithreading on Context.pathToCS

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915275#action_12915275 ] 

HBase Review Board commented on HIVE-1671:
------------------------------------------

Message from: "Bennie Schut" <be...@schut.cc>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/909/
-----------------------------------------------------------

Review request for Hive Developers.


Summary
-------

Simple change: replace HashMap with ConcurrentHashMap.


This addresses bug HIVE-1671.
    http://issues.apache.org/jira/browse/HIVE-1671


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1001658 

Diff: http://review.cloudera.org/r/909/diff


Testing
-------


Thanks,

Bennie






[jira] Assigned: (HIVE-1671) multithreading on Context.pathToCS

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut reassigned HIVE-1671:
----------------------------------

    Assignee: Bennie Schut
