Posted to mapreduce-issues@hadoop.apache.org by "Devaraj K (JIRA)" <ji...@apache.org> on 2011/07/01 17:12:28 UTC

[jira] [Created] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)
----------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2637
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2637
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
            Reporter: Devaraj K
            Assignee: Devaraj K


Presently, Hadoop provides the "mapred.child.java.opts" configuration property, which can be used to set JVM options for the child JVM running a map or reduce task.
To remote debug the child JVM, we can add remote debugging options to this configuration value.
But this works only for a single child JVM: other children on the same node will fail because the remote debugging port is already in use.
We cannot specify the remote debugging port dynamically.
As a result, it's not possible to remote debug multiple child JVMs.
As a solution to this problem, we can provide a configuration to debug task JVMs in this scenario.
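To make the failure mode concrete, here is a minimal Java simulation, assuming every child JVM on a node is launched with the same fixed JDWP address taken from mapred.child.java.opts. Each child is modelled as a plain ServerSocket (the real JDWP agent does its own binding), and the class name FixedDebugPortClash is made up for illustration. The first child takes a free port; a second child configured with that same port is refused with a BindException.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

/**
 * Hypothetical simulation: two child JVMs on one node both configured with the
 * same fixed JDWP address. Each "child" is a plain ServerSocket here; the real
 * JDWP agent would perform the bind itself.
 */
public class FixedDebugPortClash {

    // The kind of option string a user might append to mapred.child.java.opts.
    static String jdwpOpts(int port) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=" + port;
    }

    /** Returns true if a second listener on the first listener's port is refused. */
    static boolean secondChildFailsToBind() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // First child: let the OS pick a free port so the demo does not depend on 8000 being free.
        try (ServerSocket firstChild = new ServerSocket(0, 50, loopback)) {
            int sharedPort = firstChild.getLocalPort();
            // Second child: same configured port, same node.
            try (ServerSocket secondChild = new ServerSocket(sharedPort, 50, loopback)) {
                return false; // bind unexpectedly succeeded
            } catch (BindException e) {
                return true;  // "Address already in use"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Every child would get: " + jdwpOpts(8000));
        System.out.println("Second child fails to bind: " + secondChildFailsToBind());
    }
}
```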


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Ravi Teja Ch N V (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168157#comment-13168157 ] 

Ravi Teja Ch N V commented on MAPREDUCE-2637:
---------------------------------------------

With IsolationRunner no longer functional (MAPREDUCE-2606), we need to find an alternative to it, since its absence makes debugging tasks more difficult.
                

[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063872#comment-13063872 ] 

Harsh J commented on MAPREDUCE-2637:
------------------------------------

(The motivation behind the previous comment was that the same code is executed on almost every task, so it is not necessary to start a server on each one.)


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063940#comment-13063940 ] 

Arun C Murthy commented on MAPREDUCE-2637:
------------------------------------------

Devaraj and Harsh - right now, you can already specify that only a subset of maps/reduces be run with special options, so that only that subset is debugged/profiled.

I'm pointing out that allowing special port assignments etc. is not necessary, since the profiler output can be dumped to local disk and fetched.
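A minimal sketch of the subset selection described above. In Hadoop 1.x this is, I believe, driven by mapred.task.profile together with mapred.task.profile.maps / mapred.task.profile.reduces, which take index ranges such as "0-2". The TaskRange class below is a hypothetical stand-alone re-implementation of that kind of range check (not Hadoop's own parser), accepting single indices, closed ranges and open-ended ranges like "10-".

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-alone range check for specs such as "0-2,5,10-".
 * Tasks whose index falls in the range would get the special debug/profile
 * JVM options; all other tasks run with the normal child options.
 */
public class TaskRange {
    private final List<int[]> ranges = new ArrayList<>();

    public TaskRange(String spec) {
        for (String part : spec.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) continue;
            int dash = p.indexOf('-');
            if (dash < 0) {
                // Single index, e.g. "5".
                int n = Integer.parseInt(p);
                ranges.add(new int[] {n, n});
            } else {
                // "a-b", "-b" (from 0) or "a-" (open-ended).
                String lo = p.substring(0, dash).trim();
                String hi = p.substring(dash + 1).trim();
                int start = lo.isEmpty() ? 0 : Integer.parseInt(lo);
                int end = hi.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(hi);
                ranges.add(new int[] {start, end});
            }
        }
    }

    public boolean contains(int taskIndex) {
        for (int[] r : ranges) {
            if (taskIndex >= r[0] && taskIndex <= r[1]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TaskRange maps = new TaskRange("0-2,5,10-");
        for (int i = 0; i < 12; i++) {
            System.out.println("map " + i + (maps.contains(i) ? " -> debug opts" : " -> normal opts"));
        }
    }
}
```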


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064081#comment-13064081 ] 

Harsh J commented on MAPREDUCE-2637:
------------------------------------

Arun - Yep, I know about that option. I had tried this out (the ports, as a hack) once before, and the interactive debug option looks more attractive to me, however difficult it is to achieve on a distributed system (perhaps it makes no sense at all today, when you factor in MRUnit and similar tools that are now available).

I believe my trials also had me include JMX ports this way, and it was really beneficial (although crazy) to see stats update live via a remotely attached VisualVM :)


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Devaraj K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063779#comment-13063779 ] 

Devaraj K commented on MAPREDUCE-2637:
--------------------------------------

This will be useful when users face a problem with their job code and want to step through its execution. If a problem occurs with large jobs, they can use this feature to find the problem in the user code.

We can provide a configuration that explicitly enables this feature only if the user wants it. It may not be very useful in a production environment. Scalability doesn’t come into the picture here, because this will be used to debug issues that are difficult to find or hard to analyze with logs.



[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063749#comment-13063749 ] 

Arun C Murthy commented on MAPREDUCE-2637:
------------------------------------------

I'm not sure this is a scalable solution. Do we really want each map/reduce task to bring up a server to connect to? Particularly for large jobs, this makes very little sense.

OTOH, most profilers allow you to 'dump' the profile information to the local disk. Hadoop MR also has support for automatically shipping that to the job-client. Doesn't that suffice?
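A sketch of the dump-to-disk alternative: each task attempt gets its own output file substituted into a profiler argument template, so nothing shared (like a port) can clash. The template is modelled on the hprof-style default of mapred.task.profile.params in Hadoop 1.x, where %s stands for the output file; the directory layout, the attempt IDs and the helper names are assumptions for illustration. Note that the hprof agent itself was removed in JDK 9, so a modern setup would substitute another profiler's agent string.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: per-attempt profiler options built from a "%s" template. */
public class ProfileParams {
    // Modelled on the hprof-style default of mapred.task.profile.params; "%s" is the output file.
    static final String TEMPLATE =
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s";

    // Assumed layout: <taskLogDir>/<attemptId>/profile.out on the task's local disk.
    static Path profileFile(String taskLogDir, String attemptId) {
        return Paths.get(taskLogDir, attemptId, "profile.out");
    }

    // Substitute the per-attempt file so every child writes somewhere different.
    static String jvmOptsFor(String taskLogDir, String attemptId) {
        return String.format(TEMPLATE, profileFile(taskLogDir, attemptId));
    }

    public static void main(String[] args) {
        String[] attempts = {
            "attempt_201107011712_0001_m_000000_0",
            "attempt_201107011712_0001_m_000001_0"
        };
        for (String a : attempts) {
            System.out.println(jvmOptsFor("userlogs", a));
        }
    }
}
```

Once the tasks finish, the per-attempt files only need to be collected, which is the "dump and fetch" workflow rather than a live connection.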


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063978#comment-13063978 ] 

Luke Lu commented on MAPREDUCE-2637:
------------------------------------

Another, simpler and probably more practical, approach would be to allow specifying JVM options for a specific input split. BTW, all these options are possible without a cluster-wide upgrade in MR2: a user can launch a special MapReduce runtime with its own special AM etc.
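A hypothetical sketch of per-split JVM options: only the map task that reads a selected split gets the extra debug flags, so a fixed JDWP port becomes acceptable because just one JVM in the job binds it (assuming a single selected split and no speculative duplicate attempt). PerSplitJvmOpts, debugSplit and optsForSplit are illustrative names, not a Hadoop API, and "-Xmx200m" is only an example base option.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: JVM options chosen per input split. Only the map task
 * reading a selected split gets the extra debug options appended.
 */
public class PerSplitJvmOpts {
    private final String baseOpts;
    private final Map<Integer, String> extraOptsBySplit = new HashMap<>();

    public PerSplitJvmOpts(String baseOpts) {
        this.baseOpts = baseOpts;
    }

    // Register extra options for one split index.
    public PerSplitJvmOpts debugSplit(int splitIndex, String extraOpts) {
        extraOptsBySplit.put(splitIndex, extraOpts);
        return this;
    }

    // Base options for every split, plus the extras for a selected split.
    public String optsForSplit(int splitIndex) {
        String extra = extraOptsBySplit.get(splitIndex);
        return extra == null ? baseOpts : baseOpts + " " + extra;
    }

    public static void main(String[] args) {
        PerSplitJvmOpts opts = new PerSplitJvmOpts("-Xmx200m")
            .debugSplit(7, "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000");
        for (int split : new int[] {6, 7, 8}) {
            System.out.println("split " + split + ": " + opts.optsForSplit(split));
        }
    }
}
```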


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279151#comment-13279151 ] 

Harsh J commented on MAPREDUCE-2637:
------------------------------------

Shouldn't we raise issues on MRUnit to tackle all debuggability issues? I believe that's where they ought to go.

For developers of the YARN framework who wish to debug at the container level, the way to do it is via the IDE. We can repurpose this JIRA to get that documented.
                

[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Bhallamudi Venkata Siva Kamesh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063746#comment-13063746 ] 

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-2637:
-----------------------------------------------------------

We choose a base port.
1) If the task is a map task, *basePort + MapTaskId* will be its debug port.
2) If the task is a reduce task, *basePort - ReduceTaskId* will be its debug port.

There will be no port conflicts, even if a reduce task is started while a map task is still in progress.
But this works only when a particular range of ports is free.
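A worked version of the base-port scheme above, assuming task IDs are the zero-based task indexes and that only non-privileged ports (1024-65535) are acceptable. It also surfaces two edge cases: map 0 and reduce 0 both land on basePort, so one side would need an extra offset of 1, and a large enough task ID pushes the computed port outside the valid range. The class name and bounds are illustrative.

```java
/** Hypothetical sketch of the basePort + mapId / basePort - reduceId scheme. */
public class BasePortScheme {
    static final int MIN_PORT = 1024;  // assumption: stay out of privileged ports
    static final int MAX_PORT = 65535;

    /** basePort + taskId for maps, basePort - taskId for reduces; rejects out-of-range results. */
    static int debugPort(int basePort, boolean isMap, int taskId) {
        long port = isMap ? (long) basePort + taskId : (long) basePort - taskId;
        if (port < MIN_PORT || port > MAX_PORT) {
            throw new IllegalArgumentException(
                "task " + taskId + " maps to port " + port + ", outside " + MIN_PORT + "-" + MAX_PORT);
        }
        return (int) port;
    }

    public static void main(String[] args) {
        int base = 40000;
        System.out.println(debugPort(base, true, 5));   // 40005
        System.out.println(debugPort(base, false, 5));  // 39995
        // Zero-based IDs: map 0 and reduce 0 share basePort.
        System.out.println(debugPort(base, true, 0) == debugPort(base, false, 0)); // true
        try {
            debugPort(base, true, 30000); // would need port 70000
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```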


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063968#comment-13063968 ] 

Luke Lu commented on MAPREDUCE-2637:
------------------------------------

I can see this being useful for interactive debugging of certain types of tasks without ssh access to the local machines.

The basePort + taskId approach won't even work, as the taskId can span the entire port range. The only approach that can possibly work is to use ephemeral ports for the JPDA ports and make them available in the web UI and/or API.
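A minimal sketch of the ephemeral-port idea, assuming each child binds port 0 so the OS picks a free port, and the chosen port is then published for the user. Here the "registry" is just a map from made-up attempt IDs to ports, standing in for the web UI/API. In a real implementation the JDWP agent would do the binding and its port would have to be reported back to the framework, which this sketch does not attempt.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: OS-assigned debug ports collected into a lookup table. */
public class EphemeralDebugPorts {

    /** Binds one listener per attempt on port 0 and records the OS-chosen port. */
    static Map<String, Integer> launchChildren(String... attemptIds) throws IOException {
        Map<String, Integer> registry = new LinkedHashMap<>();
        List<ServerSocket> listeners = new ArrayList<>();
        try {
            for (String id : attemptIds) {
                // Port 0 asks the OS for any free ephemeral port.
                ServerSocket s = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
                listeners.add(s);
                registry.put(id, s.getLocalPort());
            }
        } finally {
            // All listeners were open at the same time, so the recorded ports are distinct.
            for (ServerSocket s : listeners) {
                s.close();
            }
        }
        return registry;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> registry =
            launchChildren("attempt_m_000000_0", "attempt_m_000001_0", "attempt_r_000000_0");
        registry.forEach((id, port) -> System.out.println(id + " -> debug port " + port));
    }
}
```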


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058663#comment-13058663 ] 

Harsh J commented on MAPREDUCE-2637:
------------------------------------

Something like a base port number, to which the task ID number is added to get the specific port number?

We'd also need to take care of reducers launching while the maps are still running, since a reducer and a map may have the same ID. A secondary offset would be good if we're talking about adding IDs to base port numbers.
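A small worked example of the secondary-offset idea, assuming maps get basePort + id and reduces get basePort + reduceOffset + id. Without an offset, map 3 and reduce 3 collide; with an offset at least as large as the number of maps they stay apart. The names and the job size of 1000 maps are hypothetical, and the sketch does not check that the result stays below 65535.

```java
/** Hypothetical sketch: base port plus task ID, with a secondary offset for reducers. */
public class OffsetPortScheme {

    /** Maps: basePort + id. Reduces: basePort + reduceOffset + id. */
    static int port(int basePort, int reduceOffset, boolean isMap, int taskId) {
        return isMap ? basePort + taskId : basePort + reduceOffset + taskId;
    }

    public static void main(String[] args) {
        int base = 40000;
        // No offset: map 3 and reduce 3 get the same port.
        System.out.println(port(base, 0, true, 3) == port(base, 0, false, 3)); // true
        // Offset >= number of maps keeps the two ranges apart.
        int numMaps = 1000; // hypothetical job size
        System.out.println(port(base, numMaps, true, 3) + " vs " + port(base, numMaps, false, 3)); // 40003 vs 41003
    }
}
```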


[jira] [Commented] (MAPREDUCE-2637) Providing options to debug the mapreduce user code (Mapper, Reducer, Combiner, Sort implementations)

Posted by "Harsh J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063871#comment-13063871 ] 

Harsh J commented on MAPREDUCE-2637:
------------------------------------

One more thought I had is to let the user specify the particular map task (or tasks) on which he'd like the server to be started. This way, he can even debug particular splits (of course, exposing split information needs work as well).
