Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/12/08 07:14:18 UTC

[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

     [ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-967:
----------------------------------

    Attachment: mapreduce-967.txt

bq. we should definitely document clearly the points you've mentioned above w.r.t the classpath.

You're totally right, and I actually did this and forgot to upload the patch! My bad. Here's a new one.

bq. makes this JIRA issue an incompatible change

Yes, this is technically incompatible. But I think it's not a problem for the following reasons:
- Since job.jar is itself added to the classpath, the standard classloader will pick up anything inside job.jar just as if the jar had been expanded and the resulting directory put on the classpath.
- The only other users this should break are those who use java.io (or other non-classpath access methods) to read files unpacked from the jar. The new configuration parameter is a suitable workaround for them (as demonstrated by Streaming). In their case, what's on the classpath doesn't matter, since they aren't using a ClassLoader anyway.
- Non-Java applications are the only ones to which the above two points don't apply, but non-Java applications have no concept of a classpath, so this shouldn't be a problem for them.
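To illustrate the first two points, here is a small, self-contained Java sketch. It is not part of the attached patch, and the class name, method names, and entry names are hypothetical. It builds a jar containing a non-class resource, then reads that resource through a URLClassLoader whose only classpath entry is the unexpanded jar. The same entry does not exist as a file next to the jar, which is why code that reads it via java.io needs a workaround such as the configuration parameter mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarClasspathDemo {

    /** Builds a small jar (hypothetically named job.jar) containing a single resource entry. */
    static Path buildJar(Path dir, String entryName, String contents) throws IOException {
        Path jar = dir.resolve("job.jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write(contents.getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    /** Reads a resource through a class loader whose classpath is only the unexpanded jar. */
    static String readViaClassLoader(Path jar, String entryName) throws IOException {
        // Parent is null (bootstrap loader), so the jar is the only application classpath entry.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar.toUri().toURL() }, null);
             InputStream in = loader.getResourceAsStream(entryName)) {
            if (in == null) {
                return null; // entry not present in the jar
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("tt-work");
        Path jar = buildJar(workDir, "conf/settings.txt", "hello from inside the jar");

        // ClassLoader access: works even though nothing was unpacked.
        System.out.println(readViaClassLoader(jar, "conf/settings.txt")); // hello from inside the jar

        // java.io access: the entry does not exist as a file on disk.
        System.out.println(new File(workDir.toFile(), "conf/settings.txt").exists()); // false
    }
}
```

Code that goes through a ClassLoader is unaffected by skipping the unpack step; only direct file access depends on the jar having been expanded.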

Philosophically, isn't pre-1.0 exactly when we should be making these minor incompatible changes for the sake of code cleanliness? Compared to the other drastic changes we're putting into 0.22, this is hardly a showstopper. I don't see anything *against* the change you're requesting, except that I think we should do everything in our power now to clean up the code before we call it Hadoop 1.0. If I'm the only one with this philosophy, I'll acquiesce, but I think leaving the classpath sloppy is just as likely to come back to bite us as fixing it.

> TaskTracker does not need to fully unjar job jars
> -------------------------------------------------
>
>                 Key: MAPREDUCE-967
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt
>
>
> In practice we have seen some users submitting job jars that consist of 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning up after them has a significant cost (both in wall-clock time and in unnecessary heavy disk utilization). This cost can easily be avoided.
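One way to avoid most of that cost is to unpack only the entries that actually need to exist as files and leave everything else inside the jar. The following Java sketch is illustrative only: SelectiveUnjar, unjarMatching, and the pattern (?:classes/|lib/).* are hypothetical choices, not names taken from the attached patch. The pattern assumes that only nested jars under lib/ and classes stored under classes/ need unpacking, since a standard URLClassLoader neither reads jars nested inside job.jar nor resolves classes stored under a classes/ prefix by their plain class names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.regex.Pattern;

public class SelectiveUnjar {

    /**
     * Extracts only the entries of jarPath whose names fully match pattern into destDir;
     * all other entries stay packed. Returns the names of the extracted entries.
     */
    static List<String> unjarMatching(Path jarPath, Path destDir, Pattern pattern) throws IOException {
        List<String> extracted = new ArrayList<>();
        Path root = destDir.toAbsolutePath().normalize();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !pattern.matcher(entry.getName()).matches()) {
                    continue; // skip directories and entries the pattern does not select
                }
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(entry.getName());
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("unjar-demo");
        Path jarPath = work.resolve("job.jar");
        // Build a jar with many (empty) class entries plus one nested dependency jar.
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jarPath))) {
            for (int i = 0; i < 1000; i++) {
                jos.putNextEntry(new JarEntry("com/example/Gen" + i + ".class"));
                jos.closeEntry();
            }
            jos.putNextEntry(new JarEntry("lib/dependency.jar"));
            jos.closeEntry();
        }
        Pattern libAndClasses = Pattern.compile("(?:classes/|lib/).*");
        List<String> written = unjarMatching(jarPath, work.resolve("unpacked"), libAndClasses);
        // The 1,000 class entries stay inside job.jar; only the nested jar is written to disk.
        System.out.println(written); // [lib/dependency.jar]
    }
}
```

The check that each target path stays under destDir guards against entries whose names contain ../ components; the generated class files never touch the disk.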

-- 
This message is automatically generated by JIRA.