Posted to common-issues@hadoop.apache.org by "Vladimir Klimontovich (JIRA)" <ji...@apache.org> on 2009/07/10 19:11:15 UTC

[jira] Created: (HADOOP-6140) addArchiveToClassPath doesn't work in 0.18.x branch

addArchiveToClassPath doesn't work in 0.18.x branch
---------------------------------------------------

                 Key: HADOOP-6140
                 URL: https://issues.apache.org/jira/browse/HADOOP-6140
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
    Affects Versions: 0.18.3
            Reporter: Vladimir Klimontovich
            Priority: Minor


addArchiveToClassPath is a method of the DistributedCache class. It should be called before running a job. It accepts the path to a jar file on DFS. The method should then put this jar file into the distributed cache and add it to the classpath of every map/reduce task process.

This method doesn't work.

Bug 1:

addArchiveToClassPath adds the DFS path of the archive to the mapred.job.classpath.archives property, using System.getProperty("path.separator") as the delimiter between multiple paths.

getFileClassPaths, which is called from TaskRunner, splits mapred.job.classpath.archives using System.getProperty("path.separator").

On Unix systems System.getProperty("path.separator") equals ":", while DFS path URLs look like hdfs://host:port/path. This means the result of the split will be [hdfs, //host, port/path].

Suggested solution: use "," as the delimiter instead of System.getProperty("path.separator").
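
To illustrate the split behaviour (a standalone sketch with a placeholder HDFS URL, not the Hadoop code itself):

    import java.util.Arrays;

    public class SplitSketch {
      public static void main(String[] args) {
        String classpathArchives = "hdfs://namenode:9000/cache/lib.jar";
        String sep = System.getProperty("path.separator"); // ":" on Unix
        // The URL itself contains ":", so splitting on it tears the URL apart:
        // [hdfs, //namenode, 9000/cache/lib.jar]
        System.out.println(Arrays.toString(classpathArchives.split(sep)));
        // Splitting on "," keeps it intact: [hdfs://namenode:9000/cache/lib.jar]
        System.out.println(Arrays.toString(classpathArchives.split(",")));
      }
    }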

Bug 2:

In TaskRunner there is an algorithm that matches DFS paths against local paths in the distributed cache.
It compares

    if (archives[i].getPath().equals(archiveClasspaths[j].toString())) {

instead of

    if (archives[i].toString().equals(archiveClasspaths[j].toString())) {
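
The mismatch is easy to reproduce with a plain java.net.URI (a standalone sketch; the actual types used in TaskRunner may differ):

    import java.net.URI;

    public class ComparisonSketch {
      public static void main(String[] args) {
        URI archive = URI.create("hdfs://namenode:9000/cache/lib.jar");
        // getPath() drops the scheme and authority ...
        System.out.println(archive.getPath());   // /cache/lib.jar
        // ... while toString() keeps the full URL, so the two never match.
        System.out.println(archive.toString());  // hdfs://namenode:9000/cache/lib.jar
        System.out.println(archive.getPath().equals(archive.toString())); // false
      }
    }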


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729812#action_12729812 ] 

Philip Zeyliger commented on HADOOP-6140:
-----------------------------------------

Vladimir,

Patch looks good.  It would be nice to have a test for (2).  It may also be appropriate to throw an exception if someone passes in a filename with a comma.

-- Philip



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733477#action_12733477 ] 

Amareshwari Sriramadasu commented on HADOOP-6140:
-------------------------------------------------

The path.separator bug is fixed by HADOOP-4864.



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment:     (was: HADOOP-6140-ver2.patch)



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment:     (was: HADOOP-6140.patch)



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729866#action_12729866 ] 

Vladimir Klimontovich commented on HADOOP-6140:
-----------------------------------------------

Philip,

I tried to create a junit test for (2), but I ran into some difficulties. It seems that I need to start a whole cluster (jobtracker, tasktracker) from the junit test in order to test running a job with an additional classpath entry in the distributed cache. I'm not familiar with the hadoop code, but maybe you or someone else could point me to a test that I could use as an example.

There is another option: I can extract the piece of code with the bug into a separate method and test it separately. But I'm not sure it's reasonable to do that in the 0.18 branch.

Also, I noticed that the comma bug is already fixed in trunk.




[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment: HADOOP-6140-ver3.patch

HADOOP-6140-ver3.patch contains a patch with a junit test for (2).



[jira] Updated: (HADOOP-6140) addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment: HADOOP-6140.patch

This patch should fix this bug.



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment: HADOOP-6140-ver2.patch

This patch contains a unit test for (1), and it validates whether the path of a classpath entry contains CLASSPATH_ARCHIVES_SEPARATOR (",").
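
A rough idea of what such a validation could look like (the class and method names below are hypothetical, not taken from the patch):

    // Hypothetical sketch of that kind of check; names are assumptions, not the
    // actual contents of the patch.
    public class ClasspathEntryCheck {
      public static final String CLASSPATH_ARCHIVES_SEPARATOR = ",";

      static void validate(String entry) {
        if (entry.contains(CLASSPATH_ARCHIVES_SEPARATOR)) {
          throw new IllegalArgumentException("Classpath entry must not contain '"
              + CLASSPATH_ARCHIVES_SEPARATOR + "': " + entry);
        }
      }
    }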



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Philip Zeyliger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729870#action_12729870 ] 

Philip Zeyliger commented on HADOOP-6140:
-----------------------------------------

One creates clusters in unit tests using MiniMRCluster, but that's too heavy-weight: I think it's ok to extract the relevant function and test it via a unit test.

+1 to the tests for (1).

-- Philip



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729813#action_12729813 ] 

Vladimir Klimontovich commented on HADOOP-6140:
-----------------------------------------------

Ok, will make the test.



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment: HADOOP-6140-ver4.patch

HADOOP-6140-ver4.patch is an enhancement of HADOOP-6140-ver3.patch: it extracts the comparison statement into a separate method and improves the junit tests.
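
As a rough illustration of why extracting the comparison makes it testable without starting a cluster (a hypothetical sketch that assumes hadoop-core and junit on the classpath; this is not the patch code):

    import java.net.URI;
    import org.apache.hadoop.fs.Path;
    import junit.framework.TestCase;

    // Illustrative only: the class and method names are hypothetical.
    public class TestClasspathMatching extends TestCase {

      // The extracted comparison: compare the full URI string, not getPath().
      static boolean matchesClasspathEntry(URI archive, Path classpathEntry) {
        return archive.toString().equals(classpathEntry.toString());
      }

      public void testFullUriIsCompared() {
        URI archive = URI.create("hdfs://namenode:9000/cache/lib.jar");
        Path entry = new Path("hdfs://namenode:9000/cache/lib.jar");
        assertTrue(matchesClasspathEntry(archive, entry));
      }
    }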



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730083#action_12730083 ] 

Vladimir Klimontovich commented on HADOOP-6140:
-----------------------------------------------

The same fix, but in trunk (0.21): MAPREDUCE-752



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Attachment:     (was: HADOOP-6140-ver3.patch)



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-6140:
----------------------------------

    Status: Open  (was: Patch Available)

All the work for this issue should probably be in MAPREDUCE-752.

If this fix is to be included on the 0.18 branch, we'll also need fixes for the 0.19 and 0.20 branches.



[jira] Commented: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729900#action_12729900 ] 

Hadoop QA commented on HADOOP-6140:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12413164/HADOOP-6140-ver2.patch
  against trunk revision 793098.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/564/console

This message is automatically generated.



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Priority: Major  (was: Minor)
     Summary: DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch  (was: addArchiveToClassPath doesn't work in 0.18.x branch)



[jira] Updated: (HADOOP-6140) DistributedCache.addArchiveToClassPath doesn't work in 0.18.x branch

Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Klimontovich updated HADOOP-6140:
------------------------------------------

    Tags: DistributedCache
