Posted to common-issues@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2010/02/23 00:52:27 UTC

[jira] Created: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

HarFileSystem cannot handle paths with the space character
----------------------------------------------------------

                 Key: HADOOP-6591
                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
            Reporter: Tsz Wo (Nicholas), SZE
            Assignee: Mahadev konar


Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.
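
To make the failure concrete, here is a minimal sketch of what happens when a reader splits an index line on " ". The entry layout used here (path, type, part file, offset, length) is a simplified assumption for illustration only; the real index format is not shown in this thread, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical, simplified index entry: "<path> <type> <partFile> <offset> <length>".
// The real HAR index layout is not shown in this thread and may differ.
final class SpaceSeparatedIndexDemo {

    /** Splits one index line on single spaces, the way a naive reader would. */
    static String[] parseLine(String line) {
        return line.split(" ");
    }

    public static void main(String[] args) {
        String ok = "/k20d3f3/aaaa file part-0 0 10";
        String broken = "/k20d3f3/aaaa bbb file part-0 0 10";

        // Five fields, path recovered intact.
        System.out.println(Arrays.toString(parseLine(ok)));
        // Six fields: the path is cut at the space and every later field shifts by one.
        System.out.println(Arrays.toString(parseLine(broken)));
    }
}
```

With the second line, the reader sees "/k20d3f3/aaaa" as the path and "bbb" where it expects the entry type, so nothing after the path can be trusted.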

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-6591:
-------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks Rodrigo!

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Rodrigo Schmidt
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842831#action_12842831 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Nicholas, Mahadev,

What do you think of this patch? Is it "committable" :-)?

Thanks,
Rodrigo



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842937#action_12842937 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Hi Nicholas,

Thanks for reviewing the patch!

I don't understand the incompatibility problem you mentioned. On decodeFileName, I check version (lowercase, which is the version of the archive generated, not the protocol version) and return the original filename if it's not 2 (which means it's 1 since the protocol rejects higher versions). That should make this patch backward compatible.

I think you might have confused version (lowercase - version of the archive being read) with VERSION (uppercase - version of the protocol). Let me know if that's indeed the case or I missed something else.

-Rodrigo
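
The backward-compatibility check described in this comment could look roughly like the sketch below. It is not the committed patch: the class name, constructor, and the use of java.net.URLDecoder are assumptions, and only the names decodeFileName, version, and VERSION come from the comment itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch only: a reader that decodes names for version 2 archives
// and leaves version 1 names untouched.
final class VersionedNameDecoder {

    /** Highest archive version this reader accepts (the "protocol" version). */
    static final int VERSION = 2;

    /** Version recorded in the archive being read. */
    private final int version;

    VersionedNameDecoder(int archiveVersion) {
        // Archives newer than the protocol version are rejected up front.
        if (archiveVersion > VERSION) {
            throw new IllegalArgumentException("Unsupported archive version: " + archiveVersion);
        }
        this.version = archiveVersion;
    }

    String decodeFileName(String name) {
        if (version != 2) {
            return name; // version 1: names were stored as-is
        }
        try {
            return URLDecoder.decode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new VersionedNameDecoder(1).decodeFileName("aaaa+bbb")); // aaaa+bbb
        System.out.println(new VersionedNameDecoder(2).decodeFileName("aaaa+bbb")); // aaaa bbb
    }
}
```

Because the decision depends on the version stored in the archive rather than on the reader's own constant, archives written before the change are still read with their original names.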



[jira] Updated: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rodrigo Schmidt updated HADOOP-6591:
------------------------------------

    Attachment: HADOOP-6591.patch



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845938#action_12845938 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

The unit tests will be added in MAPREDUCE-1585.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845022#action_12845022 ] 

Mahadev konar commented on HADOOP-6591:
---------------------------------------

+1, the patch looks good, Rodrigo.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845550#action_12845550 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Mahadev, thanks for reviewing it. The tests will be added with the new patch I'll submit to MAPREDUCE-1585.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845877#action_12845877 ] 

Hudson commented on HADOOP-6591:
--------------------------------

Integrated in Hadoop-Common-trunk #278 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/278/])
    HADOOP-6591. HarFileSystem can handle paths with the whitespace characters.
(Rodrigo Schmidt via dhruba)




[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841958#action_12841958 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Yes, it's incompatible. I changed the protocol version, as Mahadev suggested in his comment on MAPREDUCE-1548.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842931#action_12842931 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6591:
------------------------------------------------

BTW, version 1 archiving should handle spaces more gracefully instead of silently creating a corrupted har.  So I filed MAPREDUCE-1579.



[jira] Updated: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated HADOOP-6591:
----------------------------------

    Fix Version/s: 0.22.0



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842203#action_12842203 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

There is no way I can add a contrib test to this patch if the HAR creation is performed by a new version of HadoopArchives.java that must be committed to MAPREDUCE. This is an instance of the chicken or the egg dilemma.

IMO, the best solution is to commit this patch, which is harmless to previous versions of Hadoop Archives, and then submit my new patch to HadoopArchives.java, with new unit tests.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842138#action_12842138 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Thanks for the examples, Nicholas!

My patch actually solves all these issues, but I don't know how to submit it, given that some of the files I changed are in Hadoop Common and the others are in MapReduce. Any recommendations?



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841953#action_12841953 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6591:
------------------------------------------------

> ... using URL encoding and decoding of filenames ...
Then, is it an incompatible change?  Does it work with the previously created har files?



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841967#action_12841967 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

The other problem I see is that HadoopArchives uses HarFileSystem.VERSION when it creates files. So, if we commit HarFileSystem.java first, HadoopArchives will generate old-style files with VERSION = 2. And if we commit HadoopArchives.java first, new-style files will be generated with VERSION = 1 (although the old HarFileSystem won't be able to read them properly). It's a chicken-and-egg problem, created because two Java files that depend on each other ended up in two different Hadoop projects (Common and MapReduce).

Sincerely, I don't understand why HarFileSystem is part of common.

For instance, in Raid, we made DistributedRaidFileSystem part of MAPREDUCE as well, partly to avoid this type of problem.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843247#action_12843247 ] 

Mahadev konar commented on HADOOP-6591:
---------------------------------------

{code}
-  private static class HarStatus {
+  private class HarStatus {
{code}

Rodrigo, is there a reason to make it non-static?
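
For readers unfamiliar with the distinction: the thread does not record an answer to this question, so the following only illustrates what dropping `static` changes in Java. One plausible motivation (an assumption, not confirmed here) is that a non-static inner class can read per-archive state of the enclosing instance, such as the archive version, without having it passed in. All names below are hypothetical stand-ins.

```java
// Sketch of static nested class vs. inner class; not code from the patch.
final class OuterFs {
    private final int version;

    OuterFs(int version) {
        this.version = version;
    }

    // Static nested class: no implicit reference to an OuterFs instance,
    // so the version has to be handed in explicitly.
    static class StaticStatus {
        private final String name;

        StaticStatus(String rawName, int version) {
            this.name = decode(rawName, version);
        }

        String getName() {
            return name;
        }
    }

    // Inner (non-static) class: reads the enclosing instance's field directly.
    class InnerStatus {
        private final String name;

        InnerStatus(String rawName) {
            this.name = decode(rawName, version);
        }

        String getName() {
            return name;
        }
    }

    // Stand-in decoder: pretend version 2 names use '+' for a space.
    static String decode(String rawName, int version) {
        return version == 2 ? rawName.replace('+', ' ') : rawName;
    }

    InnerStatus status(String rawName) {
        return new InnerStatus(rawName);
    }

    public static void main(String[] args) {
        System.out.println(new OuterFs(2).status("aaaa+bbb").getName());       // aaaa bbb
        System.out.println(new StaticStatus("aaaa+bbb", 1).getName());          // aaaa+bbb
    }
}
```

The trade-off is that every inner-class instance keeps a hidden reference to its enclosing object, which is why static nested classes are usually preferred when no instance state is needed.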



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843379#action_12843379 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

I've created MAPREDUCE-1585 to cope with the creation of version 2 archives. I'm uploading the rest of my current patch there, but MAPREDUCE-1585 really depends on this jira for it not to crash the unit tests.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841964#action_12841964 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

I see...

Since we are just starting to use hadoop archives, I didn't care to make my patch backwards compatible. Does it have to be to make it to the hadoop trunk?

I could easily adapt my patch so that HarFileSystem could read the version 1 protocol, but that doesn't solve the unit test problem. The new unit tests have files/directories with spaces in their names and HadoopArchives.java currently can't handle these cases.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842180#action_12842180 ] 

Hadoop QA commented on HADOOP-6591:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438075/HADOOP-6591.patch
  against trunk revision 918880.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/33/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/33/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/33/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h1.grid.sp2.yahoo.net/33/console

This message is automatically generated.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841753#action_12841753 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

I have a simple patch for this JIRA using URL encoding and decoding of filenames, but it also involves modifying HadoopArchives.java, which is in MAPREDUCE.

The modifications to HarFileSystem and TestHarFileSystem will fail the unit tests if not used with the new version of HadoopArchives.

How can I submit such a patch?
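
As a rough illustration of the URL encoding idea mentioned above, the sketch below round-trips a name containing a space through java.net.URLEncoder and java.net.URLDecoder. Whether the patch uses exactly these classes is an assumption; the helper class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Sketch only: encode names before writing them to a space-separated index,
// decode them again when reading.
final class FileNameCodecDemo {

    static String encode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static String decode(String name) {
        try {
            return URLDecoder.decode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String original = "k20d3f3/aaaa bbb";
        String encoded = encode(original);

        // The encoded form contains no space, so " " is safe as a field separator.
        System.out.println(encoded);
        // Decoding restores the original name.
        System.out.println(decode(encoded));
    }
}
```

Since the encoded form never contains a literal space, an index that separates fields with " " can store it without ambiguity.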



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842131#action_12842131 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6591:
------------------------------------------------

Tested current implementation:
- Suppose we have a file "k20d3f3/aaaa bbb".  The archive job failed with FileNotFoundException.

- If another file k20d3f3/aaaa exists, then the archive job succeeded but the resulting .har is corrupted and "fs -ls" failed.
{noformat}
-bash-3.1$ hadoop fs -ls  har://hdfs-namenode:8020/user/tsz/k20d3f3.2.har/k20d3f3
ls: could not get get listing for 'har://hdfs-namenode:8020/user/tsz/k20d3f3.2.har/k20d3f3' :
 File: har://hdfs-namenode:8020/user/tsz/k20d3f3.2.har/k20d3f3/bbb does not exist in har://hdfs-namenode:8020/user/tsz/k20d3f3.2.har
{noformat}

- If file k20d3f3/bbb also exists, the archive job succeeded.  "fs -ls" also succeeded with duplicated entries.
{noformat}
-bash-3.1$ hadoop fs -ls  har://hdfs-namenode:8020/user/tsz/k20d3f3.3.har/k20d3f3
Found 69 items
-rw-------   5 tsz users          0 2010-03-06 00:29 /user/tsz/k20d3f3.3.har/k20d3f3/aaaa
-rw-------   5 tsz users          0 2010-03-06 00:29 /user/tsz/k20d3f3.3.har/k20d3f3/aaaa
-rw-------   5 tsz users          0 2010-03-06 00:29 /user/tsz/k20d3f3.3.har/k20d3f3/bbb
-rw-------   5 tsz users          0 2010-03-06 00:29 /user/tsz/k20d3f3.3.har/k20d3f3/bbb
...
{noformat}
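
The duplicated entries above are consistent with a directory's child names being written space-separated and split again on read. That storage layout is an assumption made for this sketch, not something stated in the thread, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: shows how "aaaa bbb" can turn into extra "aaaa" and "bbb" entries.
final class ChildListDemo {

    // Assumption for illustration: child names are stored joined by single spaces.
    static String writeChildren(List<String> children) {
        return String.join(" ", children);
    }

    // A reader that splits on spaces cannot tell a separator from a space inside a name.
    static List<String> readChildren(String stored) {
        return Arrays.asList(stored.split(" "));
    }

    public static void main(String[] args) {
        List<String> onDisk = Arrays.asList("aaaa", "aaaa bbb", "bbb");
        // Prints [aaaa, aaaa, bbb, bbb]
        System.out.println(readChildren(writeChildren(onDisk)));
    }
}
```

Under the same assumption, the second case follows too: with only "aaaa" and "aaaa bbb" present, the reader would look for a child "bbb" that was never archived.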




[jira] Updated: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rodrigo Schmidt updated HADOOP-6591:
------------------------------------

    Status: Patch Available  (was: Open)

Submitting my patch without unit tests or changes to HadoopArchives.java, since these belong in the MAPREDUCE project.

I made it backwards-compatible though.

I welcome any comments.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843228#action_12843228 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6591:
------------------------------------------------

> ... On decodeFileName, I check version (lowercase, which is the version of the archive generated, not the protocol version) and return the original filename if it's not 2 (which means it's 1 since the protocol rejects higher versions). That should make this patch backward compatible.

Yes, it should work fine.  Thanks, Rodrigo.
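
For readers following the thread, the version check quoted above can be sketched roughly as follows. This is only an illustration, not the committed patch: the class name HarNameCodec is hypothetical, and URL-style encoding is an assumption about how spaces are kept out of the index files.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical stand-in for the name handling discussed in this thread;
// not the actual HarFileSystem code.
class HarNameCodec {
    // Version of the archive as recorded when it was generated (1 or 2).
    private final int version;

    HarNameCodec(int version) {
        this.version = version;
    }

    // Assumed scheme: URL-encode the name so the stored form contains no
    // space and cannot collide with the " " separator in the index files.
    static String encodeFileName(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Decode only names from version 2 archives; version 1 names were
    // written verbatim, so they are returned untouched.
    String decodeFileName(String stored) {
        if (version != 2) {
            return stored;
        }
        try {
            return URLDecoder.decode(stored, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this shape, a version 2 reader round-trips a name such as "/dir/my file.txt", while a version 1 archive that happens to contain the literal name "/dir/a%20b" still resolves to "/dir/a%20b".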

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Mahadev konar
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841962#action_12841962 ] 

Mahadev konar commented on HADOOP-6591:
---------------------------------------

rodrigo,
  I think what Nicholas meant was: how would you keep it backwards compatible? Would you be supporting both versions?

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Mahadev konar
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845765#action_12845765 ] 

Hudson commented on HADOOP-6591:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #200 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/200/])
    HADOOP-6591. HarFileSystem can handle paths with the whitespace characters.
(Rodrigo Schmidt via dhruba)


> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Rodrigo Schmidt
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842927#action_12842927 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6591:
------------------------------------------------

Hi Rodrigo, the patch is not backward compatible since it always decodes the path stored in the index files.  If there are special character sequences in the existing version 1 har, they will be incorrectly decoded.
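
To illustrate the concern, here is a small sketch of what unconditional decoding does to names that a version 1 archive stored verbatim. It assumes URL-style decoding via java.net.URLDecoder, which may not match the patch exactly; the class name AlwaysDecode and the sample paths are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical reader that decodes every stored name, regardless of the
// archive version. This is the behaviour being objected to above.
class AlwaysDecode {
    static String decode(String stored) {
        try {
            return URLDecoder.decode(stored, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Under this assumption, a version 1 entry literally named "/logs/a+b.txt" would come back as "/logs/a b.txt", and "/logs/50%25.txt" would come back as "/logs/50%.txt", so neither would match the file that was actually archived.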

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Mahadev konar
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Updated: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated HADOOP-6591:
----------------------------------

    Assignee: Rodrigo Schmidt  (was: Mahadev konar)

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Rodrigo Schmidt
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.



[jira] Commented: (HADOOP-6591) HarFileSystem cannot handle paths with the space character

Posted by "Rodrigo Schmidt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843258#action_12843258 ] 

Rodrigo Schmidt commented on HADOOP-6591:
-----------------------------------------

Mahadev, the reason for making HarStatus non-static is so that it can use the version property of class HarFileSystem. Without that, the code would have been more cumbersome. Making it non-static creates a link between each HarStatus and the HarFileSystem object that uses it. I see no problem with that, since it's a private class anyway.
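
A rough sketch of the design point, for illustration only: the names ArchiveFs and Status are hypothetical stand-ins for HarFileSystem and HarStatus, and the decode step is a simplified placeholder rather than the real decoding logic.

```java
// Hypothetical outer class standing in for the file system object.
class ArchiveFs {
    // Archive version, known only to the enclosing instance.
    private final int version;

    ArchiveFs(int version) {
        this.version = version;
    }

    // Non-static inner class: every Status carries an implicit reference to
    // the ArchiveFs that created it, so it can call decodeFileName (and thus
    // read version) without the version being passed in explicitly.
    private class Status {
        private final String name;

        Status(String rawName) {
            this.name = decodeFileName(rawName);
        }
    }

    // Placeholder decode: only version 2 names are translated.
    private String decodeFileName(String raw) {
        return version == 2 ? raw.replace("%20", " ") : raw;
    }

    // Small accessor so the effect is visible from outside the class.
    String nameOf(String rawName) {
        return new Status(rawName).name;
    }
}
```

Had the nested class been static, the version (or the enclosing object) would have to be handed to every constructor call, which is the extra plumbing the comment above refers to.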

> HarFileSystem cannot handle paths with the space character
> ----------------------------------------------------------
>
>                 Key: HADOOP-6591
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6591
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Mahadev konar
>         Attachments: HADOOP-6591.patch
>
>
> Since HarFileSystem is using " " as a separator in the index files, it won't work if there are " " in the path.
