Posted to hdfs-user@hadoop.apache.org by Karl Kuntz <kk...@tradebotsystems.com> on 2011/04/11 17:14:43 UTC

Bug using ^ in a file name

Hi all,

I'm wondering whether anybody else has seen this behavior on Hadoop 0.20.2 and, if so, how they resolved it.

When I use the ^ char in a file name on a hadoop command line, it works just fine for commands that don't use globbing, like put.
But after putting that file, I can't get a listing of it with ls, or view it with cat/text, using the same path.

For example:
hadoop dfs -put test^ing /tmp    <- works
hadoop dfs -ls /tmp              <- works and shows the file in the dir
hadoop dfs -ls /tmp/test^ing     <- returns "ls: Cannot access /tmp/test^ing: No such file or directory."

After walking through the code, it looks like the culprit is in FileSystem.java, around line 1050, in setRegex(String filePattern):

...
      for (int i = 0; i < len; i++) {
        char pCh;

        // Examine a single pattern character
        pCh = filePattern.charAt(i);
...
        } else if (pCh == '[' && setOpen == 0) {
          setOpen++;
          hasPattern = true;
        } else if (pCh == '^' && setOpen > 0) {
        } else if (pCh == '-' && setOpen > 0) {
          // Character set range
          setRange = true;
        } else if (pCh == PAT_SET_CLOSE && setRange) {
...

So it looks like the ^ char isn't appended to the output file pattern under any circumstances.
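
To illustrate why the lookup fails either way, here is a minimal sketch using plain java.util.regex. It assumes the glob ends up compiled into a Java regular expression; the class and method names (CaretRegexDemo, matchesName) are hypothetical and are not part of Hadoop. It compares three candidate patterns against the literal name test^ing: one with the ^ dropped, one with the ^ copied through unescaped, and one with the ^ escaped.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only; not Hadoop code.
final class CaretRegexDemo {

    // True when the whole file name matches the given regular expression.
    static boolean matchesName(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String name = "test^ing";

        // '^' dropped from the generated pattern: the regex describes "testing".
        System.out.println(matchesName("testing", name));

        // '^' copied through unescaped: it is a start-of-input anchor in the
        // middle of the pattern, so it can never match at that position.
        System.out.println(matchesName("test^ing", name));

        // '^' escaped with a backslash: it is matched as a literal character.
        System.out.println(matchesName("test\\^ing", name));

        // Pattern.quote escapes the entire name in one step.
        System.out.println(matchesName(Pattern.quote(name), name));
    }
}
```

Only the last two patterns match the file name, which suggests the ^ would need to be escaped when the pattern is built, rather than skipped or passed through as-is.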

Is this a bug in put (that it doesn't reject the ^), or is it a bug in the globbing, which doesn't support a literal ^ in the path?
It looks to me like an issue with the globbing, since I can use the ^ when I create Path() objects in code, but I could be wrong.

I've thought about mangling the filename to get around the use of the ^ char, but would prefer to just use them as literals if possible.

Thanks for any thoughts / help.

-Karl

RE: Bug using ^ in a file name

Posted by Karl Kuntz <kk...@tradebotsystems.com>.
https://issues.apache.org/jira/browse/HADOOP-7222

It's the first Jira I've created.  Hopefully it is acceptable.

Thanks!

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Monday, April 11, 2011 11:25 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Bug using ^ in a file name

Hello Karl,

On Mon, Apr 11, 2011 at 8:44 PM, Karl Kuntz <kk...@tradebotsystems.com> wrote:
> Hi all,
> When I specify the ^ char in a file name within a hadoop command line it
> works just fine for commands that don't use globbing  like put.
>
> But after putting that file, then I can't get a listing of the file with ls
> or view the file with cat/text using the same path.
> i.e.
>
> hadoop dfs -put test^ing /tmp         <- works
>
> hadoop dfs -ls /tmp                              <- works and shows the file
> in the dir
>
> hadoop dfs -ls /tmp/test^ing            <- returns "ls: Cannot access
> /tmp/test^ing: No such file or directory."

Good job hunting that down! The "^" is a regex metacharacter, and
hadoop-common's 'globbing' support builds a bit of regex handling
into itself.

And since most FsShell functions do glob matching by default, it is
difficult to escape this from the shell. However, if you were to use a
non-globbing Java API (plain FS.listStatus instead of FS.globStatus)
for the same operation in code with that string, it should work
just fine. So while ^ is acceptable in DFS file names, it means
something different in the FsShell utilities.

Maybe adding a fix for this somehow (a glob switch?) would help in
certain scenarios. Remember to file a JIRA on this at
https://issues.apache.org/jira/browse/HADOOP if you do want it fixed!
:)
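
As a rough idea of what such a fix could look like, here is a simplified glob-to-regex translator that escapes ^ (and other regex metacharacters) when they appear outside a [...] set, so they are matched literally. This is only a sketch under stated assumptions: LiteralCaretGlob, toRegex and matches are hypothetical names, it handles just *, ? and [set], and it is not Hadoop's setRegex implementation.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Hadoop's glob code.
final class LiteralCaretGlob {

    // Characters with special meaning in java.util.regex outside a set.
    private static final String REGEX_META = "\\.^$+{}()|";

    // Translates a simplified glob (*, ?, [set]) into a regex string.
    // An unclosed '[' is not validated here and would fail at compile time.
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        boolean inSet = false;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (inSet) {
                // Inside [...], copy characters through; ']' closes the set.
                if (c == ']') {
                    inSet = false;
                }
                regex.append(c);
            } else if (c == '[') {
                inSet = true;
                regex.append(c);
            } else if (c == '*') {
                regex.append("[^/]*");   // any run of non-separator chars
            } else if (c == '?') {
                regex.append("[^/]");    // exactly one non-separator char
            } else if (REGEX_META.indexOf(c) >= 0) {
                regex.append('\\').append(c);  // literal, e.g. '^' -> "\^"
            } else {
                regex.append(c);
            }
        }
        return regex.toString();
    }

    // True when the whole name matches the translated glob.
    static boolean matches(String glob, String name) {
        return Pattern.matches(toRegex(glob), name);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("test^ing"));
        System.out.println(matches("test^ing", "test^ing"));
        System.out.println(matches("test*", "test^ing"));
    }
}
```

With this approach a ^ directly after [ keeps its set-negation meaning, while a ^ anywhere else in the path is treated as an ordinary character.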

-- 
Harsh J
