Posted to common-dev@hadoop.apache.org by Paul Sheer <pa...@gmail.com> on 2009/03/05 11:58:28 UTC
Hadoop with case-preservation and case-insensitivity
Hi there,
I have the requirement to use Hadoop with case-insensitivity and
case-preservation ala Windows.
Hadoop has such a clean class hierarchy it seems that the only change
needed is in INode.java,
(snippet below).
Can anyone help with the following question -
If I change only the methods below (to all work case-insensitively) is
this sufficient?
I.e. can I trust that all file/dir name comparisons go through these
four methods?
Or will I get bitten by code elsewhere that does name comparisons or otherwise
requires case-sensitive behavior?
Many thanks for any comments.
-paul
=================
public abstract class INode implements Comparable<byte[]> {
  .............
  .............
  //
  // Comparable interface
  //
  public int compareTo(byte[] o) {
    return compareBytes(name, o);
  }

  public boolean equals(Object o) {
    if (!(o instanceof INode)) {
      return false;
    }
    return Arrays.equals(this.name, ((INode)o).name);
  }

  public int hashCode() {
    return Arrays.hashCode(this.name);
  }

  //
  // static methods
  //
  /**
   * Compare two byte arrays.
   *
   * @return a negative integer, zero, or a positive integer
   * as defined by {@link #compareTo(byte[])}.
   */
  static int compareBytes(byte[] a1, byte[] a2) {
    if (a1==a2)
      return 0;
    int len1 = (a1==null ? 0 : a1.length);
    int len2 = (a2==null ? 0 : a2.length);
    int n = Math.min(len1, len2);
    byte b1, b2;
    for (int i=0; i<n; i++) {
      b1 = a1[i];
      b2 = a2[i];
      if (b1 != b2)
        return b1 - b2;
    }
    return len1 - len2;
  }
  .............
  .............
}
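For illustration only, here is a minimal sketch of what a case-insensitive variant of the comparison might look like. It is not Hadoop code: the class CaseInsensitiveBytes and the methods fold, compareIgnoreCase and hashIgnoreCase are hypothetical names, and the folding is an assumption that covers ASCII letters only, so it ignores non-ASCII characters and locale rules entirely. It mainly shows that the comparison and the hash would have to change together so that names equal ignoring case also hash equally.

```java
class CaseInsensitiveBytes {
    // Assumption: fold only ASCII 'A'..'Z' to lower case; every other byte
    // (including bytes of multi-byte UTF-8 sequences) is left unchanged.
    static int fold(byte b) {
        return (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
    }

    // Same shape as compareBytes in the snippet above, but comparing folded bytes.
    static int compareIgnoreCase(byte[] a1, byte[] a2) {
        if (a1 == a2) {
            return 0;
        }
        int len1 = (a1 == null ? 0 : a1.length);
        int len2 = (a2 == null ? 0 : a2.length);
        int n = Math.min(len1, len2);
        for (int i = 0; i < n; i++) {
            int c1 = fold(a1[i]);
            int c2 = fold(a2[i]);
            if (c1 != c2) {
                return c1 - c2;
            }
        }
        return len1 - len2;
    }

    // The hash is computed over folded bytes so that it agrees with
    // compareIgnoreCase: names equal ignoring ASCII case get equal hashes.
    static int hashIgnoreCase(byte[] a) {
        if (a == null) {
            return 0;
        }
        int h = 1;
        for (byte b : a) {
            h = 31 * h + fold(b);
        }
        return h;
    }
}
```

Whether these are the only places that compare names is exactly the open question in this message; the sketch does not answer it.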
Re: Hadoop with case-preservation and case-insensitivity
Posted by Steve Loughran <st...@apache.org>.
Doug Cutting wrote:
> Paul Sheer wrote:
>> I have the requirement to use Hadoop with case-insensitivity and
>> case-preservation ala Windows.
>
> I think you may have difficulty convincing folks that Hadoop should
> directly support this mode of operation, and it's also a bad idea to run
> a hacked version of HDFS, since that will be hard to maintain.
>
> The safest and simplest way to support this might be to layer it on top
> of the standard API. You can implement a FilterFileSystem that, when
> opening files or listing directories, uses case-insensitive comparisons.
> So, to open "/foo/bar" you'd first list "/" looking for subdirectories
> which case-insensitively match "foo", then, if one is found, list it
> looking for a file which case-insensitively matches "bar". Could this
> suffice?
>
> Doug
Full Windows case logic is pretty bizarre, as you need to ignore case in
all file operations; "mv lower LOWER" would result in a file called
"lower" because of the rule that if there is a destination file whose
case-insensitive name matches that of the target file, the existing
file's name becomes the destination name.
Other issues:
- it should be impossible to create two files in the same directory with
the same case-insensitive name.
- you need to take locale into account when comparing case. Turkey is
the test case, as "I".toLower()!="i" there; it's the place where you get
the bug reports when your logic is broken.
I would stay very clear of it.
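The Turkish-locale point can be shown with a short Java sketch. The class CaseFoldingDemo and the helper foldName are hypothetical names used only for this example; choosing Locale.ROOT for folding is one possible assumption, not something the thread prescribes.

```java
import java.util.Locale;

class CaseFoldingDemo {
    // Hypothetical helper: fold a name for comparison using Locale.ROOT, so
    // the result does not depend on the JVM's default locale.
    static String foldName(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules "I" lowercases to dotless i (U+0131), not "i".
        System.out.println("I".toLowerCase(turkish).equals("i"));     // prints false
        System.out.println("I".toLowerCase(Locale.ROOT).equals("i")); // prints true

        // Locale-independent folding gives the same answer on every machine.
        System.out.println(foldName("FILE.TXT").equals(foldName("file.txt"))); // prints true
    }
}
```

Code that calls toLowerCase() with no locale argument picks up the default locale, which is how a comparison that works on a developer's machine can fail on a machine configured for Turkish.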
Re: Hadoop with case-preservation and case-insensitivity
Posted by Doug Cutting <cu...@apache.org>.
Paul Sheer wrote:
> Sorry if I gave the impression that Hadoop ought to support this feature
> in general.
> No, I was only asking about my own setup and I'm happy to maintain my own
> private branch.
You didn't imply that Hadoop ought to support it. But maintaining your
own private branch is a bad idea long-term, and you'll not get a lot of
help here for doing that, since the goal here is to build a shared
version that we can all support together.
> Can you help by telling me if changes to INode.java are all the changes
> I need to make?
I don't know. There's a good chance it's not the only change you'd need
to make, and there's a good chance that folks might later make other
changes that break your version in strange and hard-to-detect ways. So,
if you do decide to maintain your own branch, I strongly suggest you
also write a thorough test suite for this feature.
Cheers,
Doug
Re: Hadoop with case-preservation and case-insensitivity
Posted by Paul Sheer <pa...@gmail.com>.
Thanks very much for the reply,
Sorry if I gave the impression that Hadoop ought to support this feature in
general.
No, I was only asking about my own setup and I'm happy to maintain my own
private branch.
Can you help by telling me if changes to INode.java are all the changes
I need to make?
The layer you describe is a great idea, so I will certainly consider this
option.
-paul
On Thu, Mar 5, 2009 at 8:48 PM, Doug Cutting <cu...@apache.org> wrote:
> Paul Sheer wrote:
>
>> I have the requirement to use Hadoop with case-insensitivity and
>> case-preservation ala Windows.
>>
>
> I think you may have difficulty convincing folks that Hadoop should
> directly support this mode of operation, and it's also a bad idea to run a
> hacked version of HDFS, since that will be hard to maintain.
>
> The safest and simplest way to support this might be to layer it on top of
> the standard API. You can implement a FilterFileSystem that, when opening
> files or listing directories, uses case-insensitive comparisons. So, to
> open "/foo/bar" you'd first list "/" looking for subdirectories which
> case-insensitively match "foo", then, if one is found, list it looking for a
> file which case-insensitively matches "bar". Could this suffice?
>
> Doug
>
Re: Hadoop with case-preservation and case-insensitivity
Posted by Doug Cutting <cu...@apache.org>.
Paul Sheer wrote:
> I have the requirement to use Hadoop with case-insensitivity and
> case-preservation ala Windows.
I think you may have difficulty convincing folks that Hadoop should
directly support this mode of operation, and it's also a bad idea to run
a hacked version of HDFS, since that will be hard to maintain.
The safest and simplest way to support this might be to layer it on top
of the standard API. You can implement a FilterFileSystem that, when
opening files or listing directories, uses case-insensitive comparisons.
So, to open "/foo/bar" you'd first list "/" looking for subdirectories
which case-insensitively match "foo", then, if one is found, list it
looking for a file which case-insensitively matches "bar". Could this
suffice?
Doug
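The lookup described above (list each directory in turn and match each path component case-insensitively) can be sketched in plain Java. This is a sketch under stated assumptions, not a FilterFileSystem implementation: the Lister interface is a hypothetical stand-in for a directory-listing call, the class CaseInsensitiveResolver and its methods are invented for illustration, and preferring an exact-case match over a case-insensitive one is a design choice the thread does not specify.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CaseInsensitiveResolver {
    /** Hypothetical listing call: child names of dir, or null if dir is unknown. */
    interface Lister {
        List<String> list(String dir);
    }

    // Assumption: fold with Locale.ROOT so matching is locale-independent.
    static String fold(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /**
     * Resolve a path such as "/foo/bar" to the spelling actually stored,
     * or return null if some component has no case-insensitive match.
     */
    static String resolve(Lister lister, String path) {
        String current = "/";
        for (String component : path.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty piece before the leading "/"
            }
            List<String> children = lister.list(current);
            if (children == null) {
                return null;
            }
            String match = null;
            for (String child : children) {
                if (child.equals(component)) {
                    match = child; // an exact-case match always wins
                    break;
                }
                if (match == null && fold(child).equals(fold(component))) {
                    match = child; // first case-insensitive match as fallback
                }
            }
            if (match == null) {
                return null;
            }
            current = current.equals("/") ? "/" + match : current + "/" + match;
        }
        return current;
    }

    public static void main(String[] args) {
        // In-memory tree standing in for a real file system.
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/", Arrays.asList("Foo", "tmp"));
        tree.put("/Foo", Arrays.asList("Bar.txt"));
        Lister lister = tree::get;

        System.out.println(resolve(lister, "/foo/bar.txt")); // prints /Foo/Bar.txt
        System.out.println(resolve(lister, "/foo/missing")); // prints null
    }
}
```

Each component costs one listing call, so deep paths in large directories would be noticeably slower than a direct open; a real layer would probably need caching, and it would also have to reject creating a name that collides case-insensitively with an existing one.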