Posted to user@mahout.apache.org by Dan Brickley <da...@danbri.org> on 2011/07/31 19:41:53 UTC

OSX/Hadoop problem: filename 'LICENSE' and dir 'license/' clash in mahout-examples-0.6-SNAPSHOT-job.jar

With SVN 'At revision 1152597.', and freshly rebuilt:

jar -tvf /Users/danbri/Documents/workspace/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar | grep -i license

 19355 Sat Feb 26 19:16:30 CET 2011 META-INF/LICENSE.txt
 11358 Sun Apr 11 21:45:12 CEST 2010 META-INF/LICENSE
  1596 Mon Dec 20 15:47:30 CET 2010 LICENSE
     0 Sun Dec 01 11:57:24 CET 2002 license/
  4083 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-documentation.txt
  3595 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-software.txt
   804 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.sax.txt
  2827 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.txt
  1274 Sun Dec 01 11:57:24 CET 2002 license/README.dom.txt
   715 Sun Dec 01 11:57:24 CET 2002 license/README.sax.txt
   672 Sun Dec 01 11:57:24 CET 2002 license/README.txt


This situation seems to confuse Hadoop quite badly. The underlying OSX
filesystem is case-insensitive by default, so it can't hold a file and a
directory whose names differ only by case; see
http://developer.apple.com/library/mac/#documentation/Java/Conceptual/Java14Development/01-JavaOverview/JavaOverview.html
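
One way to spot this kind of clash before handing a jar to Hadoop is to
fold every entry name to lower case and look for collisions. The
following is only a minimal sketch under that assumption; the class
CaseClashFinder and its methods are made-up names for illustration and
are not part of Mahout or Hadoop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class CaseClashFinder {

    // Returns groups of paths that differ only by case. Each entry contributes
    // its own path plus every parent directory it implies.
    static List<Set<String>> findClashes(Collection<String> entryNames) {
        Map<String, Set<String>> spellings = new LinkedHashMap<>();
        for (String name : entryNames) {
            // Directory entries end in '/'; drop it so "license/" and "license" match.
            String path = name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
            for (int end = path.length(); end > 0; end = path.lastIndexOf('/', end - 1)) {
                String prefix = path.substring(0, end);
                spellings.computeIfAbsent(prefix.toLowerCase(Locale.ROOT),
                        k -> new LinkedHashSet<>()).add(prefix);
            }
        }
        List<Set<String>> clashes = new ArrayList<>();
        for (Set<String> group : spellings.values()) {
            if (group.size() > 1) {
                clashes.add(group);
            }
        }
        return clashes;
    }

    // Reads all entry names from a jar (a jar is a zip file).
    static List<String> entryNames(String jarPath) {
        try (ZipFile zip = new ZipFile(jarPath)) {
            List<String> names = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: CaseClashFinder <path-to-jar>");
            return;
        }
        for (Set<String> group : findClashes(entryNames(args[0]))) {
            System.out.println("case-only clash: " + group);
        }
    }
}
```

Fed the entry names from the listing above, findClashes reports a single
group holding 'LICENSE' and 'license'; the two META-INF entries differ
by more than case and are not flagged.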

mahout lucene.vector --dir solr/data/index/ --output bar/vecs --field label --idField id --dictOut bar/dict.out --norm 2

Running on hadoop, using HADOOP_HOME=/Users/danbri/working/hadoop/hadoop-0.20.2
HADOOP_CONF_DIR=/Users/danbri/working/hadoop/hadoop-0.20.2/conf
MAHOUT-JOB: /Users/danbri/Documents/workspace/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar

Exception in thread "main" java.io.IOException: Mkdirs failed to create /tmp/hadoop/hadoop-unjar5018665014541152120/license
	at org.apache.hadoop.util.RunJar.unJar(RunJar.java:48)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

That Hadoop error message is technically correct but somewhat unhelpful,
especially for those who doubt their Hadoop know-how. /tmp/hadoop and
its subdirectory exist and are writable; the problem is the specific
file/dir names being written into it, which wasn't obvious. So I went
chasing around configuring Hadoop tmp dirs and checking that they
existed and were writable, both locally and in HDFS. Finally, belatedly,
I tried unzipping the jar with 'jar -xvf' to see what was special about
'license', and command-line 'jar' failed for the same reason that the
!file.getParentFile().isDirectory() check fails in Hadoop's
./src/core/org/apache/hadoop/util/RunJar.java:

java.io.IOException: license : could not create directory
	at sun.tools.jar.Main.extractFile(Main.java:909)
	at sun.tools.jar.Main.extract(Main.java:852)
	at sun.tools.jar.Main.run(Main.java:242)
	at sun.tools.jar.Main.main(Main.java:1149)

(this is the same error that trips up hadoop)
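
To make the mechanism concrete, here is a rough sketch of the failure
mode. It is modelled only on the isDirectory() check named above, not
copied from RunJar, and MkdirsClashDemo and its method names are
invented for this example. A plain file is written first, then the code
tries to turn a second name into a directory; if the filesystem treats
the two names as the same, mkdirs() fails and isDirectory() is false.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical demo class for illustration; this is not Hadoop code.
final class MkdirsClashDemo {

    // Creates a fresh scratch directory for one experiment.
    static File newScratchDir() {
        try {
            return Files.createTempDirectory("unjar-demo").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes an empty plain file called fileName into root, then tries to make
    // dirName a directory there. Returns true if the directory ends up usable.
    static boolean canCreateDirAfterFile(File root, String fileName, String dirName) {
        try {
            new File(root, fileName).createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        File dir = new File(root, dirName);
        // mkdirs() returns false both for "already a directory" and for
        // "could not create", so isDirectory() is needed to tell them apart.
        return dir.mkdirs() || dir.isDirectory();
    }

    public static void main(String[] args) {
        // Identical names clash on every filesystem.
        System.out.println("file 'license', dir 'license': "
                + canCreateDirAfterFile(newScratchDir(), "license", "license"));
        // This result depends on whether the filesystem is case-sensitive.
        System.out.println("file 'LICENSE', dir 'license': "
                + canCreateDirAfterFile(newScratchDir(), "LICENSE", "license"));
    }
}
```

The first call fails everywhere because a file already occupies the
name. The second call is the interesting one: it should succeed on a
case-sensitive filesystem and fail on a case-insensitive one, which
would explain why the same jar unpacks fine on some machines.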

This seems to be reproducible: I did an svn up, mvn clean and mvn
package, let all the tests run and pass, and confirmed that the same
thing happens.

I compared against an earlier job .jar from 0.5, where all was fine. Any
suggestions for the best quick fix?

cheers,

Dan


ps.
java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)

Re: OSX/Hadoop problem: filename 'LICENSE' and dir 'license/' clash in mahout-examples-0.6-SNAPSHOT-job.jar

Posted by Dan Brickley <da...@danbri.org>.
On 1 August 2011 09:19, Sean Owen <sr...@gmail.com> wrote:
> Great report here. I imagine the answer is to make 'license' into
> 'licenses'. Let me have a look and file a JIRA with patch.

Thanks. I tried adding the following to ./examples/src/main/assembly/job.xml:

        <exclude>LICENSE</exclude>
        <exclude>license*</exclude>

...but that was a bad guess. Renaming to 'licenses' sounds better and
simpler; I just haven't tracked down yet where it's actually coming from.
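
As a local stopgap while the real source of 'license/' is tracked down,
the rename could be applied to an already-built jar. The sketch below
is an assumption-laden illustration, not the JIRA patch mentioned above
and not a change to the Mahout build: JarPrefixRenamer and
copyRenamingPrefix are invented names, and it assumes nothing in the
jar looks those files up by their old path.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class JarPrefixRenamer {

    // Copies 'in' to 'out', rewriting entry names that start with oldPrefix
    // so that they start with newPrefix instead. Other entries are copied as-is.
    static void copyRenamingPrefix(File in, File out, String oldPrefix, String newPrefix) {
        Set<String> written = new HashSet<>();
        byte[] buffer = new byte[8192];
        try (ZipFile zip = new ZipFile(in);
             ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(out))) {
            for (ZipEntry entry : Collections.list(zip.entries())) {
                String name = entry.getName();
                if (name.startsWith(oldPrefix)) {
                    name = newPrefix + name.substring(oldPrefix.length());
                }
                // ZipOutputStream rejects duplicate names, so keep only the first.
                if (!written.add(name)) {
                    continue;
                }
                ZipEntry copy = new ZipEntry(name);
                if (entry.getTime() != -1) {
                    copy.setTime(entry.getTime());
                }
                zos.putNextEntry(copy);
                if (!entry.isDirectory()) {
                    try (InputStream is = zip.getInputStream(entry)) {
                        int n;
                        while ((n = is.read(buffer)) != -1) {
                            zos.write(buffer, 0, n);
                        }
                    }
                }
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: JarPrefixRenamer <in.jar> <out.jar>");
            return;
        }
        copyRenamingPrefix(new File(args[0]), new File(args[1]), "license/", "licenses/");
    }
}
```

Matching on the prefix 'license/' (with the slash) leaves the top-level
'LICENSE' file and the META-INF entries untouched. The output jar is
written fresh, so any signature files in the original would no longer
match.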

cheers,

Dan

Re: OSX/Hadoop problem: filename 'LICENSE' and dir 'license/' clash in mahout-examples-0.6-SNAPSHOT-job.jar

Posted by Sean Owen <sr...@gmail.com>.
Great report here. I imagine the answer is to make 'license' into
'licenses'. Let me have a look and file a JIRA with patch.

Sean
