Posted to user@mahout.apache.org by Bart Vandewoestyne <Ba...@telenet.be> on 2014/09/22 17:13:52 UTC

NumberFormatException when running mahout

Hello list,

I am trying to run the Big-Bench benchmark from 
https://github.com/intel-hadoop/Big-Bench/  Everything runs fine, except 
for query 20:

https://github.com/intel-hadoop/Big-Bench/tree/master/queries/q20

As you can see from the run.sh script in the above GitHub directory, 
query 20 consists of 6 steps.  After step 2, I have a 
/user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp directory
in HDFS that contains a single file called 000000_0, whose content
looks like

0 0.0 0.0 0.0 0
4 0.0 0.0 0.0 0
5 0.0 0.0 0.0 0
6 0.0 0.0 0.0 0
7 0.0 0.0 0.0 0
8 0.0 0.0 0.0 0
11 0.0 0.0 0.0 0
15 0.0 0.0 0.0 0
17 0.0 0.0 0.0 0
23 0.0 0.0 0.0 0
24 0.0 0.0 0.0 0
27 0.0 0.0 0.0 0
31 0.0 0.0 0.0 0
32 0.0 0.0 0.0 0
33 0.0 0.0 0.0 0
37 50.0 66.66666666666667 77.39147525947116 1
38 0.0 0.0 0.0 0
42 0.0 0.0 0.0 0
45 0.0 0.0 0.0 0
47 0.0 0.0 0.0 0
48 100.0 88.88888888888889 34.90258447119835 1
51 0.0 0.0 0.0 0
52 50.0 7.6923076923076925 0.16715403929463815 1
... and so on ...
15051 0.0 0.0 0.0 0
15052 0.0 0.0 0.0 0
15053 0.0 0.0 0.0 0
15056 50.0 26.923076923076923 16.24084215689073 2
15057 100.0 100.0 69.601

So up to and including step 2, I think everything went fine.  However,
in step 3, I get the following NumberFormatException:

------------------------------------------------------------------------
q20 Step 3/6: Generating sparse vectors
Command mahout org.apache.mahout.clustering.conversion.InputDriver -i 
/user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp -o 
/user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp/Vec -v 
org.apache.mahout.math.RandomAccessSparseVector
tmp output: 
/user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp/Vec
=========================
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using 
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/bin/../lib/hadoop/bin/hadoop 
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: 
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/mahout/mahout-examples-0.9-cdh5.1.2-job.jar
14/09/22 17:04:39 WARN driver.MahoutDriver: No 
org.apache.mahout.clustering.conversion.InputDriver.props found on 
classpath, will use command-line arguments only
14/09/22 17:04:41 INFO client.RMProxy: Connecting to ResourceManager at 
sandy-quad-1.sslab.lan/192.168.35.75:8032
14/09/22 17:04:42 WARN mapreduce.JobSubmitter: Hadoop command-line 
option parsing not performed. Implement the Tool interface and execute 
your application with ToolRunner to remedy this.
14/09/22 17:04:42 INFO input.FileInputFormat: Total input paths to 
process : 1
14/09/22 17:04:42 INFO mapreduce.JobSubmitter: number of splits:1
14/09/22 17:04:43 INFO mapreduce.JobSubmitter: Submitting tokens for 
job: job_1410945757266_2536
14/09/22 17:04:43 INFO impl.YarnClientImpl: Submitted application 
application_1410945757266_2536
14/09/22 17:04:43 INFO mapreduce.Job: The url to track the job: 
http://sandy-quad-1.sslab.lan:8088/proxy/application_1410945757266_2536/
14/09/22 17:04:43 INFO mapreduce.Job: Running job: job_1410945757266_2536
14/09/22 17:04:55 INFO mapreduce.Job: Job job_1410945757266_2536 running 
in uber mode : false
14/09/22 17:04:55 INFO mapreduce.Job:  map 0% reduce 0%
14/09/22 17:05:01 INFO mapreduce.Job: Task Id : 
attempt_1410945757266_2536_m_000000_0, Status : FAILED
Error: java.lang.NumberFormatException: For input string: "\N"
	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
	at java.lang.Double.valueOf(Double.java:504)
	at 
org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:48)
	at 
org.apache.mahout.clustering.conversion.InputMapper.map(InputMapper.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
------------------------------------------------------------------------

Apparently, the mahout command line that was used is

mahout org.apache.mahout.clustering.conversion.InputDriver \
   -i /user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp \
   -o /user/bart/benchmarks/bigbench/temp/q20_hive_RUN_QUERY_0_temp/Vec \
   -v org.apache.mahout.math.RandomAccessSparseVector

and as far as I can tell, the directory specified with the -i flag
exists.  Unfortunately, judging from the NumberFormatException, it looks
as if Mahout cannot parse some of the values in my data file in HDFS.

Any hints on how to get this running are highly appreciated!

Kind regards,
Bart

Re: NumberFormatException when running mahout

Posted by Bart Vandewoestyne <Ba...@telenet.be>.
On 09/23/2014 07:48 AM, Ted Dunning wrote:
> On Mon, Sep 22, 2014 at 8:13 AM, Bart Vandewoestyne <
> Bart.Vandewoestyne@telenet.be> wrote:
>
>> 14/09/22 17:05:01 INFO mapreduce.Job: Task Id :
>> attempt_1410945757266_2536_m_000000_0, Status : FAILED
>> Error: java.lang.NumberFormatException: For input string: "\N"
>>          at sun.misc.FloatingDecimal.readJavaFormatString(
>> FloatingDecimal.java:1241)
>>
>
> Looks like you have a null in your data which is being encoded as \N.

I had also contacted the authors of the Big-Bench benchmark about this
issue, and in the meantime they have solved the problem.  See the
following GitHub commits:

https://github.com/intel-hadoop/Big-Bench/commit/d28b83d128240641a19a0dca67a0ecb02b2cc6b2

https://github.com/intel-hadoop/Big-Bench/commit/b0350a923e0ee005b773e73a67491a28eadd8060

Problem solved :-)

Kind regards,
Bart

Re: NumberFormatException when running mahout

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Sep 22, 2014 at 8:13 AM, Bart Vandewoestyne <
Bart.Vandewoestyne@telenet.be> wrote:

> 14/09/22 17:05:01 INFO mapreduce.Job: Task Id :
> attempt_1410945757266_2536_m_000000_0, Status : FAILED
> Error: java.lang.NumberFormatException: For input string: "\N"
>         at sun.misc.FloatingDecimal.readJavaFormatString(
> FloatingDecimal.java:1241)
>

Looks like you have a null in your data which is being encoded as \N.
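
To illustrate the point, here is a minimal Java sketch. It assumes each
row is split on whitespace and every token is handed to Double.valueOf,
which is what the stack trace suggests but which I have not checked
against the InputMapper source. The class name NullTokenCheck, the
method firstUnparsableToken and the second sample row are made up for
illustration; \N is how Hive writes NULL in text output by default.

```java
import java.util.Arrays;
import java.util.List;

public class NullTokenCheck {

    /**
     * Returns the index of the first whitespace-separated token that
     * Double.valueOf rejects, or -1 if every token parses.
     * (Hypothetical helper, not part of Mahout.)
     */
    public static int firstUnparsableToken(String line) {
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            try {
                Double.valueOf(tokens[i]);
            } catch (NumberFormatException e) {
                // A token such as \N (Hive's NULL marker) ends up here.
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // Row copied from the data shown in the original post.
            "37 50.0 66.66666666666667 77.39147525947116 1",
            // Hypothetical row containing a Hive NULL, for illustration only.
            "99 \\N 0.0 0.0 0"
        );
        for (String line : lines) {
            int bad = firstUnparsableToken(line);
            if (bad < 0) {
                System.out.println("ok: " + line);
            } else {
                System.out.println("token " + bad + " is not a number in: " + line);
            }
        }
    }
}
```

Running a check like this over the Hive output before step 3 should
point at the rows that need to be filtered out or given a default value
on the Hive side.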