Posted to user@hive.apache.org by Sameer Tilak <ss...@live.com> on 2014/07/28 22:03:22 UTC

java.lang.NumberFormatException.forInputString



Hi everyone,

I have a TSV file (around 4 GB). I have created a Hive table on it using the following command. It works fine without indexing. However, when I create an index on 2 columns, I get the following error:

create table products (user_id String, session_id String, ordering_date Date, product_id String, reorder_date Date, ordering_mode int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

CREATE INDEX user_id_ordering_mode_index ON TABLE products(user_id,ordering_mode) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
ALTER INDEX user_id_ordering_mode_index ON products REBUILD;
SET hive.index.compact.file=/user/hive/warehouse/default__ user_id_ordering_mode_index__;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

hive> select count (*) from products where ordering_mode = 1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.NumberFormatException: For input string: "00000001218	1	hdfs://pzxdcc0250.cdbt.pldc.kp.org:8020/user/hive/warehouse/products/products_clean.tsv	16599219"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.hadoop.hive.ql.index.HiveIndexResult.add(HiveIndexResult.java:174)
	at org.apache.hadoop.hive.ql.index.HiveIndexResult.<init>(HiveIndexResult.java:127)
	at org.apache.hadoop.hive.ql.index.HiveIndexedInputFormat.getSplits(HiveIndexedInputFormat.java:120)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:564)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:559)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:559)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:550)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.lang.NumberFormatException(For input string: "00000001218	1	hdfs://pzxdcc0250.cdbt.pldc.kp.org:8020/user/hive/warehouse/products/products.tsv	16599219")'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
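[Editorial note] Reading the trace: HiveIndexResult.add calls Long.parseLong, and the "input string" it chokes on looks like an entire tab-separated index row (key, mode, bucket file, offset) rather than a single numeric offset field. A minimal Python sketch of that failing parse (illustrative only; the HDFS path is shortened, and the row layout is an assumption inferred from the error message, not Hive source):

```python
# Illustrative only: the string from the stack trace is a whole tab-separated
# index row; only its last field is the numeric offset the parser expects.
row = ("00000001218\t1\t"
       "hdfs://host:8020/user/hive/warehouse/products/products.tsv\t"
       "16599219")

fields = row.split("\t")
offset = int(fields[-1])   # the offset field alone parses fine
print(offset)

try:
    int(row)               # the whole row raises, like Java's NumberFormatException
    parsed_whole_row = True
except ValueError:
    parsed_whole_row = False
print("whole row parsed:", parsed_whole_row)
```

If that reading is right, the setting pointing Hive at the index data is handing the reader full rows where it expects a bare offset.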

RE: java.lang.NumberFormatException.forInputString

Posted by Sameer Tilak <ss...@live.com>.
Hi Nishant,

Yes, that works; I don't see the crash any more. I also no longer use the following line:
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
However, I don't see any benefit from indexing now. With or without the index, I get the results back in the same amount of time!
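[Editorial note] One plausible reading of the identical timings (an assumption, not confirmed in this thread): once hive.input.format is no longer set to the index-aware input format, Hive falls back to a full table scan, so the index is built but never consulted. A toy Python sketch (not Hive internals) of what a compact-style index buys over a scan, using an in-memory TSV "table":

```python
import io

# Toy sketch, not Hive internals: a "compact index" here is just a map from
# key -> byte offsets into a TSV blob, so a query can consult the index
# instead of reading every row.
rows = [("u1", 1), ("u2", 0), ("u1", 1), ("u3", 1)]

buf = io.BytesIO()
index = {}  # (user_id, ordering_mode) -> list of byte offsets
for user_id, mode in rows:
    index.setdefault((user_id, mode), []).append(buf.tell())
    buf.write(f"{user_id}\t{mode}\n".encode())

def count_by_scan(table, mode):
    """Full scan: read every row and test the predicate."""
    table.seek(0)
    return sum(
        1 for line in table
        if line.decode().rstrip("\n").split("\t")[1] == str(mode)
    )

def count_by_index(idx, mode):
    """Index-only: count recorded offsets whose key matches, touching no rows."""
    return sum(len(offs) for (_, m), offs in idx.items() if m == mode)

print(count_by_scan(buf, 1), count_by_index(index, 1))  # both count the same rows
```

If the index path is never exercised, every query pays the scan cost, which would explain seeing no speedup.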

Date: Mon, 28 Jul 2014 14:57:26 -0700
Subject: Re: java.lang.NumberFormatException.forInputString
From: nishant.k02@gmail.com
To: user@hive.apache.org; sstilak@live.com

Hi Sameer,
Try the following: 

CREATE INDEX user_id_ordering_mode_index ON TABLE products(user_id,ordering_mode)  AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;


Let me know what it says.



Re: java.lang.NumberFormatException.forInputString

Posted by Nishant Kelkar <ni...@gmail.com>.
Hi Sameer,

Try the following:

CREATE INDEX user_id_ordering_mode_index ON TABLE products(user_id,ordering_mode)
AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

Let me know what it says.

