Posted to user@spark.apache.org by yu <yu...@iastate.edu> on 2014/12/15 22:49:02 UTC
NumberFormatException
Hello, everyone
I know a 'NumberFormatException' means that a String cannot be parsed
properly, but I really cannot find any mistake in my code. I hope someone
can kindly help me.
My HDFS file is as follows:
8,22
3,11
40,10
49,47
48,29
24,28
50,30
33,56
4,20
30,38
...
So each line contains an integer + "," + an integer + "\n"
My code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamMonitor {
  def main(args: Array[String]): Unit = {
    val myFunc = (str: String) => {
      val strArray = str.trim().split(",")
      (strArray(0).toInt, strArray(1).toInt)
    }
    val conf = new SparkConf().setAppName("StreamMonitor")
    val ssc = new StreamingContext(conf, Seconds(30))
    val datastream = ssc.textFileStream("/user/yu/streaminput")
    val newstream = datastream.map(myFunc)
    newstream.saveAsTextFiles("output/", "")
    ssc.start()
    ssc.awaitTermination()
  }
}
The exception info is:
14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
(TID 0, h3): java.lang.NumberFormatException: For input string: "8"
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
java.lang.Integer.parseInt(Integer.java:492)
java.lang.Integer.parseInt(Integer.java:527)
scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Based on the above info, "8" is the first number in the file, and I think it
should parse to an integer without any problems.
I know it may be a very stupid question and the answer may be very easy, but
I really cannot find the reason. I am thankful to anyone who helps!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NumberFormatException-tp20694.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: NumberFormatException
Posted by Sean Owen <so...@cloudera.com>.
That certainly looks surprising. Are you sure there are no unprintable
characters in the file?
On Mon, Dec 15, 2014 at 9:49 PM, yu <yu...@iastate.edu> wrote:
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0
> (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
>
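Sean's suspicion can be reproduced without Spark. The sketch below assumes the culprit is a UTF-8 byte-order mark (U+FEFF) in front of the first line. The thread does not confirm this, but it would explain why only the very first number fails. String.trim() only strips characters up to U+0020, so the mark survives, and the exception message quotes it without it being visible. The object name InvisibleCharDemo and the sample line are illustrative.

```scala
object InvisibleCharDemo {
  // Illustrative first line: "8,22" preceded by a UTF-8 byte-order mark (U+FEFF).
  val firstLine: String = "\uFEFF8,22"

  // Render each character as its code point so invisible ones show up.
  def codePoints(s: String): String =
    s.map(c => f"U+${c.toInt}%04X").mkString(" ")

  def main(args: Array[String]): Unit = {
    val field = firstLine.trim().split(",")(0)
    println(s"field length = ${field.length}") // prints "field length = 2", not 1
    println(codePoints(field))                  // prints "U+FEFF U+0038"
    try println(field.toInt)
    catch {
      // The message quotes the raw field; the BOM is in it but usually renders as nothing.
      case e: NumberFormatException => println(e.getMessage)
    }
  }
}
```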
Re: NumberFormatException
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
There could be some other character, such as a space or ^M (a carriage
return). You could try the following to see the actual row:
val newstream = datastream.map(row => {
  try {
    val strArray = row.trim().split(",")
    (strArray(0).toInt, strArray(1).toInt)
    // Instead, try trimming each field:
    // (strArray(0).trim().toInt, strArray(1).trim().toInt)
  } catch {
    case e: Exception =>
      // println runs on the executors, so look for this in the executor logs
      println("W000t!! Exception!! => " + e + "\nThe line was: " + row)
      (0, 0)
  }
})
Thanks
Best Regards
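Akhil's idea can also be written as a plain function that is easy to test locally before wiring it into the stream. This is a sketch, not code from the thread: SafeParse, parsePair and visible are hypothetical names. Returning an Either instead of a (0, 0) placeholder is a design choice, so that bad rows cannot be mistaken for real data.

```scala
object SafeParse {
  // Show printable ASCII as-is and everything else as <U+XXXX>, so hidden
  // characters become visible in log output.
  def visible(s: String): String =
    s.map(c => if (c >= ' ' && c <= '~') c.toString else f"<U+${c.toInt}%04X>").mkString

  // Parse "a,b" into (a, b), trimming each field. Instead of throwing, a bad
  // line becomes a Left carrying the error and an escaped copy of the line.
  def parsePair(line: String): Either[String, (Int, Int)] =
    line.split(",").map(_.trim) match {
      case Array(a, b) =>
        try Right((a.toInt, b.toInt))
        catch {
          case e: NumberFormatException =>
            Left(s"${e.getMessage}; line was: ${visible(line)}")
        }
      case fields =>
        Left(s"expected 2 fields but got ${fields.length}; line was: ${visible(line)}")
    }

  def main(args: Array[String]): Unit = {
    println(parsePair(" 8 , 22 ")) // prints Right((8,22))
    println(parsePair("\uFEFF8,22"))
  }
}
```

In the job it might be used as datastream.map(SafeParse.parsePair), followed by filtering or logging the Left values.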
Re: NumberFormatException
Posted by Imran Rashid <im...@therashids.com>.
Wow, really weird. My intuition is the same as everyone else's: some
unprintable character. Here are a couple more debugging tricks I've used in
the past:
//set up accumulators to catch the bad rows as a side-effect
val nBadRows = sc.accumulator(0)
val nGoodRows = sc.accumulator(0)
val badRows = sc.accumulableCollection(scala.collection.mutable.Set[String]())
//flatMap so that you can skip the bad rows
datastream.flatMap{ str =>
  try {
    val strArray = str.trim().split(",")
    val result = (strArray(0).toInt, strArray(1).toInt)
    nGoodRows += 1
    Some(result)
  } catch {
    case _: NumberFormatException =>
      nBadRows += 1
      badRows += str
      None
  }
}.saveAsTextFiles(...)
// on a stream, the accumulators only fill up once some batches have run,
// so do the checks below after the streaming context has processed data
if (badRows.value.nonEmpty) {
  println("**** BAD ROWS *****")
  badRows.value.foreach{str =>
    //look at a bit more info from each string ... print out length & each character one by one
    println(str)
    println(str.length)
    str.foreach{println}
    println()
  }
}
// if it is some data corruption that you just have to live with, you might leave the flatMap / try
// even when you're running it for real. But then you might want to add a little check that there aren't
// toooooooo many bad rows. Note that the accumulator[Set] will run out of mem if there are really
// a ton of bad rows, in which case you might switch to a reservoir sample
val maxAllowedBadRows = 0.01 // hypothetical threshold (a fraction of all rows); pick your own
val badFrac = nBadRows.value / (nGoodRows.value + nBadRows.value.toDouble)
println(s"${nBadRows.value} bad rows; ${nGoodRows.value} good rows; ($badFrac) bad fraction")
if (badFrac > maxAllowedBadRows) {
  throw new RuntimeException("too many bad rows! " + badFrac)
}
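Before running the accumulator version on the cluster, the same checks can be tried on a local sample of the file. The sketch below swaps the accumulators for plain Scala collections, so it only suits data that fits on one machine. BadRowCheck and the 1% threshold are hypothetical. It prints code points instead of raw characters, because str.foreach{println} would print an invisible character as an apparently empty line.

```scala
object BadRowCheck {
  // Hypothetical threshold: fail if more than 1% of rows cannot be parsed.
  val maxAllowedBadFraction = 0.01

  // Same parsing as the original myFunc, but returning None instead of throwing.
  def tryParse(line: String): Option[(Int, Int)] =
    line.trim().split(",") match {
      case Array(a, b) =>
        try Some((a.toInt, b.toInt))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  // Length plus code points, so invisible characters can be identified.
  def describe(s: String): String =
    s"length=${s.length} chars=" + s.map(c => f"U+${c.toInt}%04X").mkString(" ")

  // Report the bad rows, enforce the threshold, and return the parsed good rows.
  def check(lines: Seq[String]): Seq[(Int, Int)] = {
    val parsed = lines.map(l => (l, tryParse(l)))
    val (good, bad) = parsed.partition(_._2.isDefined)
    bad.foreach { case (l, _) => println(describe(l)) }
    val badFrac = bad.size.toDouble / math.max(lines.size, 1)
    println(s"${bad.size} bad rows; ${good.size} good rows; ($badFrac) bad fraction")
    if (badFrac > maxAllowedBadFraction)
      throw new RuntimeException("too many bad rows! " + badFrac)
    good.collect { case (_, Some(pair)) => pair }
  }
}
```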
Re: NumberFormatException
Posted by Harihar Nahak <hn...@wynyardgroup.com>.
Hi Yu,
Try this:
// csv: the input lines (e.g. datastream in the original post)
val data = csv.map(line => line.split(",").map(elem => elem.trim)) // lines in rows
data.map(rec => (rec(0).toInt, rec(1).toInt))
to convert each field into an integer.
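Trimming each field, as suggested here, handles stray spaces around the commas, but trim() only strips characters up to U+0020. If the hidden character turns out to be a byte-order mark (an assumption, not confirmed in the thread), a stricter cleanup is needed. The sketch below drops whitespace and Unicode format characters (general category Cf, which includes U+FEFF) from each field. CleanFields and its methods are hypothetical names.

```scala
object CleanFields {
  // Keep a character unless it is whitespace or a Unicode format character
  // (general category Cf, which includes the byte-order mark U+FEFF).
  // By contrast, trim() only removes characters up to U+0020 at the ends.
  def clean(field: String): String =
    field.filterNot(c => Character.isWhitespace(c) || Character.getType(c) == Character.FORMAT)

  // Same shape as the suggestion above, with clean in place of trim.
  def toPair(line: String): (Int, Int) = {
    val rec = line.split(",").map(clean)
    (rec(0).toInt, rec(1).toInt)
  }

  def main(args: Array[String]): Unit =
    println(toPair("\uFEFF8, 22")) // prints (8,22)
}
```

In the job, datastream.map(CleanFields.toPair) could then replace myFunc. Silently dropping such characters hides the underlying data problem, so it may be worth combining with one of the logging approaches earlier in the thread.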
--
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hnahak@wynyardgroup.com | Extn: 8019