Posted to issues@spark.apache.org by "Shixiong Zhu (JIRA)" <ji...@apache.org> on 2016/06/15 19:04:09 UTC

[jira] [Resolved] (SPARK-15826) PipedRDD to allow configurable char encoding

     [ https://issues.apache.org/jira/browse/SPARK-15826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-15826.
----------------------------------
       Resolution: Fixed
         Assignee: Tejas Patil
    Fix Version/s: 2.0.0

> PipedRDD to allow configurable char encoding
> --------------------------------------------
>
>                 Key: SPARK-15826
>                 URL: https://issues.apache.org/jira/browse/SPARK-15826
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Trivial
>             Fix For: 2.0.0
>
>
> Encountered an issue where the same code works on one cluster but fails on another for the same input. After debugging, I realised that PipedRDD picks up the JVM's default character encoding, which can differ across platforms. This change makes PipedRDD use UTF-8 encoding, just as `ScriptTransformation` does.
> Stack trace:
> {noformat}
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
> 	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
> 	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
> 	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> 	at java.io.InputStreamReader.read(InputStreamReader.java:184)
> 	at java.io.BufferedReader.fill(BufferedReader.java:161)
> 	at java.io.BufferedReader.readLine(BufferedReader.java:324)
> 	at java.io.BufferedReader.readLine(BufferedReader.java:389)
> 	at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:67)
> 	at org.apache.spark.rdd.PipedRDD$$anon$1.hasNext(PipedRDD.scala:185)
> 	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1612)
> 	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
> 	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1160)
> 	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
> 	at org.apache.spark.SparkContext$$anonfun$runJob$6.apply(SparkContext.scala:1868)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:89)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
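
The failure mode described in the issue can be reproduced without Spark. The following is a minimal, hypothetical Java sketch; `CharsetDemo` and `firstLine` are illustrative names, not Spark APIs. It decodes the same UTF-8 bytes, which stand in for a piped command's output, once as UTF-8 and once as US-ASCII. US-ASCII is only an assumed example of a JVM default charset on a host with a POSIX/C locale. The decoder is configured to REPORT malformed input, matching the strict decoding that the `MalformedInputException` in the trace implies.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Spark.
public class CharsetDemo {

    // Reads the first line of `bytes` with a strict decoder for `cs`.
    // REPORT makes malformed input throw instead of being silently replaced.
    static String firstLine(byte[] bytes, Charset cs) throws IOException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed output of a piped command, written as UTF-8 ("café").
        byte[] childOutput = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

        // Decoding with an explicit UTF-8 charset does not depend on the JVM default.
        System.out.println(firstLine(childOutput, StandardCharsets.UTF_8));

        // Decoding with US-ASCII fails on the first non-ASCII byte (0xC3),
        // producing the same exception type and message as the trace above.
        try {
            firstLine(childOutput, StandardCharsets.US_ASCII);
        } catch (MalformedInputException e) {
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
```

Run on its own, the first call prints the decoded line and the second reports `MalformedInputException: Input length = 1`. Passing an explicit charset when wrapping a child process's stdout, rather than relying on `Charset.defaultCharset()`, removes the dependence on each host's settings.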



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)