
Posted to issues@spark.apache.org by "sam (JIRA)" <ji...@apache.org> on 2014/08/15 12:07:18 UTC

[jira] [Commented] (SPARK-1861) ArrayIndexOutOfBoundsException when reading bzip2 files

    [ https://issues.apache.org/jira/browse/SPARK-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098404#comment-14098404 ] 

sam commented on SPARK-1861:
----------------------------

It appears that this is fixed in Cloudera 5.1.0 via HADOOP-10614.  Does that mean bzip2 will work if I now use the Cloudera-packaged Spark, 1.0.0-cdh5.1.0?  If so, please update the Fix Version.

> ArrayIndexOutOfBoundsException when reading bzip2 files
> -------------------------------------------------------
>
>                 Key: SPARK-1861
>                 URL: https://issues.apache.org/jira/browse/SPARK-1861
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Hadoop uses CBZip2InputStream to decode bzip2 files. However, that implementation is not thread-safe, and Spark may run multiple tasks in the same JVM, which leads to this error. This is not a problem for Hadoop MapReduce because Hadoop runs each task in a separate JVM.
> A workaround is to set `SPARK_WORKER_CORES=1` in spark-env.sh for a standalone cluster.
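
For reference, the workaround quoted above might look like the following in spark-env.sh. This is only a sketch: the conf/spark-env.sh location is an assumption about a typical standalone install, and the workers need a restart before the setting applies.

```shell
# conf/spark-env.sh (assumed location), on each worker of a standalone cluster.
# Allow the worker to hand out a single core, so that at most one task
# runs at a time on this worker and the non-thread-safe bzip2 decoder
# is never used by two tasks in the same JVM.
export SPARK_WORKER_CORES=1
```

The trade-off is that each worker then processes one task at a time, so per-machine parallelism is lost while the workaround is in place.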



