Posted to issues@spark.apache.org by "Oksana Romankova (JIRA)" <ji...@apache.org> on 2016/01/26 20:27:39 UTC

[jira] [Issue Comment Deleted] (SPARK-8697) MatchIterator not serializable exception in RegexTokenizer

     [ https://issues.apache.org/jira/browse/SPARK-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oksana Romankova updated SPARK-8697:
------------------------------------
    Comment: was deleted

(was: Spark 1.4.1

The issue seems to occur when a DataFrame is created from an existing RDD using toDF() and RegexTokenizer is then used to extract matches with setGaps(false). If the file is loaded with sqlContext.read.load instead, the exception does not occur.

The exception is:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 2) had a not serializable result: scala.util.matching.Regex$MatchIterator
Serialization stack:

	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
)
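
The exception above names scala.util.matching.Regex$MatchIterator as the non-serializable result. As a minimal sketch that needs no Spark at all, the following plain Scala program checks whether the iterator returned by Regex.findAllIn can be written with Java serialization, and compares it with the same matches copied eagerly into a List. The object name, the javaSerializable helper, the pattern, and the sample text are all hypothetical and chosen only for illustration. The sketch does not claim to show where RegexTokenizer goes wrong or how Spark addressed it; it only illustrates the property the error message reports.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object MatchIteratorSerializationSketch {
  // Hypothetical helper: returns true if `value` can be written with
  // plain Java serialization, false if writing throws.
  def javaSerializable(value: AnyRef): Boolean =
    Try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(value) finally out.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Hypothetical pattern and input, standing in for a tokenizer that
    // extracts matches rather than splitting on gaps.
    val pattern = "\\w+".r
    val text = "MatchIterator is not serializable"

    // findAllIn returns a Regex.MatchIterator, which wraps a live matcher.
    val lazyMatches = pattern.findAllIn(text)

    // toList copies the matched strings into an immutable List[String].
    val eagerMatches = pattern.findAllIn(text).toList

    println(s"MatchIterator serializable: ${javaSerializable(lazyMatches)}")
    println(s"List[String] serializable:  ${javaSerializable(eagerMatches)}")
  }
}
```

If the sketch behaves as expected, the first check should fail and the second should succeed, which suggests that any value shipped between tasks needs to hold the materialized matches rather than the iterator itself.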

> MatchIterator not serializable exception in RegexTokenizer
> ----------------------------------------------------------
>
>                 Key: SPARK-8697
>                 URL: https://issues.apache.org/jira/browse/SPARK-8697
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Priority: Minor
>
> I'm not sure whether this is a real bug or not. In the REPL, I saw a MatchIterator not serializable exception in RegexTokenizer during some ad-hoc testing. However, I couldn't reproduce this issue. Maybe it is a REPL bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
