Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2017/03/18 18:10:42 UTC

[jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame

    [ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931310#comment-15931310 ] 

Felix Cheung commented on SPARK-20007:
--------------------------------------

+1 - also I've been meaning to add checks for data type mismatch as well. When a schema is specified but it doesn't match the returned data.frame, the error is very hard to track down.
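
A hedged sketch of the kind of mismatch described above (illustrative only, not code from the Spark repository): the schema declares {{mean_value}} as a double, but the UDF returns it as a character column, and the resulting JVM-side cast error gives little hint that the UDF is at fault.

{code:r}
library(SparkR)
sparkR.session()

# Declared schema: two columns, the second one a double.
schema <- structType(structField("key", "integer"),
                     structField("mean_value", "double"))

df <- createDataFrame(data.frame(key = c(1L, 1L, 2L), value = c(1, 2, 3)))

res <- gapply(df, "key", function(key, x) {
  # Bug in the UDF: format() turns the mean into a character string,
  # which no longer matches the "double" declared in the schema.
  data.frame(key = key[[1]],
             mean_value = format(mean(x$value)),
             stringsAsFactors = FALSE)
}, schema)

# Fails when the JVM tries to cast the character column to double,
# with an error that is hard to trace back to the UDF.
collect(res)
{code}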

> Make SparkR apply() functions robust to workers that return empty data.frame
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-20007
>                 URL: https://issues.apache.org/jira/browse/SPARK-20007
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hossein Falaki
>
> When using {{gapply()}} (or other members of the {{apply()}} family) with a schema, Spark will try to parse the data returned from the R process on each worker as Spark DataFrame Rows based on that schema. In this case the provided schema says we have six columns. When an R worker returns results to the JVM, SparkSQL will try to access its columns one by one and cast them to the proper types. If the R worker returns nothing, the JVM will throw an {{ArrayIndexOutOfBoundsException}}.
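
A minimal reproduction sketch along the lines of the description above (names and data are illustrative, not taken from the Spark test suite): a {{gapply()}} UDF that returns an empty data.frame for some groups even though a typed schema is declared, so the JVM-side row reader has no columns to cast.

{code:r}
library(SparkR)
sparkR.session()

schema <- structType(structField("key", "integer"),
                     structField("total", "double"))

df <- createDataFrame(data.frame(key = c(1L, 1L, 2L), value = c(1, 2, 3)))

res <- gapply(df, "key", function(key, x) {
  kept <- x[x$value > 10, ]      # filters out every row in every group
  if (nrow(kept) == 0) {
    return(data.frame())         # empty data.frame with zero columns
  }
  data.frame(key = key[[1]], total = sum(kept$value))
}, schema)

# Per the report, this can surface as an ArrayIndexOutOfBoundsException in
# the JVM instead of a clear error about the empty result.
collect(res)
{code}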


