Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/10/18 09:10:00 UTC

[jira] [Resolved] (SPARK-17973) is there any way to split Dataset into 2 or more based on the given condition

     [ https://issues.apache.org/jira/browse/SPARK-17973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-17973.
-------------------------------
    Resolution: Not A Problem

Questions should go on the mailing list: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

I think you're looking for an operation like partition() in Scala. There isn't a way to do this directly. You can filter the Dataset twice, once with the condition and once with its negation, which ends up being about the same thing you'd get from something like partition(). Either way, the two child Datasets would each have to evaluate the parent, so you can cache the parent to avoid recomputing it.
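A minimal sketch of the filter-twice approach in the Java Dataset API, using the columns from the example below. It assumes you filter the parent Dataset s, which still contains the Room column; the variable names matched and rest are just illustrative.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Cache the parent so the two filtered children reuse the cached data
// instead of each recomputing the parent's full lineage.
s.cache();

// Rows that match the condition.
Dataset<Row> matched = s.filter("Class > 2017 AND Room > 200");

// Rows that do not match the same condition.
Dataset<Row> rest = s.filter("NOT (Class > 2017 AND Room > 200)");

matched.show();
rest.show();

Note that filter() only keeps rows where the condition evaluates to true, so rows for which the predicate is null (for example a Room value like 17-D that cannot be cast for the numeric comparison) are dropped by both the condition and its negation; covering those rows needs an explicit null or cast check in the condition.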

> is there any way to split Dataset into 2 or more based on the given condition
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-17973
>                 URL: https://issues.apache.org/jira/browse/SPARK-17973
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>            Reporter: sriram kumar
>            Priority: Critical
>
> I am not able to split a Dataset exactly by a condition. I have a scenario where I need to split a single Dataset into 4 Datasets, with all non-matching rows going into a 5th Dataset. Below I am taking some baby steps.
> This is my data.
> +---------------+-----+--------------+----+----+
> |           Name|Class|          Dorm|Room| GPA|
> +---------------+-----+--------------+----+----+
> |Sally Whittaker| 2018|McCarren House| 312|3.75|
> |Belinda Jameson| 2017| Cushing House| 148|3.52|
> |     Jeff Smith| 2018|Prescott House|17-D| 3.2|
> |    Sandy Allen| 2019|  Oliver House| 108|3.48|
> +---------------+-----+--------------+----+----+
> Dataset<Row> s1 = s.selectExpr("upper(Name) as Name", "Class");
> s1.filter("Class > 2017 and Room > 200").show();
> +---------------+-----+
> |           Name|Class|
> +---------------+-----+
> |SALLY WHITTAKER| 2018|
> +---------------+-----+
> Then what code would get the remaining data in the Dataset?
> I tried the one below, but it goes wrong.
> s1.filter("!(Class > 2017 and Room > 200)").show();
> +---------------+-----+
> |           Name|Class|
> +---------------+-----+
> |BELINDA JAMESON| 2017|
> |    SANDY ALLEN| 2019|
> +---------------+-----+
> I have come to know why it goes wrong, but I don't have an answer for how to get that filtered-out data.


