Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2016/04/05 04:04:25 UTC

[jira] [Resolved] (SPARK-14287) Method to determine if Dataset is bounded or not

     [ https://issues.apache.org/jira/browse/SPARK-14287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-14287.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Issue resolved by pull request 12080
[https://github.com/apache/spark/pull/12080]

> Method to determine if Dataset is bounded or not
> ------------------------------------------------
>
>                 Key: SPARK-14287
>                 URL: https://issues.apache.org/jira/browse/SPARK-14287
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Streaming
>            Reporter: Burak Yavuz
>             Fix For: 2.0.0
>
>
> With the addition of StreamExecution (ContinuousQuery) to Datasets, the data in a Dataset can be unbounded. On unbounded data, some methods and operations do not make sense, e.g. Dataset.count().
> A simple API is required to check whether the data in a Dataset is bounded or unbounded. This will allow users to check whether their Dataset is in streaming mode. For example, ML algorithms could check whether the data is unbounded and throw an exception if it is.
> Implementing this method is simple; naming it is the challenge. Some possible names for the method are:
>  - isStreaming
>  - isContinuous
>  - isBounded
>  - isUnbounded
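
The guard pattern the description has in mind (check a boundedness flag, then refuse operations that only make sense on bounded data) can be sketched in a few lines. This is a minimal illustration in plain Python, not Spark code: SketchDataset, UnboundedDataError, require_bounded and count are hypothetical names introduced here, and the flag borrows the first candidate name from the list above purely for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SketchDataset:
    """Hypothetical stand-in for a Dataset; this is not Spark's class."""
    rows: Iterable[int]
    is_streaming: bool = False  # True means the data is unbounded


class UnboundedDataError(Exception):
    """Raised when a bounded-only operation is applied to unbounded data."""


def require_bounded(ds: SketchDataset, operation: str) -> None:
    # Fail fast instead of starting a computation that can never finish.
    if ds.is_streaming:
        raise UnboundedDataError(f"{operation} is not defined on unbounded data")


def count(ds: SketchDataset) -> int:
    # Counting only terminates when the input is bounded, so check first.
    require_bounded(ds, "count")
    return sum(1 for _ in ds.rows)


# A bounded dataset can be counted; an unbounded one is rejected up front.
batch = SketchDataset(rows=[1, 2, 3])
stream = SketchDataset(rows=itertools.count(), is_streaming=True)
batch_count = count(batch)
try:
    count(stream)
    stream_rejected = False
except UnboundedDataError:
    stream_rejected = True
```

The check runs before any rows are consumed, so the unbounded case raises immediately rather than iterating forever. An ML algorithm that needs a full pass over the data could call the same kind of guard at the start of training.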



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)