Posted to user@spark.apache.org by Bryan Jeffrey <br...@gmail.com> on 2016/07/25 18:23:30 UTC

Spark 2.0

All,

I had three questions:

(1) Is there a timeline for the stable Spark 2.0 release?  I know the 'preview'
build is out there, but I was curious what the timeline is for the full
release. Jira seems to indicate that there should be a release on 7/27.

(2) For 'continuous' datasets there has been a lot of discussion. One item
that came up in tickets was the idea that 'count()' and other functions do
not apply to continuous datasets: https://github.com/apache/spark/pull/12080.
In this case, what is the intended procedure for calculating a streaming
statistic over an interval (e.g. counting the number of records in a
2-minute window every 2 minutes)?
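As a sketch of the semantics in question (plain Python, not Spark's API; in Structured Streaming the analogous operation is a groupBy over an event-time window followed by count(), per the programming guide), a tumbling 2-minute window count could look like this — the timestamps and function name here are purely illustrative:

```python
from collections import Counter

def tumbling_window_counts(events, window_ms):
    """Count records per non-overlapping (tumbling) window.

    events: iterable of (timestamp_ms, record) pairs.
    Returns {window_start_ms: count}.
    """
    counts = Counter()
    for ts, _record in events:
        # Align each timestamp down to the start of its window.
        window_start = (ts // window_ms) * window_ms
        counts[window_start] += 1
    return dict(counts)

# Count records in 2-minute windows (120 000 ms), one result per window.
events = [(10_000, "a"), (70_000, "b"), (130_000, "c"), (250_000, "d")]
print(tumbling_window_counts(events, 120_000))  # {0: 2, 120000: 1, 240000: 1}
```

In a real streaming engine the windows would be computed incrementally and emitted as each window closes, rather than over a complete collection as above.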

(3) In previous releases (1.6.1), calling repartition on a DStream / RDD with
the number of partitions set to zero silently deletes data.  I have looked in
Jira for a similar issue, but I do not see one.  I would like to address
this (and would likely be willing to fix it myself).  Should I just
create a ticket?
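To illustrate the failure mode (a plain-Python sketch, not Spark's actual implementation): a naive redistribution into n buckets has nowhere to put records when n == 0, so everything is discarded without an error — the obvious fix being to validate that the partition count is positive before redistributing:

```python
def repartition(records, num_partitions):
    """Naively redistribute records round-robin across num_partitions buckets.

    With num_partitions == 0 there are no buckets, so every record is
    dropped instead of raising an error -- the silent data loss at issue.
    """
    buckets = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        if buckets:  # no buckets -> record is silently discarded
            buckets[i % len(buckets)].append(rec)
    return buckets

print(repartition([1, 2, 3], 2))  # [[1, 3], [2]]
print(repartition([1, 2, 3], 0))  # [] -- all three records silently lost
```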

Thank you,

Bryan Jeffrey

Re: Spark 2.0

Posted by Pedro Rodriguez <sk...@gmail.com>.
The Spark 2.0 vote for RC5 passed last Friday night, so if I had to guess it
will probably be released early this week.

On Mon, Jul 25, 2016 at 12:23 PM, Bryan Jeffrey <br...@gmail.com>
wrote:

> All,
>
> I had three questions:
>
> (1) Is there a timeline for stable Spark 2.0 release?  I know the
> 'preview' build is out there, but was curious what the timeline was for
> full release. Jira seems to indicate that there should be a release 7/27.
>
> (2)  For 'continuous' datasets there has been a lot of discussion. One
> item that came up in tickets was the idea that 'count()' and other
> functions do not apply to continuous datasets:
> https://github.com/apache/spark/pull/12080.  In this case what is the
> intended procedure to calculate a streaming statistic based on an interval
> (e.g. count the number of records in a 2 minute window every 2 minutes)?
>
> (3) In previous releases (1.6.1) the call to DStream / RDD repartition w/
> a number of partitions set to zero silently deletes data.  I have looked in
> Jira for a similar issue, but I do not see one.  I would like to address
> this (and would likely be willing to go fix it myself).  Should I just
> create a ticket?
>
> Thank you,
>
> Bryan Jeffrey
>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience

Re: Spark 2.0

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Bryan,

Excellent questions about the upcoming 2.0! It took me a while to find
the answer about structured streaming.

Have you seen http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time
? It may be relevant to your question 2.

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Jul 25, 2016 at 8:23 PM, Bryan Jeffrey <br...@gmail.com> wrote:
> All,
>
> I had three questions:
>
> (1) Is there a timeline for stable Spark 2.0 release?  I know the 'preview'
> build is out there, but was curious what the timeline was for full release.
> Jira seems to indicate that there should be a release 7/27.
>
> (2)  For 'continuous' datasets there has been a lot of discussion. One item
> that came up in tickets was the idea that 'count()' and other functions do
> not apply to continuous datasets:
> https://github.com/apache/spark/pull/12080.  In this case what is the
> intended procedure to calculate a streaming statistic based on an interval
> (e.g. count the number of records in a 2 minute window every 2 minutes)?
>
> (3) In previous releases (1.6.1) the call to DStream / RDD repartition w/ a
> number of partitions set to zero silently deletes data.  I have looked in
> Jira for a similar issue, but I do not see one.  I would like to address
> this (and would likely be willing to go fix it myself).  Should I just
> create a ticket?
>
> Thank you,
>
> Bryan Jeffrey
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org