Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2016/05/24 23:02:47 UTC

[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/13286

    [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structured Streaming

    ## What changes were proposed in this pull request?
    Currently, Structured Streaming only supports the append output mode. This PR adds the following.
    
    - Added support for Complete output mode in the internal state store, analyzer and planner.
    - Added public APIs in Scala and Python for users to specify the output mode (see the usage sketch after this list)
    - Added checks for unsupported combinations of output mode and DF operations
      - Plans with no aggregation should support only Append mode
      - Plans with aggregation should support only Update and Complete modes
      - Default output mode is Append mode (should we change this to automatically set to complete mode when there is aggregation?)
    - Added support for Complete output mode in Memory Sink. So Memory Sink supports append and complete, not update.
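
    For example, with these changes a streaming aggregation can be run in complete mode roughly as follows (a minimal Scala sketch mirroring the writer API exercised in the test suites below; `streamingDF`, the query name, and the checkpoint directory are placeholders):
    
        // Word counts over a streaming DataFrame, written to the in-memory sink in
        // complete mode. The memory sink keeps all rows of the latest batch.
        val counts = streamingDF.groupBy("word").count()
        val query = counts.write
          .format("memory")
          .queryName("wordCounts")                      // placeholder sink/table name
          .outputMode("complete")                       // default is "append"
          .option("checkpointLocation", checkpointDir)  // placeholder path
          .startStream()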
    
    ## How was this patch tested?
    Unit tests in various test suites
    - StreamingAggregationSuite: tests for complete mode
    - MemorySinkSuite: tests for checking behavior in Append and Complete modes. 
    - UnsupportedOperationSuite: tests for checking unsupported combinations of DF ops and output modes
    - DataFrameReaderWriterSuite: tests for checking that output mode cannot be called on static DFs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark complete-mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13286
    
----
commit 469d69aefea17abbb889a8983a59d83988aaff45
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-05-23T23:40:13Z

    First commit to support complete mode

commit 49746f4b5f8a5167fe033f858711fa5643031097
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-05-24T21:33:48Z

    Add public API for output mode and upgraded memory sink to support complete mode

commit 2786090bccd64945c273f9344c0493e4d93eec14
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-05-24T22:11:32Z

    Added unit test for MemorySink

commit 02b10ac4419f657e3756a95f352a29e20d01ad7d
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-05-24T22:25:41Z

    Added unit test to DataFrameReaderWriterSuite

commit 61af0573a112a54bab05c070de7a36b8c74703dc
Author: Tathagata Das <ta...@gmail.com>
Date:   2016-05-24T22:53:12Z

    Added python API for output mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13286: [SPARK-15517][SQL][STREAMING] Add support for complete o...

Posted by techaddict <gi...@git.apache.org>.
Github user techaddict commented on the issue:

    https://github.com/apache/spark/pull/13286
  
    @tdas updated my PR with exclusions for Append and Complete


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221434090
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59235/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222280806
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221434083
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222009156
  
    **[Test build #59420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59420/consoleFull)** for PR 13286 at commit [`bbd6022`](https://github.com/apache/spark/commit/bbd6022cf3238a78604c8ad43df4c917d792f37f).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222286598
  
    @rxin @marmbrus Are you okay with the current OutputMode design? Added a unit test for Java compatibility of OutputMode.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222036203
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59425/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64498404
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala ---
    @@ -82,40 +82,62 @@ case class StateStoreRestoreExec(
     case class StateStoreSaveExec(
         keyExpressions: Seq[Attribute],
         stateId: Option[OperatorStateId],
    +    returnAllStates: Option[Boolean],
    --- End diff --
    
    Why is returnAllStates not just `Boolean`?


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222045976
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59444/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222022295
  
    **[Test build #59420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59420/consoleFull)** for PR 13286 at commit [`bbd6022`](https://github.com/apache/spark/commit/bbd6022cf3238a78604c8ad43df4c917d792f37f).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222019450
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490915
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala ---
    @@ -35,15 +35,36 @@ class IncrementalExecution private[sql](
         val currentBatchId: Long)
       extends QueryExecution(sparkSession, logicalPlan) {
     
    -  // TODO: make this always part of planning.
    -  val stateStrategy = sparkSession.sessionState.planner.StatefulAggregationStrategy :: Nil
    -
       // Modified planner with stateful operations.
       override def planner: SparkPlanner =
         new SparkPlanner(
           sparkSession.sparkContext,
           sparkSession.sessionState.conf,
    -      stateStrategy)
    +      Nil) {
    +
    +      override def strategies: Seq[Strategy] = {
    +        StatefulAggregationStrategy +: super.strategies
    +      }
    +
    +      /**
    +       * Used to plan aggregation queries that are computed incrementally as part of a
    +       * [[org.apache.spark.sql.ContinuousQuery]].
    +       */
    +      object StatefulAggregationStrategy extends Strategy {
    +        override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +          case PhysicalAggregation(
    +            namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
    +            execution.aggregate.Utils.planStreamingAggregation(
    +              namedGroupingExpressions,
    +              aggregateExpressions,
    +              rewrittenResultExpressions,
    +              outputMode,
    --- End diff --
    
    The strategy has been moved from SparkStrategies to IncrementalExecution so that StatefulAggregationStrategy can access the output mode. Is there a better design that involves minimal changes?
    
    An alternative to passing the output mode explicitly would have been a logical plan node that stores the output mode in the plan itself, with StatefulAggregationStrategy reading it from the plan. But I think that change would be a little more complicated.
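
    For illustration, that alternative could look roughly like the following (purely a hypothetical sketch, not code in this PR; the node name is made up):
    
        import org.apache.spark.sql.OutputMode
        import org.apache.spark.sql.catalyst.expressions.Attribute
        import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
        
        // Hypothetical marker node that only records the output mode, so a planner
        // strategy could recover the mode by pattern matching on the plan instead of
        // receiving it through a constructor argument.
        case class OutputModeMarker(outputMode: OutputMode, child: LogicalPlan) extends UnaryNode {
          override def output: Seq[Attribute] = child.output
        }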


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221433717
  
    **[Test build #59235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59235/consoleFull)** for PR 13286 at commit [`a6e2bb5`](https://github.com/apache/spark/commit/a6e2bb5b6f0838b18c84086d671b518701cbc26a).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64976140
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    We also need a trait OutputMode that InternalOutputModes.Append will extend. Should that be a Scala trait or a Java interface?



[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64838491
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -15,9 +15,10 @@
      * limitations under the License.
      */
     
    -package org.apache.spark.sql.catalyst.analysis
    +package org.apache.spark.sql
     
    -sealed trait OutputMode
    -
    -case object Append extends OutputMode
    -case object Update extends OutputMode
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    add docs.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221444771
  
    **[Test build #59236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59236/consoleFull)** for PR 13286 at commit [`bb0314d`](https://github.com/apache/spark/commit/bb0314d32dfd42ea6081427042e182706a83b6ff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64976281
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    I believe they are equivalent when there are no implemented methods.
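
    For what it's worth, a minimal sketch of the trait option (hypothetical, not the code in this PR):
    
        // A marker trait with no implemented members compiles to a plain Java interface,
        // so Java code can reference the type while the internal case objects extend it.
        trait OutputMode
        
        private[sql] object InternalOutputModes {
          case object Append extends OutputMode
          case object Complete extends OutputMode
          case object Update extends OutputMode
        }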


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64966038
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +/**
    + * Internal helper class to generate objects representing various [[OutputMode]]s,
    + */
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    If this is going to be internal, we should probably just move it to `execution.streaming`.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64961894
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,26 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    +        Options include:
    +
    +        * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to
    +           the sink
    +        * `complete`:All the rows in the streaming DataFrame/Dataset will be written to the sink
    +           every time these is some updates
    --- End diff --
    
    each time the trigger fires?


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64966732
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +/**
    + * Internal helper class to generate objects representing various [[OutputMode]]s,
    + */
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    They are in catalyst. Should it still be moved to `execution.streaming`?


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221520299
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59268/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221446196
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59239/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222019451
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59424/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64966959
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +/**
    + * Internal helper class to generate objects representing various [[OutputMode]]s,
    + */
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    Oh, mostly I don't think they need to be in `org.apache.spark.sql`; they could live in catalyst too.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222285015
  
    **[Test build #59542 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59542/consoleFull)** for PR 13286 at commit [`e951798`](https://github.com/apache/spark/commit/e951798bf4511cabc08d242e7f1a3d7d1e653263).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64496407
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala ---
    @@ -35,15 +35,36 @@ class IncrementalExecution private[sql](
         val currentBatchId: Long)
       extends QueryExecution(sparkSession, logicalPlan) {
     
    -  // TODO: make this always part of planning.
    -  val stateStrategy = sparkSession.sessionState.planner.StatefulAggregationStrategy :: Nil
    -
       // Modified planner with stateful operations.
       override def planner: SparkPlanner =
         new SparkPlanner(
           sparkSession.sparkContext,
           sparkSession.sessionState.conf,
    -      stateStrategy)
    +      Nil) {
    +
    +      override def strategies: Seq[Strategy] = {
    +        StatefulAggregationStrategy +: super.strategies
    +      }
    +
    +      /**
    +       * Used to plan aggregation queries that are computed incrementally as part of a
    +       * [[org.apache.spark.sql.ContinuousQuery]].
    +       */
    +      object StatefulAggregationStrategy extends Strategy {
    +        override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    +          case PhysicalAggregation(
    +            namedGroupingExpressions, aggregateExpressions, rewrittenResultExpressions, child) =>
    +            execution.aggregate.Utils.planStreamingAggregation(
    +              namedGroupingExpressions,
    +              aggregateExpressions,
    +              rewrittenResultExpressions,
    +              outputMode,
    --- End diff --
    
    Nvm. I simplified the logic even further to avoid this refactoring. 


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala ---
    @@ -82,40 +82,60 @@ case class StateStoreRestoreExec(
     case class StateStoreSaveExec(
         keyExpressions: Seq[Attribute],
         stateId: Option[OperatorStateId],
    +    returnAllStates: Boolean,
         child: SparkPlan)
       extends execution.UnaryExecNode with StatefulOperator {
     
       override protected def doExecute(): RDD[InternalRow] = {
    +    val saveAndReturnFunc = if (returnAllStates) saveAndReturnAll _ else saveAndReturnUpdated _
         child.execute().mapPartitionsWithStateStore(
           getStateId.checkpointLocation,
           operatorId = getStateId.operatorId,
           storeVersion = getStateId.batchId,
           keyExpressions.toStructType,
           child.output.toStructType,
           sqlContext.sessionState,
    -      Some(sqlContext.streams.stateStoreCoordinator)) { case (store, iter) =>
    -        new Iterator[InternalRow] {
    -          private[this] val baseIterator = iter
    -          private[this] val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
    +      Some(sqlContext.streams.stateStoreCoordinator)
    +    )(saveAndReturnFunc)
    +  }
    +
    +  override def output: Seq[Attribute] = child.output
     
    -          override def hasNext: Boolean = {
    -            if (!baseIterator.hasNext) {
    -              store.commit()
    -              false
    -            } else {
    -              true
    -            }
    -          }
    +  private def saveAndReturnUpdated(
    +      store: StateStore,
    +      iter: Iterator[InternalRow]): Iterator[InternalRow] = {
    +    new Iterator[InternalRow] {
    +      private[this] val baseIterator = iter
    +      private[this] val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
     
    -          override def next(): InternalRow = {
    -            val row = baseIterator.next().asInstanceOf[UnsafeRow]
    -            val key = getKey(row)
    -            store.put(key.copy(), row.copy())
    -            row
    -          }
    +      override def hasNext: Boolean = {
    +        if (!baseIterator.hasNext) {
    +          store.commit()
    +          false
    +        } else {
    +          true
             }
    +      }
    +
    +      override def next(): InternalRow = {
    +        val row = baseIterator.next().asInstanceOf[UnsafeRow]
    +        val key = getKey(row)
    +        store.put(key.copy(), row.copy())
    +        row
    +      }
         }
       }
     
    -  override def output: Seq[Attribute] = child.output
    +  private def saveAndReturnAll(
    --- End diff --
    
    nit: add docs.
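
    For reference, the complete-mode path could be documented and sketched roughly like this (an illustrative sketch, not the exact code in the diff; it assumes the store exposes an iterator over (key, value) pairs):
    
        /**
         * Saves every row of the batch into the state store, commits, and then returns
         * all values currently in the store, which is what Complete mode requires.
         */
        private def saveAndReturnAll(
            store: StateStore,
            iter: Iterator[InternalRow]): Iterator[InternalRow] = {
          val getKey = GenerateUnsafeProjection.generate(keyExpressions, child.output)
          while (iter.hasNext) {
            val row = iter.next().asInstanceOf[UnsafeRow]
            store.put(getKey(row).copy(), row.copy())
          }
          store.commit()
          store.iterator().map(_._2)  // assumed (key, value) iterator; values are the stored rows
        }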


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221444874
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222045851
  
    **[Test build #59444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59444/consoleFull)** for PR 13286 at commit [`85ce263`](https://github.com/apache/spark/commit/85ce2638cf9c9150e2258749bb894a39779d24cc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221442177
  
    Finished my first round. Looks pretty good. Just some nits.


[GitHub] spark pull request #13286: [SPARK-15517][SQL][STREAMING] Add support for com...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r65569068
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    +
    +  /**
    +   * OutputMode in which only the new rows in the streaming DataFrame/Dataset will be
    +   * written to the sink. This output mode can be only be used in queries that do not
    +   * contain any aggregation.
    +   *
    +   * @since 2.0.0
    +   */
    +  public static OutputMode Append() {
    --- End diff --
    
    See https://github.com/apache/spark/pull/13464 -- this fails Java lint. Can this be `append()`, as would be conventional in Java? I don't see that it's there to implement some interface.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64848125
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala ---
    @@ -237,15 +237,15 @@ class ContinuousQueryManagerSuite extends StreamTest with SharedSQLContext with
               try {
                 val df = ds.toDF
                 val metadataRoot =
    -              Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
    -            query = spark
    -              .streams
    -              .startQuery(
    -                StreamExecution.nextName,
    -                metadataRoot,
    -                df,
    -                new MemorySink(df.schema))
    -              .asInstanceOf[StreamExecution]
    +              Utils.createTempDir(namePrefix = "streaming.checkpoint").getCanonicalPath
    +            query =
    +              df.write
    +                .format("memory")
    +                .queryName(s"query${Random.nextInt(100000)}")
    +                .option("checkpointLocation", metadataRoot)
    +                .outputMode("append")
    +                .startStream()
    +                .asInstanceOf[StreamExecution]
    --- End diff --
    
    This change makes more unit tests use the memory format, which means more test coverage.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64498668
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/OutputMode.java ---
    @@ -15,9 +15,10 @@
      * limitations under the License.
      */
     
    -package org.apache.spark.sql.catalyst.analysis
    +package org.apache.spark.sql;
     
    -sealed trait OutputMode
    -
    -case object Append extends OutputMode
    -case object Update extends OutputMode
    +public enum OutputMode {
    +  Append,
    +  Update,
    --- End diff --
    
    This actually raises a good question. I'm not sure we can use enums here, as I think we need a notion of a `key` in order to support an `Update` mode.
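
    To make that concern concrete (a hypothetical Scala sketch, not code from this PR): an enum value cannot carry per-query parameters, whereas a class-based mode could later grow a keyed update variant.
    
        sealed trait OutputMode
        case object Append extends OutputMode
        case object Complete extends OutputMode
        // Hypothetical: an Update mode that needs to know the key columns could not be
        // expressed as a parameterless enum constant.
        case class Update(keyColumns: Seq[String]) extends OutputMode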


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495145
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala ---
    @@ -114,35 +114,48 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
      * A sink that stores the results in memory. This [[Sink]] is primarily intended for use in unit
      * tests and does not provide durability.
      */
    -class MemorySink(val schema: StructType) extends Sink with Logging {
    +class MemorySink(val schema: StructType, outputMode: OutputMode) extends Sink with Logging {
    +
    +  private case class AddedData(batchId: Long, data: Array[Row])
    +
       /** An order list of batches that have been written to this [[Sink]]. */
       @GuardedBy("this")
    -  private val batches = new ArrayBuffer[Array[Row]]()
    +  private val batches = new ArrayBuffer[AddedData]()
     
       /** Returns all rows that are stored in this [[Sink]]. */
       def allData: Seq[Row] = synchronized {
    -    batches.flatten
    +    batches.map(_.data).flatten
       }
     
    -  def latestBatchId: Option[Int] = synchronized {
    -    if (batches.size == 0) None else Some(batches.size - 1)
    +  def latestBatchId: Option[Long] = synchronized {
    +    batches.lastOption.map(_.batchId)
       }
     
    -  def lastBatch: Seq[Row] = synchronized { batches.last }
    +  def latestBatchData: Seq[Row] = synchronized { batches.lastOption.toSeq.flatten(_.data) }
     
       def toDebugString: String = synchronized {
    -    batches.zipWithIndex.map { case (b, i) =>
    -      val dataStr = try b.mkString(" ") catch {
    +    batches.map { case AddedData(batchId, data) =>
    +      val dataStr = try data.mkString(" ") catch {
             case NonFatal(e) => "[Error converting to string]"
           }
    -      s"$i: $dataStr"
    +      s"$batchId: $dataStr"
         }.mkString("\n")
       }
     
       override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    -    if (batchId == batches.size) {
    -      logDebug(s"Committing batch $batchId")
    -      batches.append(data.collect())
    +    if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
    +      logDebug(s"Committing batch $batchId to $this")
    +      outputMode match {
    +        case OutputMode.Append | OutputMode.Update =>
    --- End diff --
    
    nit: Since we don't support `OutputMode.Update`, could you remove it? I think it will have different logic even if we add it in the future.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495492
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala ---
    @@ -237,15 +237,14 @@ class ContinuousQueryManagerSuite extends StreamTest with SharedSQLContext with
               try {
                 val df = ds.toDF
                 val metadataRoot =
    -              Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
    -            query = spark
    -              .streams
    -              .startQuery(
    -                StreamExecution.nextName,
    -                metadataRoot,
    -                df,
    -                new MemorySink(df.schema))
    -              .asInstanceOf[StreamExecution]
    +              Utils.createTempDir(namePrefix = "streaming.checkpoint").getCanonicalPath
    +            query =
    +              df.write
    +                .format("memory")
    +                .option("checkpointLocation", "memory")
    --- End diff --
    
    nit: `"memory"` -> `metadataRoot`


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221464878
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64838480
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,29 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +public class OutputMode {
    --- End diff --
    
    add docs



[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221429136
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59232/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490597
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala ---
    @@ -311,8 +313,9 @@ object Utils {
                 aggregateExpressions.flatMap(_.aggregateFunction.inputAggBufferAttributes),
             child = restored)
         }
    +    val returnAllStates = if (outputMode == OutputMode.Complete) true else false
     
    -    val saved = StateStoreSaveExec(groupingAttributes, None, partialMerged2)
    +    val saved = StateStoreSaveExec(groupingAttributes, None, returnAllStates, partialMerged2)
    --- End diff --
    
    nit: extra line.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222280807
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59541/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64968033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +/**
    + * Internal helper class to generate objects representing various [[OutputMode]]s,
    + */
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    Moving them to org.apache.spark.streaming in catalyst.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64796991
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala ---
    @@ -82,40 +82,62 @@ case class StateStoreRestoreExec(
     case class StateStoreSaveExec(
         keyExpressions: Seq[Attribute],
         stateId: Option[OperatorStateId],
    +    returnAllStates: Option[Boolean],
    --- End diff --
    
    Similar to stateId, it is injected after the physical Spark plan is created and before execution. See IncrementalExecution and how stateId is injected using the prepare rule.
    
    I agree that this structure is something we can improve, but that would require quite a bit of refactoring and is better done later.
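As a rough conceptual sketch of the injection pattern being described (this is not the PR's actual code, and the class name is made up), a preparation rule over the physical plan could fill in `returnAllStates` after planning, the same way the state id is filled in:

```
import org.apache.spark.sql.OutputMode
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.streaming.StateStoreSaveExec

// Conceptual sketch only: fill in the Option[Boolean] that was left as None
// when the physical plan was created, based on the query's output mode.
class InjectReturnAllStates(outputMode: OutputMode) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan transform {
    case save: StateStoreSaveExec if save.returnAllStates.isEmpty =>
      save.copy(returnAllStates = Some(outputMode == OutputMode.Complete))
  }
}
```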


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490480
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala ---
    @@ -55,21 +56,6 @@ object UnsupportedOperationChecker {
             case _: InsertIntoTable =>
               throwError("InsertIntoTable is not supported with streaming DataFrames/Datasets")
     
    -        case Aggregate(_, _, child) if child.isStreaming =>
    -          if (outputMode == Append) {
    -            throwError(
    -              "Aggregations are not supported on streaming DataFrames/Datasets in " +
    -                "Append output mode. Consider changing output mode to Update.")
    -          }
    -          val moreStreamingAggregates = child.find {
    -            case Aggregate(_, _, grandchild) if grandchild.isStreaming => true
    -            case _ => false
    -          }
    -          if (moreStreamingAggregates.nonEmpty) {
    -            throwError("Multiple streaming aggregations are not supported with " +
    -              "streaming DataFrames/Datasets")
    -          }
    --- End diff --
    
    This has been moved around to better consolidate all the logic related to output modes and aggregations.
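For readers who want the gist of the consolidated rule, here is a hedged sketch of the kind of check that replaces the removed block; the exact structure and messages in the PR may differ, and `throwError`, `plan` and `outputMode` are the checker's own members:

```
// Illustrative sketch only, in the spirit of the removed block above.
val streamingAggregates = plan.collect {
  case a @ Aggregate(_, _, child) if child.isStreaming => a
}
if (streamingAggregates.size > 1) {
  throwError("Multiple streaming aggregations are not supported with " +
    "streaming DataFrames/Datasets")
}
outputMode match {
  case Append if streamingAggregates.nonEmpty =>
    throwError("Aggregations are not supported on streaming DataFrames/Datasets " +
      "in Append output mode")
  case Complete | Update if streamingAggregates.isEmpty =>
    throwError(s"$outputMode output mode is supported only when there is a " +
      "streaming aggregation")
  case _ => // supported combination of output mode and operations
}
```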


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64966816
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -77,7 +77,47 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           case "ignore" => SaveMode.Ignore
           case "error" | "default" => SaveMode.ErrorIfExists
           case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
    -        "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
    +        "Accepted save modes are 'overwrite', 'append', 'ignore', 'error'.")
    --- End diff --
    
    I was thinking the same. 


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222280740
  
    **[Test build #59541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59541/consoleFull)** for PR 13286 at commit [`4784e18`](https://github.com/apache/spark/commit/4784e18efcc552fc99c2fd9d5ccfe1c78e479c79).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222045973
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221444878
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59236/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64502603
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/OutputMode.java ---
    @@ -15,9 +15,10 @@
      * limitations under the License.
      */
     
    -package org.apache.spark.sql.catalyst.analysis
    +package org.apache.spark.sql;
    --- End diff --
    
    Before that... I realize that making this a Java enum prevents us from having parameterized output modes like `UpdateInPlace("key")` in the future. So we have to think about this.
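To make the concern concrete, a small hypothetical Scala sketch (none of this is in the PR): an enum's constants are fixed singletons, whereas a class hierarchy can carry per-instance parameters such as a key column.

```
object OutputModeSketch {
  // Hypothetical only: what a parameterized output mode could look like if
  // OutputMode were a sealed hierarchy rather than a Java enum.
  sealed trait OutputMode
  case object Append extends OutputMode
  case object Complete extends OutputMode
  case class UpdateInPlace(keyColumn: String) extends OutputMode

  val example: OutputMode = UpdateInPlace("key") // not expressible as an enum constant
}
```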


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64968245
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode describes what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    One downside of writing this in Java is that it doesn't show up in the Scaladocs.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64966890
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala ---
    @@ -237,15 +237,15 @@ class ContinuousQueryManagerSuite extends StreamTest with SharedSQLContext with
               try {
                 val df = ds.toDF
                 val metadataRoot =
    -              Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
    -            query = spark
    -              .streams
    -              .startQuery(
    -                StreamExecution.nextName,
    -                metadataRoot,
    -                df,
    -                new MemorySink(df.schema))
    -              .asInstanceOf[StreamExecution]
    +              Utils.createTempDir(namePrefix = "streaming.checkpoint").getCanonicalPath
    +            query =
    +              df.write
    +                .format("memory")
    +                .queryName(s"query${Random.nextInt(100000)}")
    --- End diff --
    
    Will make it a counter. :)
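A minimal sketch of the counter-based alternative (the names here are placeholders, not the eventual code):

```
import java.util.concurrent.atomic.AtomicInteger

object QueryNameGenerator {
  // A monotonically increasing counter gives unique query names without the
  // (small) collision risk of Random.nextInt(100000).
  private val counter = new AtomicInteger(0)
  def next(): String = s"query${counter.incrementAndGet()}"
}

// Usage in the test above would then be: .queryName(QueryNameGenerator.next())
```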


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221520107
  
    **[Test build #59268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59268/consoleFull)** for PR 13286 at commit [`58f88b8`](https://github.com/apache/spark/commit/58f88b8f829b7a1408b2ce471b4d5d0a23031ee7).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64842979
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -77,7 +77,47 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           case "ignore" => SaveMode.Ignore
           case "error" | "default" => SaveMode.ErrorIfExists
           case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
    -        "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
    +        "Accepted save modes are 'overwrite', 'append', 'ignore', 'error'.")
    +    }
    +    this
    +  }
    +
    +  /**
    +   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +   *   - `OutputMode.Append()`:   only the new rows in the streaming DataFrame/Dataset will be
    +   *                            written to the sink
    +   *   - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written
    +   *                            to the sink every time there are updates
    +   *
    +   * @since 2.0.0
    +   */
    +  @Experimental
    +  def outputMode(outputMode: OutputMode): DataFrameWriter = {
    +    assertStreaming("outputMode() can only be called on continuous queries")
    +    this.outputMode = outputMode
    +    this
    +  }
    +
    +  /**
    +   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +   *   - `append`:   only the new rows in the streaming DataFrame/Dataset will be written to
    +   *                 the sink
    +   *   - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
    +   *                 every time there are updates
    +   *
    +   * @since 2.0.0
    +   */
    +  @Experimental
    +  def outputMode(outputMode: String): DataFrameWriter = {
    --- End diff --
    
    @tdas do we need to think about how to support the `update` mode for this method? "update(columnName)"?
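For context, the string overload presumably maps the accepted keywords onto the OutputMode objects, along the lines of the sketch below (not the PR's exact code); a parameterized `update(columnName)` form would not fit a bare keyword, which is what the question above is getting at.

```
def outputMode(outputMode: String): DataFrameWriter = {
  assertStreaming("outputMode() can only be called on continuous queries")
  this.outputMode = outputMode.toLowerCase match {
    case "append"   => OutputMode.Append
    case "complete" => OutputMode.Complete
    // "update(columnName)" would need parsing of a key column here,
    // or a separate overload taking the column name explicitly.
    case _ => throw new IllegalArgumentException(
      s"Unknown output mode: $outputMode. Accepted output modes are 'append' and 'complete'.")
  }
  this
}
```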


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495148
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala ---
    @@ -114,35 +114,48 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
      * A sink that stores the results in memory. This [[Sink]] is primarily intended for use in unit
      * tests and does not provide durability.
      */
    -class MemorySink(val schema: StructType) extends Sink with Logging {
    +class MemorySink(val schema: StructType, outputMode: OutputMode) extends Sink with Logging {
    +
    +  private case class AddedData(batchId: Long, data: Array[Row])
    +
       /** An ordered list of batches that have been written to this [[Sink]]. */
       @GuardedBy("this")
    -  private val batches = new ArrayBuffer[Array[Row]]()
    +  private val batches = new ArrayBuffer[AddedData]()
     
       /** Returns all rows that are stored in this [[Sink]]. */
       def allData: Seq[Row] = synchronized {
    -    batches.flatten
    +    batches.map(_.data).flatten
       }
     
    -  def latestBatchId: Option[Int] = synchronized {
    -    if (batches.size == 0) None else Some(batches.size - 1)
    +  def latestBatchId: Option[Long] = synchronized {
    +    batches.lastOption.map(_.batchId)
       }
     
    -  def lastBatch: Seq[Row] = synchronized { batches.last }
    +  def latestBatchData: Seq[Row] = synchronized { batches.lastOption.toSeq.flatten(_.data) }
     
       def toDebugString: String = synchronized {
    -    batches.zipWithIndex.map { case (b, i) =>
    -      val dataStr = try b.mkString(" ") catch {
    +    batches.map { case AddedData(batchId, data) =>
    +      val dataStr = try data.mkString(" ") catch {
             case NonFatal(e) => "[Error converting to string]"
           }
    -      s"$i: $dataStr"
    +      s"$batchId: $dataStr"
         }.mkString("\n")
       }
     
       override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    -    if (batchId == batches.size) {
    -      logDebug(s"Committing batch $batchId")
    -      batches.append(data.collect())
    +    if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
    +      logDebug(s"Committing batch $batchId to $this")
    +      outputMode match {
    +        case OutputMode.Append | OutputMode.Update =>
    +          batches.append(AddedData(batchId, data.collect()))
    +
    +        case OutputMode.Complete =>
    +          batches.clear()
    +          batches.append(AddedData(batchId, data.collect()))
    +
    +        case _ =>
    +          throw new IllegalArgumentException("Data source ")
    --- End diff --
    
    nit: Although we won't reach here, let's still add a better message such as `s"Memory sink does not support output mode $outputMode"`
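Folding this nit and the earlier one about dropping the `Update` case into the block above, the match could end up roughly like this (a sketch, not the merged code):

```
override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
  if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
    logDebug(s"Committing batch $batchId to $this")
    outputMode match {
      case OutputMode.Append =>
        batches.append(AddedData(batchId, data.collect()))

      case OutputMode.Complete =>
        // Complete mode rewrites the full result on every trigger, so the
        // previous contents are replaced rather than appended to.
        batches.clear()
        batches.append(AddedData(batchId, data.collect()))

      case _ =>
        throw new IllegalArgumentException(
          s"Memory sink does not support output mode $outputMode")
    }
  }
}
```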


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64499074
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,25 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    --- End diff --
    
    nit: add `.. note:: Experimental.`


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222019179
  
    **[Test build #59424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59424/consoleFull)** for PR 13286 at commit [`4973621`](https://github.com/apache/spark/commit/4973621c23aac441c26c172a6e6322454b9797f7).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222280804
  
    **[Test build #59541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59541/consoleFull)** for PR 13286 at commit [`4784e18`](https://github.com/apache/spark/commit/4784e18efcc552fc99c2fd9d5ccfe1c78e479c79).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222019440
  
    **[Test build #59424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59424/consoleFull)** for PR 13286 at commit [`4973621`](https://github.com/apache/spark/commit/4973621c23aac441c26c172a6e6322454b9797f7).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for complete o...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13286
  
    LGTM


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222022414
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59420/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64980368
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode describes what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    Okay, writing it in Scala does not work because we need to define static methods on OutputMode (`Append()`, etc.) as well as the OutputMode interface that the singleton object Append would extend. To do this in Scala we have to do 
    ```
    trait OutputMode
    object OutputMode {
       def Append(): OutputMode = ???
    }
    ```
    But this makes the OutputMode static object unusable from Java, as you will have to call `OutputMode$.MODULE$.Append()`. 
    
    So doing this in Java is the cleanest way for it to be usable in both Java and Scala. I am including a JavaOutputModeSuite as well. 
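For illustration, the practical upshot for Scala callers (a usage sketch; `streamingDF` and `checkpointDir` are placeholders, and the writer methods are the ones added in this PR):

```
import org.apache.spark.sql.OutputMode

// streamingDF and checkpointDir are placeholders for this sketch.
streamingDF.write
  .format("memory")
  .queryName("aggregates")
  .outputMode(OutputMode.Complete())   // reads the same from Java and Scala
  .option("checkpointLocation", checkpointDir)
  .startStream()
```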


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495984
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala ---
    @@ -114,35 +114,48 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
      * A sink that stores the results in memory. This [[Sink]] is primarily intended for use in unit
      * tests and does not provide durability.
      */
    -class MemorySink(val schema: StructType) extends Sink with Logging {
    +class MemorySink(val schema: StructType, outputMode: OutputMode) extends Sink with Logging {
    +
    +  private case class AddedData(batchId: Long, data: Array[Row])
    +
       /** An ordered list of batches that have been written to this [[Sink]]. */
       @GuardedBy("this")
    -  private val batches = new ArrayBuffer[Array[Row]]()
    +  private val batches = new ArrayBuffer[AddedData]()
     
       /** Returns all rows that are stored in this [[Sink]]. */
       def allData: Seq[Row] = synchronized {
    -    batches.flatten
    +    batches.map(_.data).flatten
       }
     
    -  def latestBatchId: Option[Int] = synchronized {
    -    if (batches.size == 0) None else Some(batches.size - 1)
    +  def latestBatchId: Option[Long] = synchronized {
    +    batches.lastOption.map(_.batchId)
       }
     
    -  def lastBatch: Seq[Row] = synchronized { batches.last }
    +  def latestBatchData: Seq[Row] = synchronized { batches.lastOption.toSeq.flatten(_.data) }
     
       def toDebugString: String = synchronized {
    -    batches.zipWithIndex.map { case (b, i) =>
    -      val dataStr = try b.mkString(" ") catch {
    +    batches.map { case AddedData(batchId, data) =>
    +      val dataStr = try data.mkString(" ") catch {
             case NonFatal(e) => "[Error converting to string]"
           }
    -      s"$i: $dataStr"
    +      s"$batchId: $dataStr"
         }.mkString("\n")
       }
     
       override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    -    if (batchId == batches.size) {
    -      logDebug(s"Committing batch $batchId")
    -      batches.append(data.collect())
    +    if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
    +      logDebug(s"Committing batch $batchId to $this")
    +      outputMode match {
    +        case OutputMode.Append | OutputMode.Update =>
    +          batches.append(AddedData(batchId, data.collect()))
    +
    +        case OutputMode.Complete =>
    +          batches.clear()
    +          batches.append(AddedData(batchId, data.collect()))
    +
    +        case _ =>
    +          throw new IllegalArgumentException("Data source ")
    --- End diff --
    
    That's a mistake, I left it incomplete. Thanks!


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222074331
  
    **[Test build #3024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3024/consoleFull)** for PR 13286 at commit [`85ce263`](https://github.com/apache/spark/commit/85ce2638cf9c9150e2258749bb894a39779d24cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221428703
  
    **[Test build #59232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59232/consoleFull)** for PR 13286 at commit [`61af057`](https://github.com/apache/spark/commit/61af0573a112a54bab05c070de7a36b8c74703dc).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222036038
  
    **[Test build #59425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59425/consoleFull)** for PR 13286 at commit [`369e9d5`](https://github.com/apache/spark/commit/369e9d5d58a762d9af794ed791410249d7032fa5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221446192
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64848139
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/MemorySinkSuite.scala ---
    @@ -88,4 +244,15 @@ class MemorySinkSuite extends StreamTest with SharedSQLContext {
             .startStream()
         }
       }
    +
    +  private def checkAnswer(rows: Seq[Row], expected: Seq[Int])(implicit schema: StructType): Unit = {
    +    checkAnswer(
    +      sqlContext.createDataFrame(sparkContext.makeRDD(rows), schema),
    +      intsToDF(expected)(schema))
    +  }
    +
    +  implicit def intsToDF(seq: Seq[Int])(implicit schema: StructType): DataFrame = {
    --- End diff --
    
    nit: add private


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64492042
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala ---
    @@ -67,6 +64,43 @@ class StreamingAggregationSuite extends StreamTest with SharedSQLContext with Be
         )
       }
     
    +  test("simple count, complete mode") {
    +    val inputData = MemoryStream[Int]
    +
    +    val aggregated =
    +      inputData.toDF()
    +        .groupBy($"value")
    +        .agg(count("*"))
    +        .as[(Int, Long)]
    +
    +    testStream(aggregated, OutputMode.Complete)(
    +      AddData(inputData, 3),
    +      CheckLastBatch((3, 1)),
    +      AddData(inputData, 2),
    +      CheckLastBatch((3, 1), (2, 1)),
    +      StopStream,
    +      StartStream(),
    +      AddData(inputData, 3, 2, 1),
    +      CheckLastBatch((3, 2), (2, 2), (1, 1)),
    +      AddData(inputData, 4, 4, 4, 4),
    +      CheckLastBatch((4, 4), (3, 2), (2, 2), (1, 1))
    +    )
    +  }
    +
    +  test("simple count, append mode") {
    +    val inputData = MemoryStream[Int]
    +
    +    val aggregated =
    +      inputData.toDF()
    +        .groupBy($"value")
    +        .agg(count("*"))
    +        .as[(Int, Long)]
    +
    +    intercept[AnalysisException] {
    --- End diff --
    
    add checks on message
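A sketch of what the stricter assertion could look like (the exact exception message is an assumption here):

```
// Capture the exception and check its message, so the test cannot pass on an
// unrelated AnalysisException.
val e = intercept[AnalysisException] {
  testStream(aggregated, OutputMode.Append)(
    AddData(inputData, 3),
    CheckLastBatch((3, 1)))
}
assert(e.getMessage.toLowerCase.contains("append"))
assert(e.getMessage.toLowerCase.contains("aggregation"))
```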


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221436824
  
    **[Test build #59239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59239/consoleFull)** for PR 13286 at commit [`3a79d41`](https://github.com/apache/spark/commit/3a79d41b14535ffe4f65595126614832a094bf9b).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490460
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala ---
    @@ -55,21 +56,6 @@ object UnsupportedOperationChecker {
             case _: InsertIntoTable =>
               throwError("InsertIntoTable is not supported with streaming DataFrames/Datasets")
     
    --- End diff --
    
    This has been moved around to better consolidate all the logic related to output modes and aggregations.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221457219
  
    **[Test build #59253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59253/consoleFull)** for PR 13286 at commit [`074299c`](https://github.com/apache/spark/commit/074299ca9bf04a3b14d9c54ba7fc2cd2b4bce94b).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221464882
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59253/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222022356
  
    **[Test build #59425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59425/consoleFull)** for PR 13286 at commit [`369e9d5`](https://github.com/apache/spark/commit/369e9d5d58a762d9af794ed791410249d7032fa5).


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221520294
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64496153
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -319,14 +362,19 @@ final class DataFrameWriter private[sql](df: DataFrame) {
             checkpointPath.toUri.toString
           }
     
    -      val sink = new MemorySink(df.schema)
    +      if (!Seq(OutputMode.Append, OutputMode.Complete).contains(outputMode)) {
    --- End diff --
    
    nit: maybe move this logic into the constructor of `MemorySink` so that we can make sure nothing can pass a wrong OutputMode to MemorySink.
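A sketch of that alternative, with the check living in the sink itself (the constructor signature is taken from the memory.scala diff earlier in this thread):

```
class MemorySink(val schema: StructType, outputMode: OutputMode) extends Sink with Logging {

  // Fail fast at construction time so no code path can create a MemorySink
  // with an output mode it cannot honor.
  require(
    outputMode == OutputMode.Append || outputMode == OutputMode.Complete,
    s"MemorySink does not support output mode $outputMode")

  // ... rest of the sink unchanged from the diff above ...
}
```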


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490566
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala ---
    @@ -33,7 +34,7 @@ object Utils {
           resultExpressions: Seq[NamedExpression],
           child: SparkPlan): Seq[SparkPlan] = {
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    +    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = aggregate.Complete))
    --- End diff --
    
    nit: not needed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222285060
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64976955
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/InternalOutputModes.scala ---
    @@ -0,0 +1,45 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql
    +
    +/**
    + * Internal helper class to generate objects representing various [[OutputMode]]s,
    + */
    +private[sql] object InternalOutputModes {
    --- End diff --
    
    Based on an offline discussion with @rxin, I will do this move in a later PR, once we decide globally which classes should be moved to which package.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64967749
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to specify what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    But there are no tests written in java :)




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64490527
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -77,7 +77,50 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           case "ignore" => SaveMode.Ignore
           case "error" | "default" => SaveMode.ErrorIfExists
           case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
    -        "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
    +        "Accepted save modes are 'overwrite', 'append', 'ignore', 'error'.")
    +    }
    +    this
    +  }
    +
    +  /**
    +   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +   *   - `OutputMode.Append`:   only the new rows in the streaming DataFrame/Dataset will be
    +   *                            written to the sink
    +   *   - `OutputMode.Update`:   only the changed rows in the streaming DataFrame/Dataset will be
    +   *                            written to the sink every time there are some updates
    +   *   - `OutputMode.Complete`: all the rows in the streaming DataFrame/Dataset will be written
    +   *                            to the sink every time there are some updates
    +   *
    +   * @since 2.0.0
    +   */
    +  @Experimental
    +  def outputMode(outputMode: OutputMode): DataFrameWriter = {
    +    assertStreaming("outputMode() can only be called on continuous queries")
    +    this.outputMode = outputMode
    +    this
    +  }
    +
    --- End diff --
    
    nit: extra line.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495962
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala ---
    @@ -114,35 +114,48 @@ case class MemoryStream[A : Encoder](id: Int, sqlContext: SQLContext)
      * A sink that stores the results in memory. This [[Sink]] is primarily intended for use in unit
      * tests and does not provide durability.
      */
    -class MemorySink(val schema: StructType) extends Sink with Logging {
    +class MemorySink(val schema: StructType, outputMode: OutputMode) extends Sink with Logging {
    +
    +  private case class AddedData(batchId: Long, data: Array[Row])
    +
       /** An order list of batches that have been written to this [[Sink]]. */
       @GuardedBy("this")
    -  private val batches = new ArrayBuffer[Array[Row]]()
    +  private val batches = new ArrayBuffer[AddedData]()
     
       /** Returns all rows that are stored in this [[Sink]]. */
       def allData: Seq[Row] = synchronized {
    -    batches.flatten
    +    batches.map(_.data).flatten
       }
     
    -  def latestBatchId: Option[Int] = synchronized {
    -    if (batches.size == 0) None else Some(batches.size - 1)
    +  def latestBatchId: Option[Long] = synchronized {
    +    batches.lastOption.map(_.batchId)
       }
     
    -  def lastBatch: Seq[Row] = synchronized { batches.last }
    +  def latestBatchData: Seq[Row] = synchronized { batches.lastOption.toSeq.flatten(_.data) }
     
       def toDebugString: String = synchronized {
    -    batches.zipWithIndex.map { case (b, i) =>
    -      val dataStr = try b.mkString(" ") catch {
    +    batches.map { case AddedData(batchId, data) =>
    +      val dataStr = try data.mkString(" ") catch {
             case NonFatal(e) => "[Error converting to string]"
           }
    -      s"$i: $dataStr"
    +      s"$batchId: $dataStr"
         }.mkString("\n")
       }
     
       override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
    -    if (batchId == batches.size) {
    -      logDebug(s"Committing batch $batchId")
    -      batches.append(data.collect())
    +    if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
    +      logDebug(s"Committing batch $batchId to $this")
    +      outputMode match {
    +        case OutputMode.Append | OutputMode.Update =>
    --- End diff --
    
    I am wondering whether we should support update mode for memory sink. 
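    
    For reference, the diff above is cut off right where the Complete branch would appear; a
    hedged sketch of the idea (the Complete and error branches below are my reading of the
    intent, not necessarily the PR's exact code):
    
    ```scala
    // Append/Update keep the previously committed batches, while Complete treats each
    // new batch as the full result and discards everything seen before it.
    override def addBatch(batchId: Long, data: DataFrame): Unit = synchronized {
      if (latestBatchId.isEmpty || batchId > latestBatchId.get) {
        logDebug(s"Committing batch $batchId to $this")
        outputMode match {
          case OutputMode.Append | OutputMode.Update =>
            batches += AddedData(batchId, data.collect())
          case OutputMode.Complete =>
            batches.clear()
            batches += AddedData(batchId, data.collect())
          case _ =>
            throw new IllegalArgumentException(s"Output mode $outputMode not supported by MemorySink")
        }
      } else {
        logDebug(s"Skipping already committed batch: $batchId")
      }
    }
    ```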




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64843204
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -77,7 +77,47 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           case "ignore" => SaveMode.Ignore
           case "error" | "default" => SaveMode.ErrorIfExists
           case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
    -        "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
    +        "Accepted save modes are 'overwrite', 'append', 'ignore', 'error'.")
    +    }
    +    this
    +  }
    +
    +  /**
    +   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +   *   - `OutputMode.Append()`:   only the new rows in the streaming DataFrame/Dataset will be
    +   *                            written to the sink
    +   *   - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written
    +   *                            to the sink every time there are some updates
    +   *
    +   * @since 2.0.0
    +   */
    +  @Experimental
    +  def outputMode(outputMode: OutputMode): DataFrameWriter = {
    +    assertStreaming("outputMode() can only be called on continuous queries")
    +    this.outputMode = outputMode
    +    this
    +  }
    +
    +  /**
    +   * Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +   *   - `append`:   only the new rows in the streaming DataFrame/Dataset will be written to
    +   *                 the sink
    +   *   - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
    +   *                 every time there are some updates
    +   *
    +   * @since 2.0.0
    +   */
    +  @Experimental
    +  def outputMode(outputMode: String): DataFrameWriter = {
    --- End diff --
    
    We can decide that later. It sounds too complicated to reason about right now, when we have not even finalized how to specify Update mode.
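    
    For context, a hypothetical end-to-end use of the writer API being discussed here; the
    aggregation, the query name, and the `startStream()` call are assumptions pieced together
    from the surrounding diffs rather than code taken from this PR:
    
    ```scala
    // Streaming word counts written to the in-memory sink in Complete mode.
    val wordCounts = streamingLines        // assumed: a streaming Dataset[String]
      .groupBy("value")
      .count()
    
    val query = wordCounts.write
      .format("memory")
      .queryName("word_counts")            // name used to query the sink's results
      .outputMode("complete")              // or .outputMode(OutputMode.Complete)
      .startStream()
    ```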




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64967195
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to specify what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    Sure. Just wanted to be extra cautious about Java compatibility.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64796289
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,25 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    --- End diff --
    
    done.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64495685
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/OutputMode.java ---
    @@ -15,9 +15,10 @@
      * limitations under the License.
      */
     
    -package org.apache.spark.sql.catalyst.analysis
    +package org.apache.spark.sql;
    --- End diff --
    
    nit: move this file to sql/catalyst/src/main/**java**/org/apache/spark/sql/OutputMode.java




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222285061
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59542/
    Test PASSed.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64962858
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala ---
    @@ -237,15 +237,15 @@ class ContinuousQueryManagerSuite extends StreamTest with SharedSQLContext with
               try {
                 val df = ds.toDF
                 val metadataRoot =
    -              Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
    -            query = spark
    -              .streams
    -              .startQuery(
    -                StreamExecution.nextName,
    -                metadataRoot,
    -                df,
    -                new MemorySink(df.schema))
    -              .asInstanceOf[StreamExecution]
    +              Utils.createTempDir(namePrefix = "streaming.checkpoint").getCanonicalPath
    +            query =
    +              df.write
    +                .format("memory")
    +                .queryName(s"query${Random.nextInt(100000)}")
    --- End diff --
    
    There could still be conflicts?
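    
    A hedged aside: if collisions are a concern, deriving the test query name from a UUID or
    a shared counter avoids the birthday-collision risk of `Random.nextInt(100000)` entirely,
    for example:
    
    ```scala
    import java.util.UUID
    import java.util.concurrent.atomic.AtomicInteger
    
    // Either a UUID (no coordination needed between concurrently started test queries) ...
    val uuidName = s"query_${UUID.randomUUID().toString.replace("-", "")}"
    
    // ... or a shared, strictly increasing counter (shorter names, never repeats).
    val queryCounter = new AtomicInteger(0)
    val counterName = s"query_${queryCounter.incrementAndGet()}"
    ```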




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222063441
  
    **[Test build #3024 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3024/consoleFull)** for PR 13286 at commit [`85ce263`](https://github.com/apache/spark/commit/85ce2638cf9c9150e2258749bb894a39779d24cc).




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64979710
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/ContinuousQueryManagerSuite.scala ---
    @@ -237,15 +237,15 @@ class ContinuousQueryManagerSuite extends StreamTest with SharedSQLContext with
               try {
                 val df = ds.toDF
                 val metadataRoot =
    -              Utils.createTempDir(namePrefix = "streaming.metadata").getCanonicalPath
    -            query = spark
    -              .streams
    -              .startQuery(
    -                StreamExecution.nextName,
    -                metadataRoot,
    -                df,
    -                new MemorySink(df.schema))
    -              .asInstanceOf[StreamExecution]
    +              Utils.createTempDir(namePrefix = "streaming.checkpoint").getCanonicalPath
    +            query =
    +              df.write
    +                .format("memory")
    +                .queryName(s"query${Random.nextInt(100000)}")
    --- End diff --
    
    Done.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64976764
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,26 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    +        Options include:
    +
    +        * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to
    +           the sink
    +        * `complete`:All the rows in the streaming DataFrame/Dataset will be written to the sink
    +           every time there are some updates
    --- End diff --
    
    I want to write something that makes sense generally, without requiring an understanding of triggers. As it stands, since the trigger is optional, one does not need to know about triggers at all to start running things in Structured Streaming.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64796697
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala ---
    @@ -33,7 +34,7 @@ object Utils {
           resultExpressions: Seq[NamedExpression],
           child: SparkPlan): Seq[SparkPlan] = {
     
    -    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = Complete))
    +    val completeAggregateExpressions = aggregateExpressions.map(_.copy(mode = aggregate.Complete))
    --- End diff --
    
    removed.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222022412
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13286: [SPARK-15517][SQL][STREAMING] Add support for complete o...

Posted by techaddict <gi...@git.apache.org>.
Github user techaddict commented on the issue:

    https://github.com/apache/spark/pull/13286
  
    @tdas @marmbrus this is failing `dev/lint-java`
    So we should change `Append` and `Complete` to `append` and `complete`
    ```
    [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
    [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
    ```
    
    I've created a PR to fix this #13464




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222254170
  
    LGTM with a few comments.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64843274
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,28 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    +        Options include:
    +
    +        * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to
    +           the sink
    +        * `update`:Only the changed rows in the streaming DataFrame/Dataset will be written to
    --- End diff --
    
    nit: remove this line




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64796643
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -319,14 +362,19 @@ final class DataFrameWriter private[sql](df: DataFrame) {
             checkpointPath.toUri.toString
           }
     
    -      val sink = new MemorySink(df.schema)
    +      if (!Seq(OutputMode.Append, OutputMode.Complete).contains(outputMode)) {
    --- End diff --
    
    The tricky thing is that we want to make the Memory Sink compatible with update mode internally, but we may not want the public API to support update mode yet.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64962057
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/OutputMode.java ---
    @@ -0,0 +1,54 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql;
    +
    +import org.apache.spark.annotation.Experimental;
    +
    +/**
    + * :: Experimental ::
    + *
    + * OutputMode is used to specify what data will be written to a streaming sink when there is
    + * new data available in a streaming DataFrame/Dataset.
    + *
    + * @since 2.0.0
    + */
    +@Experimental
    +public class OutputMode {
    --- End diff --
    
    This doesn't have to be in `java`, see [Encoders.scala](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala)
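    
    A sketch of that alternative, purely illustrative and not what was merged: a top-level
    Scala object gets static forwarders, so Java callers can invoke `OutputModes.append()`
    the same way they call the factory methods on `Encoders`.
    
    ```scala
    sealed trait OutputMode
    
    object OutputModes {
      private case object AppendMode extends OutputMode
      private case object CompleteMode extends OutputMode
    
      // Callable from Java as OutputModes.append() / OutputModes.complete().
      def append(): OutputMode = AppendMode
      def complete(): OutputMode = CompleteMode
    }
    ```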




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64846040
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -500,6 +500,28 @@ def mode(self, saveMode):
                 self._jwrite = self._jwrite.mode(saveMode)
             return self
     
    +    @since(2.0)
    +    def outputMode(self, outputMode):
    +        """Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
    +
    +        Options include:
    +
    +        * `append`:Only the new rows in the streaming DataFrame/Dataset will be written to
    +           the sink
    +        * `update`:Only the changed rows in the streaming DataFrame/Dataset will be written to
    --- End diff --
    
    good catch. fixed.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221429134
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64962236
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -77,7 +77,47 @@ final class DataFrameWriter private[sql](df: DataFrame) {
           case "ignore" => SaveMode.Ignore
           case "error" | "default" => SaveMode.ErrorIfExists
           case _ => throw new IllegalArgumentException(s"Unknown save mode: $saveMode. " +
    -        "Accepted modes are 'overwrite', 'append', 'ignore', 'error'.")
    +        "Accepted save modes are 'overwrite', 'append', 'ignore', 'error'.")
    --- End diff --
    
    We might consider aliasing `mode` as `saveMode` and deprecating `mode`.
    
    /cc @rxin 
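    
    If that aliasing route were taken, one hedged shape for it inside `DataFrameWriter`
    (method bodies elided; the deprecation message and version string are assumptions):
    
    ```scala
    // `saveMode` becomes the primary name; `mode` remains as a thin, deprecated alias.
    def saveMode(saveMode: String): DataFrameWriter = {
      // ... the current mode(saveMode: String) logic would move here ...
      this
    }
    
    @deprecated("Use saveMode instead.", "2.0.0")
    def mode(saveMode: String): DataFrameWriter = this.saveMode(saveMode)
    ```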




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221464810
  
    **[Test build #59253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59253/consoleFull)** for PR 13286 at commit [`074299c`](https://github.com/apache/spark/commit/074299ca9bf04a3b14d9c54ba7fc2cd2b4bce94b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64494863
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ContinuousQueryManager.scala ---
    @@ -175,9 +175,9 @@ class ContinuousQueryManager(sparkSession: SparkSession) {
           checkpointLocation: String,
           df: DataFrame,
           sink: Sink,
    +      outputMode: OutputMode,
           trigger: Trigger = ProcessingTime(0),
    -      triggerClock: Clock = new SystemClock(),
    -      outputMode: OutputMode = Append): ContinuousQuery = {
    +      triggerClock: Clock = new SystemClock()): ContinuousQuery = {
    --- End diff --
    
    I just realized we should make the constructor of `ContinuousQueryManager` be `private[sql]` since we don't want people to call it. Could you also fix it in this PR?
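    
    The requested change is small; roughly (illustrative, with the rest of the class unchanged):
    
    ```scala
    // Only code in the org.apache.spark.sql package (e.g. SparkSession) can construct the
    // manager; user code keeps obtaining it through spark.streams.
    class ContinuousQueryManager private[sql] (sparkSession: SparkSession) {
      // ... existing members unchanged ...
    }
    ```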




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221434079
  
    **[Test build #59235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59235/consoleFull)** for PR 13286 at commit [`a6e2bb5`](https://github.com/apache/spark/commit/a6e2bb5b6f0838b18c84086d671b518701cbc26a).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends RunnableCommand `
      * `case class ListJarsCommand(jars: Seq[String] = Seq.empty[String]) extends RunnableCommand `




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221446102
  
    **[Test build #59239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59239/consoleFull)** for PR 13286 at commit [`3a79d41`](https://github.com/apache/spark/commit/3a79d41b14535ffe4f65595126614832a094bf9b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222281071
  
    **[Test build #59542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59542/consoleFull)** for PR 13286 at commit [`e951798`](https://github.com/apache/spark/commit/e951798bf4511cabc08d242e7f1a3d7d1e653263).




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for complete o...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13286




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222036201
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13286: [SPARK-15517][SQL][STREAMING] Add support for complete o...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13286
  
    The method naming with caps was intentional. We need to introduce an exception
    rule for lint-java in this case.
    On Jun 2, 2016 7:36 AM, "Sandeep Singh" <no...@github.com> wrote:
    
    > @tdas <https://github.com/tdas> @marmbrus <https://github.com/marmbrus>
    > this is failing dev/lint-java
    > So we should change Append and Complete to append and complete
    >
    > [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[41,28] (naming) MethodName: Method name 'Append' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
    > [ERROR] src/main/java/org/apache/spark/sql/streaming/OutputMode.java:[52,28] (naming) MethodName: Method name 'Complete' must match pattern '^[a-z][a-z0-9][a-zA-Z0-9_]*$'.
    >
    > I've created a PR to fix this #13464
    > <https://github.com/apache/spark/pull/13464>
    >





[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-222037903
  
    **[Test build #59444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59444/consoleFull)** for PR 13286 at commit [`85ce263`](https://github.com/apache/spark/commit/85ce2638cf9c9150e2258749bb894a39779d24cc).




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13286#discussion_r64499026
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/OutputMode.java ---
    @@ -15,9 +15,10 @@
      * limitations under the License.
      */
     
    -package org.apache.spark.sql.catalyst.analysis
    +package org.apache.spark.sql;
     
    -sealed trait OutputMode
    -
    -case object Append extends OutputMode
    -case object Update extends OutputMode
    +public enum OutputMode {
    --- End diff --
    
    nit: `@Experimental`




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221503471
  
    **[Test build #59268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59268/consoleFull)** for PR 13286 at commit [`58f88b8`](https://github.com/apache/spark/commit/58f88b8f829b7a1408b2ce471b4d5d0a23031ee7).




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221429129
  
    **[Test build #59232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59232/consoleFull)** for PR 13286 at commit [`61af057`](https://github.com/apache/spark/commit/61af0573a112a54bab05c070de7a36b8c74703dc).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15517][SQL][STREAMING] Add support for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13286#issuecomment-221435335
  
    **[Test build #59236 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59236/consoleFull)** for PR 13286 at commit [`bb0314d`](https://github.com/apache/spark/commit/bb0314d32dfd42ea6081427042e182706a83b6ff).

