You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by raghavgautam <gi...@git.apache.org> on 2018/09/13 19:59:25 UTC

[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

GitHub user raghavgautam opened a pull request:

    https://github.com/apache/spark/pull/22414

    [SPARK-25424][SQL] Window duration and slide duration with negative v…

    …alues should fail fast
    
    (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25424)
    
    ## What changes were proposed in this pull request?
    
    The constructors of TimeWindow class and apply method of companion object has been changed to ensure that window duration and slide duration is not be allowed to take negative values.
    
    ## How was this patch tested?
    
    I have added UTs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/raghavgautam/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22414.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22414
    
----
commit e63c65ca2bf9eb6633dc59418b6bc27bb0042216
Author: Raghav Kumar Gautam <ra...@...>
Date:   2018-09-13T17:42:42Z

    [SPARK-25424][SQL] Window duration and slide duration with negative values should fail fast
    
    (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25424)
    
    ## What changes were proposed in this pull request?
    
    The constructors of TimeWindow class and apply method of companion object has been changed to ensure that window duration and slide duration is not be allowed to take negative values.
    
    ## How was this patch tested?
    
    I have added UTs.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Please write the current behaviour (you described above) in the PR description?
    Also, can you format the PR description, too? you don't need the link to the jira.
    ```
    …alues should fail fast
    
    (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25424)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217793474
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    +
    +      val thrown = intercept[IllegalArgumentException] {
    +        (windowDuration, slideDuration) match {
    +          case (wd: String, sd: String) => TimeWindow(Literal(10L), wd, sd, "0 seconds")
    +          case (wd: Long, sd: Long) => TimeWindow(Literal(10L), wd, sd, 0)
    +          case (wd: Int, sd: Int) => TimeWindow(Literal(10L), wd, sd, 0)
    +        }
    +
    +      }
    +      def isNonPositive(s: Any): Boolean = {
    +        val trimmed = s.toString.trim
    +        trimmed.startsWith("-") || trimmed.startsWith("0")
    +      }
    +      val expectedMsg =
    +        if (isNonPositive(windowDuration)) {
    +          s"requirement failed: The window duration must be a " +
    +            s"positive integer, long or string literal, found: ${windowDuration}"
    +        } else if (isNonPositive(slideDuration)) {
    +          s"requirement failed: The slide duration must be a " +
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @srowen Can you give examples ? From what I have seen, invalid values because of typo say `1 sec` throw exception at groupby().


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Yeah, the test that failed here asserts that it's an `AnalysisException`. I guess it could be removed. The thing is, many other cases are still handled as `AnalysisException`. Maybe it's best to stay consistent; I didn't realize this. Is there any other advantage? seems like it fails just as fast either way?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218153205
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,6 +35,10 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
    --- End diff --
    
    The constructor has no information about the unit of the duration e.g. -1 day would get converted to -86,400,000,000 in the error message, which is confusing.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    How about this? https://github.com/apache/spark/compare/master...maropu:pr22414
    IMO simple fixes and tests are better.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @maropu Please continue with the review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    **[Test build #4374 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4374/testReport)** for PR 22414 at commit [`89e05f2`](https://github.com/apache/spark/commit/89e05f261c9d9495ef04d4d3cccb49c6b9a587fb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218635678
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,42 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long, slideDurationMicroSec: Long,
    +                                          windowDuration: Any, slideDuration: Any): Unit = {
    --- End diff --
    
    ```
    private def checkWindowAndSlideDuration(
        windowDurationMicroSec: Long,
        slideDurationMicroSec: Long,
        windowDuration: Any,
        slideDuration: Any): Unit = {
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218530222
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,6 +123,51 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    --- End diff --
    
    This will call the following function.
    https://github.com/apache/spark/blob/0bdbefe9dd1e7c95c58ea6b52d3b264794abbc0e/core/src/main/scala/org/apache/spark/internal/Logging.scala#L53


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @maropu Can you continue with the review ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r220380587
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,6 +123,51 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    --- End diff --
    
    Deleted.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    The code below throws following error. Note that the error is produced at the time of count() call instead of window() call.
    ```java
    val df = spark.readStream
      .format("rate")
      .option("numPartitions", "2")
      .option("rowsPerSecond", "10")
      .load()
      .filter("value % 20 == 0")
      .withWatermark("timestamp", "10 seconds")
      .groupBy(window($"timestamp", "-10 seconds", "5 seconds"))
      .count()
    ```
    Error:
    ```txt
    cannot resolve 'timewindow(timestamp, -10000000, 5000000, 0)' due to data type mismatch: The window duration (-10000000) must be greater than 0.;;
    'Aggregate [timewindow(timestamp#47-T10000ms, -10000000, 5000000, 0)], [timewindow(timestamp#47-T10000ms, -10000000, 5000000, 0) AS window#53, count(1) AS count#57L]
    +- AnalysisBarrier
          +- EventTimeWatermark timestamp#47: timestamp, interval 10 seconds
             +- Filter ((value#48L % cast(20 as bigint)) = cast(0 as bigint))
                +- StreamingRelationV2 org.apache.spark.sql.execution.streaming.RateSourceProvider@52e44f71, rate, Map(rowsPerSecond -> 10, numPartitions -> 2), [timestamp#47, value#48L], StreamingRelation DataSource(org.apache.spark.sql.SparkSession@221961f2,rate,List(),None,List(),None,Map(rowsPerSecond -> 10, numPartitions -> 2),None), rate, [timestamp#45, value#46L]
    
    org.apache.spark.sql.AnalysisException: cannot resolve 'timewindow(timestamp, -10000000, 5000000, 0)' due to data type mismatch: The window duration (-10000000) must be greater than 0.;;
    'Aggregate [timewindow(timestamp#47-T10000ms, -10000000, 5000000, 0)], [timewindow(timestamp#47-T10000ms, -10000000, 5000000, 0) AS window#53, count(1) AS count#57L]
    +- AnalysisBarrier
          +- EventTimeWatermark timestamp#47: timestamp, interval 10 seconds
             +- Filter ((value#48L % cast(20 as bigint)) = cast(0 as bigint))
                +- StreamingRelationV2 org.apache.spark.sql.execution.streaming.RateSourceProvider@52e44f71, rate, Map(rowsPerSecond -> 10, numPartitions -> 2), [timestamp#47, value#48L], StreamingRelation DataSource(org.apache.spark.sql.SparkSession@221961f2,rate,List(),None,List(),None,Map(rowsPerSecond -> 10, numPartitions -> 2),None), rate, [timestamp#45, value#46L]
    
    	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:93)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:95)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:107)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:106)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:118)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:122)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:122)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:127)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:127)
    	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:95)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
    	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
    	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
    	at org.apache.spark.sql.RelationalGroupedDataset.toDF(RelationalGroupedDataset.scala:66)
    	at org.apache.spark.sql.RelationalGroupedDataset.count(RelationalGroupedDataset.scala:239)
    	at com.hortonworks.qe.HelloScalaTest$$anonfun$2.apply$mcV$sp(HelloScalaTest.scala:39)
    	at com.hortonworks.qe.HelloScalaTest$$anonfun$2.apply(HelloScalaTest.scala:28)
    	at com.hortonworks.qe.HelloScalaTest$$anonfun$2.apply(HelloScalaTest.scala:28)
    	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    	at org.scalatest.Transformer.apply(Transformer.scala:22)
    	at org.scalatest.Transformer.apply(Transformer.scala:20)
    	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
    	at org.scalatest.TestSuite$class.withFixture(TestSuite.scala:196)
    	at org.scalatest.FunSuite.withFixture(FunSuite.scala:1560)
    	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
    	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
    	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
    	at com.hortonworks.qe.AbstractTest.org$scalatest$BeforeAndAfter$$super$runTest(AbstractTest.scala:11)
    	at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
    	at com.hortonworks.qe.AbstractTest.runTest(AbstractTest.scala:11)
    	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
    	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
    	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
    	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
    	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
    	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
    	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
    	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
    	at org.scalatest.Suite$class.run(Suite.scala:1147)
    	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
    	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
    	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
    	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
    	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
    	at com.hortonworks.qe.AbstractTest.org$scalatest$BeforeAndAfter$$super$run(AbstractTest.scala:11)
    	at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:258)
    	at com.hortonworks.qe.AbstractTest.run(AbstractTest.scala:11)
    	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
    	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
    	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
    	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
    	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
    	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
    	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
    	at org.scalatest.tools.Runner$.run(Runner.scala:850)
    	at org.scalatest.tools.Runner.run(Runner.scala)
    	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
    	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218280792
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,6 +35,10 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
    --- End diff --
    
    Either way, we'd be better to avoid duplicate error checks. Can we make it simpler, e.g., how about making a helper func to check these requirements?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217576554
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,29 +35,15 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
     
       //////////////////////////
       // SQL Constructors
       //////////////////////////
     
    -  def this(
    -      timeColumn: Expression,
    -      windowDuration: Expression,
    -      slideDuration: Expression,
    -      startTime: Expression) = {
    -    this(timeColumn, TimeWindow.parseExpression(windowDuration),
    -      TimeWindow.parseExpression(slideDuration), TimeWindow.parseExpression(startTime))
    -  }
    -
    -  def this(timeColumn: Expression, windowDuration: Expression, slideDuration: Expression) = {
    -    this(timeColumn, TimeWindow.parseExpression(windowDuration),
    -      TimeWindow.parseExpression(slideDuration), 0)
    -  }
    -
    -  def this(timeColumn: Expression, windowDuration: Expression) = {
    -    this(timeColumn, windowDuration, windowDuration)
    -  }
    --- End diff --
    
    You cannot remove these constructors.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    ... look at the test failure at https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4374/testReport/org.apache.spark.sql.catalyst.analysis/AnalysisErrorSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/ . Look at the code that generated that exception and all the other things it checks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r220380561
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,42 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long, slideDurationMicroSec: Long,
    +                                          windowDuration: Any, slideDuration: Any): Unit = {
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @tdas Can you please take a look ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @srowen Ok. I will close this PR and the bug.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217576901
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    +
    +      val thrown = intercept[IllegalArgumentException] {
    +        (windowDuration, slideDuration) match {
    +          case (wd: String, sd: String) => TimeWindow(Literal(10L), wd, sd, "0 seconds")
    +          case (wd: Long, sd: Long) => TimeWindow(Literal(10L), wd, sd, 0)
    +          case (wd: Int, sd: Int) => TimeWindow(Literal(10L), wd, sd, 0)
    +        }
    +
    +      }
    +      def isNonPositive(s: Any): Boolean = {
    +        val trimmed = s.toString.trim
    +        trimmed.startsWith("-") || trimmed.startsWith("0")
    +      }
    +      val expectedMsg =
    +        if (isNonPositive(windowDuration)) {
    +          s"requirement failed: The window duration must be a " +
    --- End diff --
    
    remove `s`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217898513
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,6 +35,10 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
    --- End diff --
    
    We cannot check these requirements only in the constructor?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    **[Test build #4374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4374/testReport)** for PR 22414 at commit [`89e05f2`](https://github.com/apache/spark/commit/89e05f261c9d9495ef04d4d3cccb49c6b9a587fb).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r223865925
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,44 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long,
    +      slideDurationMicroSec: Long,
    +      windowDuration: Any,
    --- End diff --
    
    @srowen Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217576774
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    --- End diff --
    
    Plz move this import into the head.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    btw, what's the current behaivour of Spark in case of the nagative value?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Plz do not change the format: `Problem` -> `What changes were proposed in this pull request?`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218613023
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,6 +35,10 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
    --- End diff --
    
    In scala, this() call has to be first statement in a constructor, this limits the extent to which code deduplication can be done. I have made changes to use a helper function for other instances.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r222614709
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,44 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long,
    +      slideDurationMicroSec: Long,
    +      windowDuration: Any,
    --- End diff --
    
    Why Any instead of String? do we even need this method vs just putting its two lines in the apply method above?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @raghavgautam Can you update this pr based on my patch? (sorry, but I have no time to take this over...)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @maropu I have updated the PR description and answered your question, can you continue with your review.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217576920
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    +
    +      val thrown = intercept[IllegalArgumentException] {
    +        (windowDuration, slideDuration) match {
    +          case (wd: String, sd: String) => TimeWindow(Literal(10L), wd, sd, "0 seconds")
    +          case (wd: Long, sd: Long) => TimeWindow(Literal(10L), wd, sd, 0)
    +          case (wd: Int, sd: Int) => TimeWindow(Literal(10L), wd, sd, 0)
    +        }
    +
    +      }
    +      def isNonPositive(s: Any): Boolean = {
    +        val trimmed = s.toString.trim
    +        trimmed.startsWith("-") || trimmed.startsWith("0")
    +      }
    +      val expectedMsg =
    +        if (isNonPositive(windowDuration)) {
    +          s"requirement failed: The window duration must be a " +
    +            s"positive integer, long or string literal, found: ${windowDuration}"
    +        } else if (isNonPositive(slideDuration)) {
    +          s"requirement failed: The slide duration must be a " +
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r223853069
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,44 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long,
    +      slideDurationMicroSec: Long,
    +      windowDuration: Any,
    --- End diff --
    
    @srowen The method has been called in two places:
    https://github.com/apache/spark/blob/3ba3252c9bf56f9cde04c1aa757e9f50826abbd2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L38
    and
    https://github.com/apache/spark/blob/3ba3252c9bf56f9cde04c1aa757e9f50826abbd2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L163
    
    windowDuration, slideDuration argument that is passed is Long in one case and String in another, that is why I have used type Any.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217793345
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    +
    +      val thrown = intercept[IllegalArgumentException] {
    +        (windowDuration, slideDuration) match {
    +          case (wd: String, sd: String) => TimeWindow(Literal(10L), wd, sd, "0 seconds")
    +          case (wd: Long, sd: Long) => TimeWindow(Literal(10L), wd, sd, 0)
    +          case (wd: Int, sd: Int) => TimeWindow(Literal(10L), wd, sd, 0)
    +        }
    +
    +      }
    +      def isNonPositive(s: Any): Boolean = {
    +        val trimmed = s.toString.trim
    +        trimmed.startsWith("-") || trimmed.startsWith("0")
    +      }
    +      val expectedMsg =
    +        if (isNonPositive(windowDuration)) {
    +          s"requirement failed: The window duration must be a " +
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    The problem is that the error message that we start getting is:
    ```
    requirement failed: The window duration must be positive, but found: -1209600000000
    ```
    Where as the error that I would expect is:
    ```
    requirement failed: The window duration must be a positive integer, long or string literal, found: -2 week
    ```
    
    By looking at -1209600000000 it's not clear that source of problem is -2 week.
    
    However it does simplify the code. If this is acceptable please commit your patch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217793214
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -35,29 +35,15 @@ case class TimeWindow(
       with ImplicitCastInputTypes
       with Unevaluable
       with NonSQLExpression {
    +  require(windowDuration > 0, "The window duration must be " +
    +    s"a positive integer, long or string literal, found: $windowDuration")
    +  require(slideDuration > 0, "The slide duration must be " +
    +    s"a positive integer, long or string literal, found: $slideDuration")
     
       //////////////////////////
       // SQL Constructors
       //////////////////////////
     
    -  def this(
    -      timeColumn: Expression,
    -      windowDuration: Expression,
    -      slideDuration: Expression,
    -      startTime: Expression) = {
    -    this(timeColumn, TimeWindow.parseExpression(windowDuration),
    -      TimeWindow.parseExpression(slideDuration), TimeWindow.parseExpression(startTime))
    -  }
    -
    -  def this(timeColumn: Expression, windowDuration: Expression, slideDuration: Expression) = {
    -    this(timeColumn, TimeWindow.parseExpression(windowDuration),
    -      TimeWindow.parseExpression(slideDuration), 0)
    -  }
    -
    -  def this(timeColumn: Expression, windowDuration: Expression) = {
    -    this(timeColumn, windowDuration, windowDuration)
    -  }
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    I get it, but this becomes inconsistent, right? other invalid window values aren't handled the same way. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r223861364
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala ---
    @@ -137,16 +139,44 @@ object TimeWindow {
           "an integer, long or string literal.")
       }
     
    +  private def parseWindowDuration(windowDuration: Expression): Long = {
    +    val windowDurationMicroSec = parseExpression(windowDuration)
    +    require(windowDurationMicroSec > 0, "The window duration must be " +
    +      s"a positive integer, long or string literal, found: $windowDuration")
    +    windowDurationMicroSec
    +  }
    +
    +  private def parseSlideDuration(slideDuration: Expression): Long = {
    +    val slideDurationMicroSec = parseExpression(slideDuration)
    +    require(slideDurationMicroSec > 0, "The slide duration must be " +
    +      s"a positive integer, long or string literal, found: $slideDuration")
    +    slideDurationMicroSec
    +  }
    +
       def apply(
           timeColumn: Expression,
           windowDuration: String,
           slideDuration: String,
           startTime: String): TimeWindow = {
    +    val windowDurationMicroSec = getIntervalInMicroSeconds(windowDuration)
    +    val slideDurationMicroSec = getIntervalInMicroSeconds(slideDuration)
    +    checkWindowAndSlideDuration(windowDurationMicroSec, slideDurationMicroSec,
    +      windowDuration, slideDuration)
         TimeWindow(timeColumn,
    -      getIntervalInMicroSeconds(windowDuration),
    -      getIntervalInMicroSeconds(slideDuration),
    +      windowDurationMicroSec,
    +      slideDurationMicroSec,
           getIntervalInMicroSeconds(startTime))
       }
    +
    +  private def checkWindowAndSlideDuration(windowDurationMicroSec: Long,
    +      slideDurationMicroSec: Long,
    +      windowDuration: Any,
    --- End diff --
    
    Hm but the simplest answer isn't to ignore types. One caller can call with strings not longs. It's just a toString


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam closed the pull request at:

    https://github.com/apache/spark/pull/22414


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @srowen Key benefit is fail fast particularly when running from spark-shell.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r217793251
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,11 +122,57 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    import org.scalatest.prop.TableDrivenPropertyChecks.{Table, forAll => forAllRows}
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218634964
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,6 +123,51 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    --- End diff --
    
    I meant we really need this log even in tests?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22414: [SPARK-25424][SQL] Window duration and slide duration wi...

Posted by raghavgautam <gi...@git.apache.org>.
Github user raghavgautam commented on the issue:

    https://github.com/apache/spark/pull/22414
  
    @maropu Can you take a look ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22414: [SPARK-25424][SQL] Window duration and slide dura...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22414#discussion_r218280823
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeWindowSuite.scala ---
    @@ -122,6 +123,51 @@ class TimeWindowSuite extends SparkFunSuite with ExpressionEvalHelper with Priva
         }
       }
     
    +  test("windowDuration and slideDuration should be positive.") {
    +    val fractions = Table(
    +      ("windowDuration", "slideDuration"), // First tuple defines column names
    +      ("-2 seconds", "1 seconds"),
    +      ("1 seconds", "-2 seconds"),
    +      ("0 seconds", "1 seconds"),
    +      ("1 seconds", "0 seconds"),
    +      ("-2 seconds", "-2 seconds"),
    +      ("-2 seconds", "-2 hours"),
    +      ("0 seconds", "0 seconds"),
    +      (-2L, 2L),
    +      (2L, -2L),
    +      (-2, 2),
    +      (2, -2)
    +    )
    +    forAllRows(fractions) { (windowDuration: Any, slideDuration: Any) =>
    +      logInfo(s"windowDuration = $windowDuration slideDuration = $slideDuration")
    --- End diff --
    
    What does this log means?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org