Posted to reviews@spark.apache.org by setjet <gi...@git.apache.org> on 2017/05/24 00:28:28 UTC

[GitHub] spark pull request #18080: [Spark-20771][SQL] Make weekofyear more intuitive

GitHub user setjet opened a pull request:

    https://github.com/apache/spark/pull/18080

    [Spark-20771][SQL] Make weekofyear more intuitive

    ## What changes were proposed in this pull request?
    The current implementation of weekofyear follows the ISO 8601 week-numbering rule, which results in the following unintuitive behaviour:
    
    weekofyear("2017-01-01") returns 52
    
    In MySQL, this would return 1 (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_weekofyear), although 52 can be obtained by explicitly selecting an ISO-style mode of the week() function (https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_week).
    
    I therefore think that instead of only changing the behavior as specified in the JIRA, it would be better to support both. Hence I've added an additional function.
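    
    For illustration, a minimal standalone sketch of the two conventions in plain Scala against java.util.Calendar (not the patch code itself; the patch's minimalDays value plays the same role as setMinimalDaysInFirstWeek here):
    
        import java.util.Calendar
        
        def weekOfYear(date: String, minimalDays: Int): Int = {
          val c = Calendar.getInstance()
          c.setFirstDayOfWeek(Calendar.MONDAY)
          c.setMinimalDaysInFirstWeek(minimalDays)  // 4 = ISO 8601, 1 = "gregorian"
          c.setTime(java.sql.Date.valueOf(date))
          c.get(Calendar.WEEK_OF_YEAR)
        }
        
        weekOfYear("2017-01-01", 4)  // 52: under ISO, week 1 needs >= 4 days in the new year
        weekOfYear("2017-01-01", 1)  // 1:  week 1 is simply the week containing Jan 1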
    
    ## How was this patch tested?
    Added some unit tests
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/setjet/spark SPARK-20771

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18080
    
----
commit 7235f4a731f83a3a81fd65846179efaf38354bfa
Author: setjet <ru...@gmail.com>
Date:   2017-05-24T00:20:30Z

    added additional weekofyear function

commit 057ede5b68cc7980987ae181156f376f84c41809
Author: setjet <ru...@gmail.com>
Date:   2017-05-24T00:22:54Z

    updated desc

----




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by setjet <gi...@git.apache.org>.
Github user setjet commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    Come to think of it, it might actually be better to switch it around: keep ISO 8601 as the weekofyear function and add a separate function for the gregorian convention, because ISO is the more commonly used standard.





[GitHub] spark pull request #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by setjet <gi...@git.apache.org>.
Github user setjet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18080#discussion_r118820456
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -402,23 +402,40 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
       }
     }
     
    +// scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_(date) - Returns the week of the year of the given date.",
    +  usage = "_FUNC_(date[, format]) - Returns the week of the year of the given date. Defaults to ISO 8601 standard, but can be gregorian specific",
       extended = """
         Examples:
           > SELECT _FUNC_('2008-02-20');
            8
    +      > SELECT _FUNC_('2017-01-01', 'gregorian');
    +       1
    +      > SELECT _FUNC_('2017-01-01', 'iso');
    +       52
    +      > SELECT _FUNC_('2017-01-01');
    +       52
       """)
    -case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
    +// scalastyle:on line.size.limit
    +case class WeekOfYear(child: Expression, format: Expression) extends
    +  UnaryExpression with ImplicitCastInputTypes {
    +
    +  def this(child: Expression) = {
    +    this(child, Literal("iso"))
    +  }
     
       override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
     
       override def dataType: DataType = IntegerType
     
    +  @transient private lazy val minimalDays = {
    +    if ("gregorian".equalsIgnoreCase(format.toString)) 1 else 4
    --- End diff --
    
    I did a bit of research, and there seem to be no other formats. However, some systems (such as MySQL and Java) also allow the first day of the week to be defined. Some countries in the Middle East have a Friday/Saturday weekend, or even Thursday/Friday.
    I will update the PR to let users override the first day of the week, as well as specify how the first week is defined: (1) the ISO standard, where week 1 is the week containing more than half of its days in the new year, i.e. the week containing the first Thursday of a Monday-Sunday week; (2) gregorian, where week 1 is the week containing the first day of the new year. A sketch of both knobs follows below.
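    
    A quick sketch (java.util.Calendar again, not the PR code) of the first-day-of-week override combined with the week-1 rule:
    
        import java.util.Calendar
        
        val c = Calendar.getInstance()
        c.setFirstDayOfWeek(Calendar.SATURDAY)  // e.g. a Saturday-Friday week
        c.setMinimalDaysInFirstWeek(1)          // week 1 is the week containing Jan 1
        c.setTime(java.sql.Date.valueOf("2017-01-01"))
        c.get(Calendar.WEEK_OF_YEAR)            // 1: Sat 2016-12-31 .. Fri 2017-01-06 counts as week 1 of 2017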




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by setjet <gi...@git.apache.org>.
Github user setjet commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    I agree that we shouldn't change the behavior; hence I suggested we could do it the other way around: make a new function for gregorian instead and leave weekofyear as is.
    
    I suppose we could define the function as follows: _FUNC_(date[, gregorian])




[GitHub] spark pull request #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18080




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    Is this variant available in any other DB? A lot of the goal of providing built-in functions is compatibility. Beyond that, a lot of things are better handled with UDFs for special cases, not new built-ins.




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    I think that if Spark's behavior matches Hive's, that's what we want here. Other variations can be implemented in UDFs, which provide all the flexibility you'd want. These functions exist in all kinds of variations in SQL databases because UDFs are hard or unavailable.
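    
    As a sketch of that suggestion (hypothetical name weekofyear_gregorian, registered through the standard spark.udf API; an assumption for illustration, not part of this PR):
    
        import java.util.Calendar
        import org.apache.spark.sql.SparkSession
        
        val spark = SparkSession.builder().master("local").getOrCreate()
        
        // Hypothetical user-side UDF: week 1 is the week containing Jan 1,
        // with Monday-Sunday weeks.
        spark.udf.register("weekofyear_gregorian", (d: java.sql.Date) => {
          val c = Calendar.getInstance()
          c.setFirstDayOfWeek(Calendar.MONDAY)
          c.setMinimalDaysInFirstWeek(1)
          c.setTime(d)
          c.get(Calendar.WEEK_OF_YEAR)
        })
        
        spark.sql("SELECT weekofyear_gregorian(CAST('2017-01-01' AS DATE))").show()  // prints 1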




[GitHub] spark pull request #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18080#discussion_r118803850
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -402,23 +402,40 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
       }
     }
     
    +// scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_(date) - Returns the week of the year of the given date.",
    +  usage = "_FUNC_(date[, format]) - Returns the week of the year of the given date. Defaults to ISO 8601 standard, but can be gregorian specific",
       extended = """
         Examples:
           > SELECT _FUNC_('2008-02-20');
            8
    +      > SELECT _FUNC_('2017-01-01', 'gregorian');
    +       1
    +      > SELECT _FUNC_('2017-01-01', 'iso');
    +       52
    +      > SELECT _FUNC_('2017-01-01');
    +       52
       """)
    -case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
    +// scalastyle:on line.size.limit
    +case class WeekOfYear(child: Expression, format: Expression) extends
    +  UnaryExpression with ImplicitCastInputTypes {
    +
    +  def this(child: Expression) = {
    +    this(child, Literal("iso"))
    +  }
     
       override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
     
       override def dataType: DataType = IntegerType
     
    +  @transient private lazy val minimalDays = {
    +    if ("gregorian".equalsIgnoreCase(format.toString)) 1 else 4
    --- End diff --
    
    How many formats do the other DBs/systems allow? Could you do a search?




[GitHub] spark pull request #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by setjet <gi...@git.apache.org>.
Github user setjet commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18080#discussion_r118820467
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -402,23 +402,40 @@ case class DayOfMonth(child: Expression) extends UnaryExpression with ImplicitCa
       }
     }
     
    +// scalastyle:off line.size.limit
     @ExpressionDescription(
    -  usage = "_FUNC_(date) - Returns the week of the year of the given date.",
    +  usage = "_FUNC_(date[, format]) - Returns the week of the year of the given date. Defaults to ISO 8601 standard, but can be gregorian specific",
       extended = """
         Examples:
           > SELECT _FUNC_('2008-02-20');
            8
    +      > SELECT _FUNC_('2017-01-01', 'gregorian');
    +       1
    +      > SELECT _FUNC_('2017-01-01', 'iso');
    +       52
    +      > SELECT _FUNC_('2017-01-01');
    +       52
       """)
    -case class WeekOfYear(child: Expression) extends UnaryExpression with ImplicitCastInputTypes {
    +// scalastyle:on line.size.limit
    +case class WeekOfYear(child: Expression, format: Expression) extends
    +  UnaryExpression with ImplicitCastInputTypes {
    +
    +  def this(child: Expression) = {
    +    this(child, Literal("iso"))
    +  }
     
       override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
     
       override def dataType: DataType = IntegerType
     
    +  @transient private lazy val minimalDays = {
    +    if ("gregorian".equalsIgnoreCase(format.toString)) 1 else 4
    --- End diff --
    
    It will still default to the ISO standard with a Monday-Sunday week, of course, but users can now override it in any way they would like.




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    I don't think you can just change the behavior. It would possibly break apps, and I presume it would no longer match Hive. If it already implements a standard too, it sounds like it is correct. A second method seems like API clutter.




[GitHub] spark issue #18080: [Spark-20771][SQL] Make weekofyear more intuitive

Posted by setjet <gi...@git.apache.org>.
Github user setjet commented on the issue:

    https://github.com/apache/spark/pull/18080
  
    This variant is available in other DBs, albeit with slightly different function and parameter naming. For example, MySQL allows it via the week() function: http://www.w3resource.com/mysql/date-and-time-functions/mysql-week-function.php
    
    In that case, you pass in an integer that specifies which permutation you want. Note that in the table, the 'Week 1 is the first week …' column is exactly the difference between the gregorian and iso conventions.

