Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/02 10:49:56 UTC

[GitHub] [spark] MaxGekk opened a new pull request #25998: [SPARK-29328][SQL] Fix calculation of mean seconds per month

URL: https://github.com/apache/spark/pull/25998
 
 
   ### What changes were proposed in this pull request?
   I introduced new constants `SECONDS_PER_MONTH` and `MILLIS_PER_MONTH`, and reused them in the calculations of seconds/milliseconds per month. `SECONDS_PER_MONTH` is 2629746 because the average year of the Gregorian calendar is 365.2425 days long, i.e. 60 * 60 * 24 * 365.2425 = 31556952 seconds per year, or 31556952 / 12 = 2629746 seconds per month.
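
The arithmetic behind the constant can be checked with a short Python sketch. This is illustrative only and is not code from the patch: the names mirror the constants mentioned above, `OLD_SECONDS_PER_MONTH` is a hypothetical name for the previous 31-day assumption, and deriving `MILLIS_PER_MONTH` as 1000 times the seconds value is an assumption.

```python
from fractions import Fraction

SECONDS_PER_DAY = 60 * 60 * 24

# Mean Gregorian year: 365.2425 days, i.e. 146097 days per 400-year cycle.
# A Fraction keeps the arithmetic exact (no floating-point rounding).
DAYS_PER_YEAR = Fraction(146097, 400)

seconds_per_year = DAYS_PER_YEAR * SECONDS_PER_DAY
SECONDS_PER_MONTH = seconds_per_year / 12

# Assumption: the millisecond constant is simply 1000x the seconds constant.
MILLIS_PER_MONTH = SECONDS_PER_MONTH * 1000

# Hypothetical name for the old assumption of 31 days in every month.
OLD_SECONDS_PER_MONTH = 31 * SECONDS_PER_DAY

# Relative overestimate of the month length under the old assumption.
overestimate = Fraction(OLD_SECONDS_PER_MONTH) / SECONDS_PER_MONTH - 1

print(int(seconds_per_year), int(SECONDS_PER_MONTH), float(overestimate))
```

Under these assumptions the old 31-day month is a little under 2% longer than the mean Gregorian month.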
   
   ### Why are the changes needed?
   Spark uses the proleptic Gregorian calendar (see https://issues.apache.org/jira/browse/SPARK-26651), in which the average year is 365.2425 days (see https://en.wikipedia.org/wiki/Gregorian_calendar), but the existing implementation assumes 31 days per month, i.e. 12 * 31 = 372 days per year. That is far from the truth.
   
   ### Does this PR introduce any user-facing change?
   Yes, the changes affect at least 3 methods in `GroupStateImpl`, `EventTimeWatermark` and `MonthsBetween`. For example, the `months_between()` function will return a different result in some cases.
   
   Before:
   ```sql
   spark-sql> select months_between('2019-09-15', '1970-01-01');
   596.4516129
   ```
   After:
   ```sql
   spark-sql> select months_between('2019-09-15', '1970-01-01');
   596.45996838
   ```
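
Both figures can be reproduced with a simplified model in Python. This is a sketch, not Spark's implementation: it assumes the result is the number of whole months plus the day-of-month difference (in seconds) divided by the seconds-per-month constant, rounded to 8 decimal places. The function name `months_between_sketch` is hypothetical, and time-of-day and end-of-month handling are ignored.

```python
SECONDS_PER_DAY = 86_400


def months_between_sketch(month_diff, day_diff, seconds_per_month):
    """Whole months plus the day-of-month gap as a fraction of a month."""
    fraction = day_diff * SECONDS_PER_DAY / seconds_per_month
    return round(month_diff + fraction, 8)


# '2019-09-15' vs '1970-01-01': 49 years and 8 months, with days 15 and 1.
month_diff = (2019 - 1970) * 12 + (9 - 1)
day_diff = 15 - 1

# Before: every month assumed to be 31 days long.
before = months_between_sketch(month_diff, day_diff, 31 * SECONDS_PER_DAY)
# After: mean Gregorian month of 2629746 seconds.
after = months_between_sketch(month_diff, day_diff, 2_629_746)

print(before, after)
```

Only the divisor changes between the two calls, so the whole-month part (596) is identical and the difference is confined to the fractional part.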
   
   ### How was this patch tested?
   By the existing test suites `DateTimeUtilsSuite`, `DateFunctionsSuite` and `DateExpressionsSuite`.
   
