Posted to issues@flink.apache.org by snuyanzin <gi...@git.apache.org> on 2018/05/14 09:56:25 UTC

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW, EPOCH, ...

GitHub user snuyanzin opened a pull request:

    https://github.com/apache/flink/pull/6007

    [FLINK-8518][Table API & SQL] Support DOW, EPOCH, DECADE for EXTRACT

    dow extraction support implemented + tests
    
    
    *Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*
    
    *Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.*
    
    ## Contribution Checklist
    
      - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.
      
      - Name the pull request in the form "[FLINK-XXXX] [component] Title of the pull request", where *FLINK-XXXX* should be replaced by the actual issue number. Skip *component* if you are unsure about which is the best component.
      Typo fixes that have no associated JIRA issue should be named following this pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator`.
    
      - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
      
      - Make sure that the change passes the automated tests, i.e., `mvn clean verify` passes. You can set up Travis CI to do that following [this guide](http://flink.apache.org/contribute-code.html#best-practices).
    
      - Each pull request should address only one issue, not mix up code from multiple issues.
      
      - Each commit in the pull request has a meaningful commit message (including the JIRA id)
    
      - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.
    
    
    **(The sections below can be removed for hotfixes of typos)**
    
    ## What is the purpose of the change
    
    *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)*
    
    
    ## Brief change log
    
    *(for example:)*
      - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact*
      - *Deployments RPC transmits only the blob storage reference*
      - *TaskManagers retrieve the TaskInfo from the blob cache*
    
    
    ## Verifying this change
    
    *(Please pick either of the following options)*
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    *(or)*
    
    This change is already covered by existing tests, such as *(please describe tests)*.
    
    *(or)*
    
    This change added tests and can be verified as follows:
    
    *(example:)*
      - *Added integration tests for end-to-end deployment with large payloads (100MB)*
      - *Extended integration test for recovery after master (JobManager) failure*
      - *Added test that validates that TaskInfo is transferred only once across recoveries*
      - *Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.*
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes)
      - If yes, how is the feature documented? (not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/snuyanzin/flink _FLINK-8518]_Table_API_SQL]_Support_DOW,_EPOCH,_DECADE_for_EXTRACT

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6007.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6007
    
----
commit 4ab2389d544b9033035cb043e8436b989f2d9ee3
Author: snuyanzin <sn...@...>
Date:   2018-05-14T09:47:09Z

    [FLINK-8518][Table API and SQL] Support DOW, EPOCH, DECADE for EXTRACT
    dow extraction support implemented + tests

----


---

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTR...

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6007#discussion_r190299515
  
    --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ---
    @@ -1473,6 +1473,14 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
           "EXTRACT(DOY FROM f16)",
           "315")
     
    +    testSqlApi(
    +      "EXTRACT(DOW FROM f18)",
    +      "1")
    +
    +    testSqlApi(
    +      "EXTRACT(DOW FROM f16)",
    --- End diff --
    
    Yes, sure; I have just done it for DOW and DOY as well.


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    sorry, closed by mistake, restored


---

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTR...

Posted by snuyanzin <gi...@git.apache.org>.
GitHub user snuyanzin reopened a pull request:

    https://github.com/apache/flink/pull/6007

    [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

    dow extraction support implemented + tests
    
    


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW, EPOCH, DECADE ...

Posted by walterddr <gi...@git.apache.org>.
Github user walterddr commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    I wasn't sure whether this is the full PR or just a partial implementation, but I don't see `EPOCH` or `DECADE` in either main or test.


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    Hi @snuyanzin,
    yes, I meant `docs/dev/table/sql.md`. Right now it is not documented which units are supported by `EXTRACT`. Just because `DOW` is in the reserved keywords section doesn't mean that we support it. I'm fine with having the same schema as Calcite. We can create one or more issues for the docs and the additional synonyms. Issues that need a newer Calcite version can be converted to a subtask of FLINK-9134.


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW, EPOCH, DECADE ...

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    @walterddr thank you for your comment.
    It is just the DOW implementation, because to implement EPOCH we have to wait for the resolution of CALCITE-2303.
    For DECADE there are two ways: one is also to wait for CALCITE-2303;
    the other is to make the same changes in `org.apache.calcite.avatica.util.DateTimeUtils#julianExtract` (Flink has its own copy of this class) as proposed in CALCITE-2303, and then do the remaining work in Flink.
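    For context, the semantics under discussion can be sketched with plain Python `datetime` arithmetic. This is only an illustration of the PostgreSQL-style meaning of `EPOCH` and `DECADE`, not Flink or Calcite code, and the timestamp is an arbitrary example:

```python
from datetime import datetime, timezone

# An arbitrary example timestamp, used only for illustration.
ts = datetime(2018, 5, 14, 9, 47, 9, tzinfo=timezone.utc)

# EXTRACT(EPOCH FROM ts): seconds since 1970-01-01 00:00:00 UTC
epoch = int(ts.timestamp())

# EXTRACT(DECADE FROM ts): the year divided by 10
decade = ts.year // 10

print(epoch, decade)  # → 1526291229 201
```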



---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    @twalthr thank you for your comment.
    I looked at the corresponding ticket in Calcite/Avatica (on which Flink currently depends) and pointed out this discrepancy in a [comment](https://issues.apache.org/jira/browse/CALCITE-2303?focusedCommentId=16480420&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16480420). To fix it, the class `org.apache.calcite.avatica.util.DateTimeUtils` has to be changed; it is present in both Avatica and Flink.
    At the same time, different databases behave slightly differently here (please have a look at the links below):
    for day-of-week extraction, Oracle and MySQL use Sunday = 1, PostgreSQL uses Sunday = 0, and for Microsoft SQL Server it depends on a property set by the user.
    On the other hand, there is the [ISO 8601](https://en.wikipedia.org/wiki/ISO_week_date) standard, which also defines weekday, day of year, etc. This is also part of [CALCITE-2303](https://issues.apache.org/jira/browse/CALCITE-2303).
    So my suggestion is to provide, within this ticket, support for all operations that could come from [CALCITE-2303](https://issues.apache.org/jira/browse/CALCITE-2303): `dow`, `decade`, `epoch`, `isodow`, `isoyear`, `microsecond` and `millisecond`.
    However, yes, you are right that we need to choose which day-of-week convention to use. IMHO the simplest way is to use whatever Calcite/Avatica provides.
    At the same time, [here](https://issues.apache.org/jira/browse/CALCITE-2303?focusedCommentId=16482767&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16482767) Julian Hyde says that
    > In short, yes, do whatever PostgreSQL does.
    
    So they would like to align the behavior with PostgreSQL.
    
    Regarding day of week in the different databases:
    PostgreSQL's behavior is described [here](https://www.postgresql.org/docs/9.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT): Sunday = 0, and it supports ISO 8601 via `isodow` and `isoyear`.
    Oracle's behavior is described [here](https://docs.oracle.com/cd/E37483_01/server.751/es_eql/src/ceql_functions_date_extract.html): Sunday = 1; so far I have not found information about ISO 8601 support via `extract`, though it is available via `to_char`/`to_date`.
    Microsoft SQL Server [allows setting the first weekday](https://docs.microsoft.com/en-us/sql/t-sql/statements/set-datefirst-transact-sql?view=sql-server-2017), and extraction is done via `datepart`, not `extract`.
    MySQL [provides weekday](https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_extract) with Sunday = 1.
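    The three numbering conventions above can be sketched with plain Python `datetime`; this is only an illustration of the conventions, not Flink or Calcite code, and the date is an arbitrary Sunday:

```python
from datetime import date

d = date(2018, 5, 13)  # an arbitrary Sunday, used only for illustration

iso_dow = d.isoweekday()  # ISO 8601 / PostgreSQL `isodow`: Monday = 1 .. Sunday = 7
pg_dow = iso_dow % 7      # PostgreSQL `dow`: Sunday = 0 .. Saturday = 6
ora_dow = pg_dow + 1      # Oracle/MySQL style: Sunday = 1 .. Saturday = 7

print(iso_dow, pg_dow, ora_dow)  # → 7 0 1
```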



---

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTR...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/6007


---

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTR...

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin closed the pull request at:

    https://github.com/apache/flink/pull/6007


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    Sorry for the delay, @snuyanzin. I was on vacation. I will have a look at your update.


---

[GitHub] flink pull request #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTR...

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6007#discussion_r190226613
  
    --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/expressions/ScalarFunctionsTest.scala ---
    @@ -1473,6 +1473,14 @@ class ScalarFunctionsTest extends ScalarTypesTestBase {
           "EXTRACT(DOY FROM f16)",
           "315")
     
    +    testSqlApi(
    +      "EXTRACT(DOW FROM f18)",
    +      "1")
    +
    +    testSqlApi(
    +      "EXTRACT(DOW FROM f16)",
    --- End diff --
    
    Can you add a test (maybe in `ScalarFunctionsValidationTest`?) for checking the behavior on a `TIME` type? 
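    
    To illustrate why such a validation test is useful: a `TIME` value carries no date component, so day-of-week is not defined for it. In plain Python (only an analogy, not Flink's actual error path) the equivalent lookup fails:

```python
from datetime import time

t = time(9, 47, 9)  # a pure TIME value: no date component

try:
    t.weekday()  # time objects intentionally expose no day-of-week accessor
    has_dow = True
except AttributeError:
    has_dow = False

print(has_dow)  # → False
```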


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    Thank you for the effort @snuyanzin. I was not aware of the amount of work you put into this issue before opening the PR. I'm fine with sticking to Calcite's behavior. Can you add some documentation to `sqlApi.md` such that users know about the semantics?


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    yes, done


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    @twalthr thank you for your comment, but could you please clarify this:
    
    > Can you add some documentation to sqlApi.md such that users know about the semantics?
    
    1. Am I right that you mean `docs/dev/table/sql.md`?
    2. Currently I see the general explanation of EXTRACT, and the reserved words section where DOW is already specified.
    From this point of view I do not see what could be updated right now. At the same time, I have a proposal to go the same way as Calcite does. [Here](https://calcite.apache.org/docs/reference.html) is a link to their functions, including the date/time ones. Alongside EXTRACT they also have synonyms, e.g.
    >MONTH(date) | Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12.
    >WEEK(date) | Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53.
    >DAYOFYEAR(date) | Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366.
    
    etc. So I suggest introducing the same synonyms in Flink (simply by using the ones already existing in Calcite) and organizing the documentation for them in a similar way.
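    The synonyms quoted above are easy to check against plain date arithmetic; sketched in Python for illustration only (the date is arbitrary, and Calcite's `WEEK` numbering may differ from the ISO week computed here):

```python
from datetime import date

d = date(2018, 5, 14)  # arbitrary example date

month = d.month              # MONTH(date)     == EXTRACT(MONTH FROM date)
doy = d.timetuple().tm_yday  # DAYOFYEAR(date) == EXTRACT(DOY FROM date)
week = d.isocalendar()[1]    # ISO week number; Calcite's WEEK(date) may use a different rule

print(month, doy, week)  # → 5 134 20
```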
    
    
    
    
    



---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by snuyanzin <gi...@git.apache.org>.
Github user snuyanzin commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    > Just because DOW is in the reserved keywords section doesn't mean that we support it.
    
    Definitely agree; however, my point was that the same situation holds for DOY, CENTURY and others.
    Now I have added information about the date/time function synonyms that currently work in Flink. I was surprised, but they work without any extra effort, so I just added more tests for them plus documentation.
    @twalthr could you please have a look?


---

[GitHub] flink issue #6007: [FLINK-8518][Table API & SQL] Support DOW for EXTRACT

Posted by twalthr <gi...@git.apache.org>.
Github user twalthr commented on the issue:

    https://github.com/apache/flink/pull/6007
  
    Thanks for the PR @snuyanzin. I created the issue because it seemed that this is more than a one-line change. I tested your queries in PostgreSQL and it returns 0 instead of 1. We should have the same semantics as popular database vendors. How does Oracle define this feature?


---