Posted to issues@nifi.apache.org by ijokarumawak <gi...@git.apache.org> on 2018/01/23 07:41:16 UTC

[GitHub] nifi pull request #2424: NIFI-4393: Handle database specific identifier esca...

GitHub user ijokarumawak opened a pull request:

    https://github.com/apache/nifi/pull/2424

    NIFI-4393: Handle database specific identifier escape characters

    QueryDatabaseTable and GenerateTableFetch processors were not able to
    use max value state as expected if the max value column was wrapped with
    escape characters, due to a mismatch between the computed state keys
    and the actual keys used in the managed state. State keys computed by
    the getStateKey method included escape characters while the actual stored
    keys did not. This resulted in querying the same dataset again and again.
    
    This commit adds an unwrapIdentifier method to the DatabaseAdapter class to
    remove database specific escape characters from identifiers such as table
    and column names, so that max value state keys are populated correctly
    even if identifiers are wrapped with escape characters.
    
    This commit also adds a new DatabaseAdapter for MySQL, to handle MySQL
    specific identifier escaping with back-ticks.
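
The key mismatch described above can be illustrated with a minimal Java sketch. It is not the NiFi source: the name `unwrapIdentifier` comes from the PR description, but its body, the `StateKeySketch` class, and the `rawStateKey` helper are hypothetical, and the `!@!` separator is taken from the state key format quoted later in this thread. The sketch only handles a single pair of surrounding back-ticks.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and bodies here are assumptions, not NiFi source.
class StateKeySketch {

    // Separator taken from the {tableName}!@!{columnName} format quoted later in this thread.
    static final String SEPARATOR = "!@!";

    // Assumed MySQL-style behavior: drop one pair of surrounding back-ticks, if present.
    static String unwrapIdentifier(String identifier) {
        if (identifier.length() >= 2 && identifier.startsWith("`") && identifier.endsWith("`")) {
            return identifier.substring(1, identifier.length() - 1);
        }
        return identifier;
    }

    // Key built from the raw property values, escape characters included.
    static String rawStateKey(String table, String column) {
        return table + SEPARATOR + column;
    }

    // Key built after unwrapping, so it matches the key that was stored.
    static String stateKey(String table, String column) {
        return unwrapIdentifier(table) + SEPARATOR + unwrapIdentifier(column);
    }

    public static void main(String[] args) {
        Map<String, String> managedState = new HashMap<>();
        managedState.put("users!@!id", "42"); // stored without escape characters

        // The escaped key is not in the map, so no previous max value is found.
        System.out.println(managedState.get(rawStateKey("users", "`id`")));
        // The unwrapped key matches the stored entry.
        System.out.println(managedState.get(stateKey("users", "`id`")));
    }
}
```

When the lookup misses, a processor has no previous max value to filter on, which is consistent with the repeated fetching of the same dataset reported in the PR.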
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi nifi-4393

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2424
    
----
commit 93fbb976a246a75aaa6206fdafe68c6ba3ed571a
Author: Koji Kawamura <ij...@...>
Date:   2018-01-23T06:15:36Z

    NIFI-4393: Handle database specific identifier escape characters
    

----


---

[GitHub] nifi issue #2424: NIFI-4393: Handle database specific identifier escape char...

Posted by MikeThomsen <gi...@git.apache.org>.

Github user MikeThomsen commented on the issue:

    https://github.com/apache/nifi/pull/2424
  
    +1 LGTM merged.


---

[GitHub] nifi issue #2424: NIFI-4393: Handle database specific identifier escape char...

Posted by pvillard31 <gi...@git.apache.org>.

Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/2424
  
    Code LGTM @ijokarumawak. I'm a +1 on this PR but will wait a bit in case @mattyb149 or someone else wants to double check.


---

[GitHub] nifi issue #2424: NIFI-4393: Handle database specific identifier escape char...

Posted by mattyb149 <gi...@git.apache.org>.

Github user mattyb149 commented on the issue:

    https://github.com/apache/nifi/pull/2424
  
    Another approach, rather than "unwrapping" column names that are specified with dialect-specific characters in the properties, is to have them specified without the characters in the property and have the generated SQL "wrap" them with the characters. This is the way I've seen it done before, by adding "getStartQuote" and "getEndQuote" methods to the DatabaseAdapter. There could be one set for table/DB names and one for column names in case the dialect specifies them differently; I think at least one DB has different quotes for tables vs. columns, but I can't think of which one offhand. With the "wrap" approach, there are two downsides:
    
    1) Users will have to change their properties to remove the special characters. However, if this bug is saying that the special-character approach already doesn't work, then perhaps this is a non-issue.
    2) We won't support column names that have commas in them. Again, we don't currently support this unless it happens to work with the special characters.
    
    The upside is that the property values would be more "natural", just a comma-separated list of column names (possibly with spaces or other values in them), and we don't have to rely on a regex to "unwrap" them; rather, we would only "wrap" them internally when needed for the generated SQL.
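
A rough Java sketch of the "wrap" approach follows. The method names `getStartQuote` and `getEndQuote` come from the comment above; the `QuotingAdapter` interface, the `WrapSketch` class, and `wrapColumns` are hypothetical and are not part of NiFi's DatabaseAdapter. Back-ticks are used for MySQL as in the PR description, and square brackets (the SQL Server convention) are included only to show a dialect whose start and end quotes differ.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; QuotingAdapter is a hypothetical stand-in, not NiFi's DatabaseAdapter.
class WrapSketch {

    interface QuotingAdapter {
        String getStartQuote();
        String getEndQuote();
    }

    static final QuotingAdapter MYSQL = adapter("`", "`");
    static final QuotingAdapter MSSQL = adapter("[", "]");

    static QuotingAdapter adapter(final String start, final String end) {
        return new QuotingAdapter() {
            @Override public String getStartQuote() { return start; }
            @Override public String getEndQuote() { return end; }
        };
    }

    // Turns a plain comma-separated property value into a quoted SQL column list.
    // Quote characters embedded in a name are not escaped here.
    static String wrapColumns(QuotingAdapter adapter, String columnsProperty) {
        List<String> wrapped = new ArrayList<>();
        for (String column : columnsProperty.split(",")) {
            String name = column.trim();
            if (!name.isEmpty()) {
                wrapped.add(adapter.getStartQuote() + name + adapter.getEndQuote());
            }
        }
        return String.join(", ", wrapped);
    }

    public static void main(String[] args) {
        System.out.println(wrapColumns(MYSQL, "id, last name"));
        System.out.println(wrapColumns(MSSQL, "id, last name"));
    }
}
```

Because the property is split on commas before wrapping, a column name containing a comma cannot be expressed, which is the second downside listed above.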


---

[GitHub] nifi issue #2424: NIFI-4393: Handle database specific identifier escape char...

Posted by ijokarumawak <gi...@git.apache.org>.

Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/2424
  
    @mattyb149 Thanks for your suggestion. I considered that path, too, but I did it this way because it seemed the simpler way to address the issue. However, I think your comment makes a good point, and I would prefer having a wrap method in DatabaseAdapters instead of unwrap.
    
    Let's discuss a bit more. I noticed that table and column names are spread across a few places. I think we should define how we want those to be set.
    
    - Processor properties:
        - Columns to Return
        - Maximum Value Columns
        - Additional WHERE clause
        - Custom SQL (to be added by #2162)
        - For Additional WHERE clause and Custom SQL, it might be challenging for the Processor to wrap problematic identifiers automatically.
    - Processor State: Stored with a key in `{tableName}!@!{columnName}` format. Do we want to keep database specific characters here?
    - Output FlowFile Attributes:
        - max.{columnName} = {maxValue}
        - tableName = {tableName}
        - Do we want to keep database specific characters here?
    
    Even if we change the DatabaseAdapter method from unwrap to wrap, it's still possible to use the wrap characters to unwrap. But we need to reach a consensus on how the variables listed above are written.
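
The last point, using the wrap characters to unwrap, can be sketched in Java as below. Whether state keys and attributes should keep the database specific characters is still an open question in this thread; the sketch simply assumes they are stripped. The `NamingSketch` class and its methods are hypothetical, while the `!@!` key format and the `tableName` and `max.{columnName}` attribute names are the ones listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; assumes identifiers are stored without quote characters.
class NamingSketch {

    // Removes the start and end quote only when both are present around the identifier.
    static String unwrap(String identifier, String startQuote, String endQuote) {
        String s = identifier.trim();
        if (s.length() >= startQuote.length() + endQuote.length()
                && s.startsWith(startQuote) && s.endsWith(endQuote)) {
            return s.substring(startQuote.length(), s.length() - endQuote.length());
        }
        return s;
    }

    // State key in the {tableName}!@!{columnName} format, built from unwrapped names.
    static String stateKey(String table, String column, String startQuote, String endQuote) {
        return unwrap(table, startQuote, endQuote) + "!@!" + unwrap(column, startQuote, endQuote);
    }

    // FlowFile attributes named after the unwrapped identifiers.
    static Map<String, String> attributes(String table, String column, String maxValue,
                                          String startQuote, String endQuote) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("tableName", unwrap(table, startQuote, endQuote));
        attrs.put("max." + unwrap(column, startQuote, endQuote), maxValue);
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(stateKey("`users`", "`id`", "`", "`"));
        System.out.println(attributes("[users]", "[order date]", "2018-01-23", "[", "]"));
    }
}
```

With this choice the same state key and attribute names are produced whether or not the user wrapped the identifiers in the processor properties.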


---

[GitHub] nifi pull request #2424: NIFI-4393: Handle database specific identifier esca...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2424


---

[GitHub] nifi issue #2424: NIFI-4393: Handle database specific identifier escape char...

Posted by patricker <gi...@git.apache.org>.

Github user patricker commented on the issue:

    https://github.com/apache/nifi/pull/2424
  
    @ijokarumawak I think column names/table names should be kept unwrapped in all locations, and wrapped as needed by the Database Adapter. This will allow all columns to be stored uniformly no matter which adapter is used. Also, in cases like `initial.maxvalue` property use, the user won't need to worry about wrapping the column name.


---