You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/05/18 18:21:35 UTC

[GitHub] [nifi] markap14 opened a new pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

markap14 opened a new pull request #4282:
URL: https://github.com/apache/nifi/pull/4282


   Thank you for submitting a contribution to Apache NiFi.
   
   Please provide a short description of the PR here:
   
   #### Description of PR
   
   _Enables X functionality; fixes bug NIFI-YYYY._
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [ ] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `master`)?
   
   - [ ] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on both JDK 8 and JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] markap14 commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

markap14 commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r428138798



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       @pcgrenier I do see your argument. But I think I disagree. Let's say that we have the following CSV then:
   ```
   name, other
   markap14, 48
   pcgrenier, 19
   ```
   And the data's schema indicates that "other" is a CHOICE between STRING or INT. Fundamentally, it comes down to one question: in this case, should we allow the user to run a query like `SELECT SUM(other) as total FROM FLOWFILE`.
   
   Your argument is yes, because I happen to know that for this particular FlowFile it will succeed.
   
   My argument, though, is that we should not allow it. Yes, it would succeed for this FlowFile, but what will happen is that it will fail for other FlowFiles. Other FlowFiles that have the same schema and other FlowFiles for which the data is exactly correct according to the schema. This makes things more confusing because some FlowFiles succeed and others fail. All the while, the schema clearly indicates that the data may be a STRING, and you shouldn't really be able to sum together STRING data. If you do sort out all the number values, it makes sense to use a schema that reflects that. Otherwise, if we allow summing together a CHOICE[int, string] we're not really honoring the schema, in my point of view.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] mattyb149 commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

mattyb149 commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631730497


   That's a good point, if an inferred schema is being handed around but the flow file changes (partitioned, filtered, e.g.) such that the desired fields are all numeric, then even the inference stuff can pick up the more specific schema (without having to necessarily and manually update it)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] markap14 edited a comment on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

markap14 edited a comment on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631724090


   Yeah, I think those should work fine. For something like "concat" it should work with both numbers and strings - and it does. In that case, what gets returned to calcite for the table's schema is an Object for that column, and concat will work against any type of object.
   
   > I also think taking this choice out of the hands of the flow developer is bad.
   
   I don't believe we are taking the choice out of the hands of the flow developer. We're simply saying that if you want to do some sort of numeric-only aggregate function, your schema must indicate that the field is a number. I think this is fair game. The user can, in this case, simply use a schema that indicates that field is numeric.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] markap14 commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

markap14 commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r427522041



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       I don't think we want to combine String types here. The idea here is to have a "parent" type that can be used for all numbers. Unfortunately, if the class is `Number.class` calcite does not recognize this, so it translates it into a SQL "OTHER" type. That prevents us still from doing things like SUM() on that column. By using a String type here, we can use SUM and other functions because all numbers can be converted to Strings and those particular Strings can also be converted to numbers under the hood.
   
   However, if we allowed STRINGs to be included, it means we could have a schema whose field is a CHOICE between STRING and INT. It doesn't really make sense to perform aggregate functions such as SUM() across those values.
   
   Or, said another way, String is the internal Java representation that we need to use in order for Calcite to allow both INT and FLOAT to be okay in the same column. But if our schema says that a value should be a CHOICE of STRING or INT, we shouldn't allow things like SUM() over that because you can't sum together STRINGs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] markap14 commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

markap14 commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r427525012



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       As an example, consider a csv like:
   ```
   name, other
   markap14, 48
   pcgrenier, computer
   ```
   In this case, the schema has a field with name 'name' and a type String. But 'other' field is a CHOICE[INT, STRING]. So what should best represent that field in terms of java objects? I'd say `Object.class`. The `String.class` being used here is honestly a bit of a hack because Calcite doesn't give us a better way to represent Number - or, at least, not to my knowledge :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] markap14 commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

markap14 commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631724090


   Yeah, I think those should work fine. For something like "concat" it should work with both numbers and strings - and it does. In that case, what gets returned to calcite for the table's schema is an Object for that column, and concat will work against any type of object.
   
   > I also think taking this choice out of the hands of the flow developer is bad.
   I don't believe we are taking the choice out of the hands of the flow developer. We're simply saying that if you want to do some sort of numeric-only aggregate function, your schema must indicate that the field is a number. I think this is fair game. The user can, in this case, simply use a schema that indicates that field is numeric.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] mattyb149 commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

mattyb149 commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-632232135


   +1 LGTM, thanks for the improvement! Merging to master


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] pcgrenier commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

pcgrenier commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r427542274



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       While I do agree, but given the example if you have split the records to sort out all the "number" values, the schema and the data could disagree. So logically you could know the data is all numbers but the schema disagrees. So if we are hacking to a string anyway, I don't think it would hurt to just allow it to pass on to let Calcite try and process it. This is just allowing a "best case" situation where Calcite isn't passed something it has no idea what to do with. So I do see a benefit in allowing the string to pass through, since if it is not a number it will still fail at query time but will at least attempt it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] pcgrenier commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

pcgrenier commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r427311324



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       Maybe allow String types too? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] pcgrenier commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

pcgrenier commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631588369


   I do agree, but there are other functions like concat, that do make more
   sense. I also think taking this choice out of the hands of the flow
   developer is bad. I think you are correct in that it is a bad idea to allow
   the sum of strings and ints but also not allowing it at all if the
   developer needs to do it is worse. Really I'm cool with it either way and
   happy to see the fix.
   
   On Wed, May 20, 2020, 12:16 PM markap14 <no...@github.com> wrote:
   
   > *@markap14* commented on this pull request.
   > ------------------------------
   >
   > In
   > nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
   > <https://github.com/apache/nifi/pull/4282#discussion_r428138798>:
   >
   > > +                    return getRelDataType(widestDataType, typeFactory);
   > +                }
   > +
   > +                // None of the data types is strictly the widest. Check if all data types are numeric.
   > +                // This would happen, for instance, if the data type is a choice between float and integer.
   > +                // If that is the case, we can use a String type for the table schema because all values will fit
   > +                // into a String. This will still allow for casting, etc. if the query requires it.
   > +                boolean allNumeric = true;
   > +                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
   > +                    if (!isNumeric(possibleType)) {
   > +                        allNumeric = false;
   > +                        break;
   > +                    }
   > +                }
   > +
   > +                if (allNumeric) {
   >
   > @pcgrenier <https://github.com/pcgrenier> I do see your argument. But I
   > think I disagree. Let's say that we have the following CSV then:
   >
   > name, other
   > markap14, 48
   > pcgrenier, 19
   >
   > And the data's schema indicates that "other" is a CHOICE between STRING or
   > INT. Fundamentally, it comes down to one question: in this case, should we
   > allow the user to run a query like SELECT SUM(other) as total FROM
   > FLOWFILE.
   >
   > Your argument is yes, because I happen to know that for this particular
   > FlowFile it will succeed.
   >
   > My argument, though, is that we should not allow it. Yes, it would succeed
   > for this FlowFile, but what will happen is that it will fail for other
   > FlowFiles. Other FlowFiles that have the same schema and other FlowFiles
   > for which the data is exactly correct according to the schema. This makes
   > things more confusing because some FlowFiles succeed and others fail. All
   > the while, the schema clearly indicates that the data may be a STRING, and
   > you shouldn't really be able to sum together STRING data. If you do sort
   > out all the number values, it makes sense to use a schema that reflects
   > that. Otherwise, if we allow summing together a CHOICE[int, string] we're
   > not really honoring the schema, in my point of view.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/nifi/pull/4282#discussion_r428138798>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AA7ZC6TSH6KKCGHPIA4BY63RSP65RANCNFSM4NEKANRA>
   > .
   >
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] mattyb149 commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

mattyb149 commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631695622


   I tried `LISTAGG` and the concat operator `||` with the current PR and they work on choice (int,string) fields. @pcgrenier do you have an example of a query that doesn't work with the current PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] mattyb149 closed pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

mattyb149 closed pull request #4282:
URL: https://github.com/apache/nifi/pull/4282


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [nifi] pcgrenier commented on pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

Posted by GitBox <gi...@apache.org>.

pcgrenier commented on pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#issuecomment-631815567


   > Yeah, I think those should work fine. For something like "concat" it should work with both numbers and strings - and it does. In that case, what gets returned to calcite for the table's schema is an Object for that column, and concat will work against any type of object.
   > 
   > > I also think taking this choice out of the hands of the flow developer is bad.
   > 
   > I don't believe we are taking the choice out of the hands of the flow developer. We're simply saying that if you want to do some sort of numeric-only aggregate function, your schema must indicate that the field is a number. I think this is fair game. The user can, in this case, simply use a schema that indicates that field is numeric.
   
   My bad, I did test a few other string based aggregations like the LISTAGG functions and they all seem to work as intended. I just assumed if the number based functions failed on Choice returning an Object type so would the String based ones. So yeah if calcite, which it looks to be doing, is properly calling the toString on objects then yeah. This is awesome thanks for following up on this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org