You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/05/19 18:44:40 UTC

[GitHub] [nifi] markap14 commented on a change in pull request #4282: NIFI-7462: Update to allow FlowFile Table's schema to be more intelligent when using CHOICE types

markap14 commented on a change in pull request #4282:
URL: https://github.com/apache/nifi/pull/4282#discussion_r427522041



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/queryrecord/FlowFileTable.java
##########
@@ -223,12 +225,69 @@ private RelDataType getRelDataType(final DataType fieldType, final JavaTypeFacto
             case BIGINT:
                 return typeFactory.createJavaType(BigInteger.class);
             case CHOICE:
+                final ChoiceDataType choiceDataType = (ChoiceDataType) fieldType;
+                DataType widestDataType = choiceDataType.getPossibleSubTypes().get(0);
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (possibleType == widestDataType) {
+                        continue;
+                    }
+                    if (possibleType.getFieldType().isWiderThan(widestDataType.getFieldType())) {
+                        widestDataType = possibleType;
+                        continue;
+                    }
+                    if (widestDataType.getFieldType().isWiderThan(possibleType.getFieldType())) {
+                        continue;
+                    }
+
+                    // Neither is wider than the other.
+                    widestDataType = null;
+                    break;
+                }
+
+                // If one of the CHOICE data types is the widest, use it.
+                if (widestDataType != null) {
+                    return getRelDataType(widestDataType, typeFactory);
+                }
+
+                // None of the data types is strictly the widest. Check if all data types are numeric.
+                // This would happen, for instance, if the data type is a choice between float and integer.
+                // If that is the case, we can use a String type for the table schema because all values will fit
+                // into a String. This will still allow for casting, etc. if the query requires it.
+                boolean allNumeric = true;
+                for (final DataType possibleType : choiceDataType.getPossibleSubTypes()) {
+                    if (!isNumeric(possibleType)) {
+                        allNumeric = false;
+                        break;
+                    }
+                }
+
+                if (allNumeric) {

Review comment:
       I don't think we want to combine String types here. The idea here is to have a "parent" type that can be used for all numbers. Unfortunately, if the class is `Number.class` calcite does not recognize this, so it translates it into a SQL "OTHER" type. That prevents us still from doing things like SUM() on that column. By using a String type here, we can use SUM and other functions because all numbers can be converted to Strings and those particular Strings can also be converted to numbers under the hood.
   
   However, if we allowed STRINGs to be included, it means we could have a schema whose field is a CHOICE between STRING and INT. It doesn't really make sense to perform aggregate functions such as SUM() across those values.
   
   Or, said another way, String is the internal Java representation that we need to use in order for Calcite to allow both INT and FLOAT to be okay in the same column. But if our schema says that a value should be a CHOICE of STRING or INT, we shouldn't allow things like SUM() over that because you can't sum together STRINGs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org