Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/11/27 16:43:55 UTC

[GitHub] [drill] jnturton commented on a diff in pull request #2710: DRILL-8360: Add Provided Schema for XML Reader

jnturton commented on code in PR #2710:
URL: https://github.com/apache/drill/pull/2710#discussion_r1032969131


##########
contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java:
##########
@@ -428,8 +435,67 @@ private void writeFieldData(String fieldName, String fieldValue, TupleWriter wri
       index = writer.addColumn(colSchema);
     }
     ScalarWriter colWriter = writer.scalar(index);
+    ColumnMetadata columnMetadata = writer.tupleSchema().metadata(index);
+    MinorType dataType = columnMetadata.schema().getType().getMinorType();
+    String dateFormat;
+
+    // Write the values depending on their data type.  This only applies to scalar fields.
     if (fieldValue != null && (currentState != xmlState.ROW_ENDED && currentState != xmlState.FIELD_ENDED)) {
-      colWriter.setString(fieldValue);
+      switch (dataType) {
+        case BIT:
+          colWriter.setBoolean(Boolean.parseBoolean(fieldValue));
+          break;
+        case TINYINT:
+        case SMALLINT:
+        case INT:
+          colWriter.setInt(Integer.parseInt(fieldValue));
+          break;
+        case BIGINT:
+          colWriter.setLong(Long.parseLong(fieldValue));
+          break;
+        case FLOAT4:
+        case FLOAT8:
+          colWriter.setDouble(Double.parseDouble(fieldValue));
+          break;
+        case DATE:
+          dateFormat = columnMetadata.property("drill.format");
+          LocalDate localDate;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            localDate = LocalDate.parse(fieldValue);
+          } else {
+            localDate = LocalDate.parse(fieldValue, DateTimeFormatter.ofPattern(dateFormat));
+          }
+          colWriter.setDate(localDate);
+          break;
+        case TIME:
+          dateFormat = columnMetadata.property("drill.format");
+          LocalTime localTime;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            localTime = LocalTime.parse(fieldValue);
+          } else {
+            localTime = LocalTime.parse(fieldValue, DateTimeFormatter.ofPattern(dateFormat));
+          }
+          colWriter.setTime(localTime);
+          break;
+        case TIMESTAMP:
+          dateFormat = columnMetadata.property("drill.format");
+          Instant timestamp = null;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            timestamp = Instant.parse(fieldValue);
+          } else {
+            try {
+              SimpleDateFormat simpleDateFormat = new SimpleDateFormat(dateFormat);
+              Date parsedDate = simpleDateFormat.parse(fieldValue);
+              timestamp = Instant.ofEpochMilli(parsedDate.getTime());
+            } catch (ParseException e) {

Review Comment:
   I think we should operate in one of two definite modes controlled by an option that we try to standardise across plugins.
   
   1. Invalid data fails hard and fast. Your results will not be silently distorted.
   2. Invalid data is swallowed and null is emitted. You know what you're doing and you'll provide appropriate logic for the nulls later.
   
   Users who want a default substituted can simply combine `ifnull(x, -1)` with (2). My opinion is that unless a `nullInvalidData: true` option has been set, we should continue to fail the query on the spot, as we do elsewhere.
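   
   To make the two modes concrete, here is a minimal standalone sketch using only JDK types. The option name `nullInvalidData` and the helper class are illustrative only, not part of this PR; in `XMLReader` the null branch would correspond to `colWriter.setNull()` and the failure branch to a `UserException` carrying the column name and offending value.
   
   ```java
   import java.text.ParseException;
   import java.text.SimpleDateFormat;
   import java.time.Instant;
   import java.time.format.DateTimeParseException;
   import java.util.Date;
   
   public class InvalidDataModes {
   
     /**
      * Parses a TIMESTAMP field with an optional SimpleDateFormat pattern.
      * Mode 1 (nullInvalidData == false): invalid data fails hard and fast.
      * Mode 2 (nullInvalidData == true): invalid data is swallowed and null is returned.
      */
     static Instant parseTimestamp(String fieldValue, String dateFormat, boolean nullInvalidData) {
       try {
         if (dateFormat == null || dateFormat.isEmpty()) {
           // No drill.format property: expect ISO-8601.
           return Instant.parse(fieldValue);
         }
         Date parsed = new SimpleDateFormat(dateFormat).parse(fieldValue);
         return Instant.ofEpochMilli(parsed.getTime());
       } catch (ParseException | DateTimeParseException e) {
         if (nullInvalidData) {
           // Mode 2: the caller emits NULL (colWriter.setNull() in the reader).
           return null;
         }
         // Mode 1: fail the query on the spot, as we do elsewhere; in the reader
         // this would be a UserException rather than an IllegalStateException.
         throw new IllegalStateException("Invalid TIMESTAMP value: " + fieldValue, e);
       }
     }
   
     public static void main(String[] args) {
       // Mode 2: a bad value becomes null, to be cleaned up later with e.g. ifnull(x, -1).
       System.out.println(parseTimestamp("not-a-date", "yyyy-MM-dd", true));
       // A good value parses the same way in either mode (instant at local midnight).
       System.out.println(parseTimestamp("2022-11-27", "yyyy-MM-dd", false));
     }
   }
   ```
   
   Whichever mode is chosen, the conversion logic itself stays the same; only the catch block decides between emitting NULL and aborting the query.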



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org