You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/08/10 21:59:39 UTC

[GitHub] [nifi] mattyb149 commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

mattyb149 commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r468208253



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/test/java/org/apache/nifi/processors/cassandra/QueryCassandraTest.java
##########
@@ -368,6 +380,42 @@ public void testConvertToJSONStream() throws Exception {
         assertEquals(2, numberOfRows);
     }
 
+    @Test
+    public void testDefaultDateFormatInConvertToJSONStream() throws Exception {
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        DateFormat df = new SimpleDateFormat(QueryCassandra.DATE_FORMAT_PATTERN.getDefaultValue());
+        df.setTimeZone(TimeZone.getTimeZone("UTC"));
+
+        long numberOfRows = QueryCassandra.convertToJsonStream(Optional.of(testRunner.getProcessContext()), rs, baos,
+            StandardCharsets.UTF_8, 0, null);
+        assertEquals(1, numberOfRows);
+
+        Map<String, List<Map<String, String>>> map = new ObjectMapper().readValue(baos.toByteArray(), HashMap.class);
+        String date = map.get("results").get(0).get("date");
+        assertEquals(df.format(CassandraQueryTestUtil.TEST_DATE), date);
+    }
+
+    @Test
+    public void testCustomDateFormatInConvertToJSONStream() throws Exception {
+        MockProcessContext context = (MockProcessContext) testRunner.getProcessContext();
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        final String customDateFormat = "yyyy-MM-dd HH:mm:ss.SSSZ";

Review comment:
       I might be reading this incorrectly, but shouldn't we try a non-default value here, such as a timezone -1 hour from UTC?

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,23 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()

Review comment:
       TL;DR I think this should be called "Timestamp Format Pattern for JSON output":
   
   I think (for now) we have to be pretty specific about the fields this property works upon. For example there is a separate bug (that should be written up as a Jira) where the CQL `DATE` type is not fully supported. For example, their `DATE` type returns a Cassandra-specific `LocalDate` class, which cannot currently be translated to an Avro schema, and in the current code (PR included) it drops through the JSON `instanceof Date` clause and is issued as `value.toString()`. So being a `java.util.Date` type lends itself to a Cassandra `TIMESTAMP` type, which is reflected in the unit tests. It's odd to me that they return a `java.util.Date` when (from [other sources](http://itdoc.hitachi.co.jp/manuals/3020/30203V0300e/BV030040.HTM), a vendor not the community) it appears they have a full-fledged `java.sql.Timestamp` object under the hood, yet we can only access a Date object so we can't get things like nanoseconds.
   
   We should revisit the `DATE` and `TIME` datatypes under a separate Jira but since this only seems to apply to the `TIMESTAMP` type, I'm thinking we should name it as such. I believe the JsonRecordSetWriter does something similar (i.e. has properties for date, time, timestamp formats)

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -467,19 +494,30 @@ public static long convertToJsonStream(final ResultSet rs, final OutputStream ou
     }
 
     protected static String getJsonElement(Object value) {
+        return getJsonElement(Optional.empty(), value);
+    }
+
+    protected static String getJsonElement(final Optional<ProcessContext> context, Object value) {
         if (value instanceof Number) {
             return value.toString();
         } else if (value instanceof Date) {
-            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
-            dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
-            return "\"" + dateFormat.format((Date) value) + "\"";
+            return "\"" + getFormattedDate(context, (Date) value) + "\"";
         } else if (value instanceof String) {
             return "\"" + StringEscapeUtils.escapeJson((String) value) + "\"";
         } else {
             return "\"" + value.toString() + "\"";
         }
     }
 
+    private static String getFormattedDate(final Optional<ProcessContext> context, Date value) {
+        final String dateFormatPattern = context
+                .map(_context -> _context.getProperty(DATE_FORMAT_PATTERN).getValue())
+                .orElse(DATE_FORMAT_PATTERN.getDefaultValue());
+        SimpleDateFormat dateFormat = new SimpleDateFormat(dateFormatPattern);
+        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

Review comment:
       Can/do we need to get the timezone from the specified format? In any case let's make sure the doc is clear on what is output. Also should we migrate from SimpleDateFormat to the newer `java.time` classes? It can definitely be a pain but I wonder if it is more accommodating in the long run.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org