Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/08/07 20:50:31 UTC

[GitHub] [nifi] adenes opened a new pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

adenes opened a new pull request #4463:
URL: https://github.com/apache/nifi/pull/4463


   Thank you for submitting a contribution to Apache NiFi.
   
   Please provide a short description of the PR here:
   
   #### Description of PR
   
   Using QueryCassandra with the JSON output format strips the milliseconds from timestamp fields.
   This PR adds a new property that lets the user customize the format pattern for these fields.
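
The precision loss described above can be reproduced outside NiFi. The following is a minimal sketch, not the processor's code: the class and method names are hypothetical, and only the `yyyy-MM-dd HH:mm:ssZ` pattern and the UTC time zone are taken from the diff quoted later in this thread. It formats the same `java.util.Date` with and without a `.SSS` fraction.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; it only illustrates the SimpleDateFormat behavior.
public class TimestampPrecisionDemo {

    // Formats the given date in UTC using the supplied SimpleDateFormat pattern.
    static String formatUtc(String pattern, Date value) {
        SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.ROOT);
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        return dateFormat.format(value);
    }

    public static void main(String[] args) {
        // 123 milliseconds after the epoch, so the fraction is easy to spot.
        Date value = new Date(123L);

        // Pattern without a fraction: the milliseconds are dropped.
        System.out.println(formatUtc("yyyy-MM-dd HH:mm:ssZ", value));     // 1970-01-01 00:00:00+0000

        // Pattern with ".SSS": the milliseconds are preserved.
        System.out.println(formatUtc("yyyy-MM-dd HH:mm:ss.SSSZ", value)); // 1970-01-01 00:00:00.123+0000
    }
}
```

A user-configurable pattern lets the second form be chosen without changing the output for existing flows.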
   
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `main`)?
   
   - [x] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [x] Have you written or updated unit tests to verify your changes?
   - [x] Have you verified that the full build is successful on JDK 8?
   - [x] Have you verified that the full build is successful on JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] adenes commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
adenes commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r468642036



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,23 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()

Review comment:
       Thanks @mattyb149 for the review.
   I have updated the name and display name according to your suggestion.
   I agree that the time/date handling should be improved. Currently the CQL `timestamp` type works correctly, but to be honest I'm not sure about the `date` and `time` types. The aim of this PR was to add the ability to change the timestamp formatting in the JSON output, because the current formatting discards the milliseconds; I did not want to change anything else.
   I'll file a Jira for this.

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/test/java/org/apache/nifi/processors/cassandra/QueryCassandraTest.java
##########
@@ -368,6 +380,42 @@ public void testConvertToJSONStream() throws Exception {
         assertEquals(2, numberOfRows);
     }
 
+    @Test
+    public void testDefaultDateFormatInConvertToJSONStream() throws Exception {
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        DateFormat df = new SimpleDateFormat(QueryCassandra.DATE_FORMAT_PATTERN.getDefaultValue());
+        df.setTimeZone(TimeZone.getTimeZone("UTC"));
+
+        long numberOfRows = QueryCassandra.convertToJsonStream(Optional.of(testRunner.getProcessContext()), rs, baos,
+            StandardCharsets.UTF_8, 0, null);
+        assertEquals(1, numberOfRows);
+
+        Map<String, List<Map<String, String>>> map = new ObjectMapper().readValue(baos.toByteArray(), HashMap.class);
+        String date = map.get("results").get(0).get("date");
+        assertEquals(df.format(CassandraQueryTestUtil.TEST_DATE), date);
+    }
+
+    @Test
+    public void testCustomDateFormatInConvertToJSONStream() throws Exception {
+        MockProcessContext context = (MockProcessContext) testRunner.getProcessContext();
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        final String customDateFormat = "yyyy-MM-dd HH:mm:ss.SSSZ";

Review comment:
       The current implementation formats the date in the UTC time zone, which is hardcoded. I have updated the test, though, to use a PST date/time as input and validate that it is properly formatted in UTC.
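
A sketch of what such a test input exercises, under stated assumptions: a fixed `-0800` offset stands in for PST, and the class and method names are hypothetical rather than taken from the PR's test code. The input carries its own offset, and the output is always rendered in UTC.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of the PR.
public class UtcOutputDemo {

    static final String PATTERN = "yyyy-MM-dd HH:mm:ss.SSSZ";

    // Parses a timestamp that carries its own offset and re-formats it in UTC.
    static String reformatInUtc(String input) {
        try {
            // The offset in the input determines the instant that is parsed.
            Date instant = new SimpleDateFormat(PATTERN, Locale.ROOT).parse(input);

            // The output zone is fixed to UTC, mirroring the hardcoded behavior discussed here.
            SimpleDateFormat utcFormat = new SimpleDateFormat(PATTERN, Locale.ROOT);
            utcFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
            return utcFormat.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Input does not match " + PATTERN, e);
        }
    }

    public static void main(String[] args) {
        // Noon at UTC-8 is 20:00 in UTC on the same day.
        System.out.println(reformatInUtc("2020-01-15 12:00:00.123-0800")); // 2020-01-15 20:00:00.123+0000
    }
}
```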

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -467,19 +494,30 @@ public static long convertToJsonStream(final ResultSet rs, final OutputStream ou
     }
 
     protected static String getJsonElement(Object value) {
+        return getJsonElement(Optional.empty(), value);
+    }
+
+    protected static String getJsonElement(final Optional<ProcessContext> context, Object value) {
         if (value instanceof Number) {
             return value.toString();
         } else if (value instanceof Date) {
-            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
-            dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
-            return "\"" + dateFormat.format((Date) value) + "\"";
+            return "\"" + getFormattedDate(context, (Date) value) + "\"";
         } else if (value instanceof String) {
             return "\"" + StringEscapeUtils.escapeJson((String) value) + "\"";
         } else {
             return "\"" + value.toString() + "\"";
         }
     }
 
+    private static String getFormattedDate(final Optional<ProcessContext> context, Date value) {
+        final String dateFormatPattern = context
+                .map(_context -> _context.getProperty(DATE_FORMAT_PATTERN).getValue())
+                .orElse(DATE_FORMAT_PATTERN.getDefaultValue());
+        SimpleDateFormat dateFormat = new SimpleDateFormat(dateFormatPattern);
+        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

Review comment:
       The output time zone is hardcoded to UTC in both the current and the updated implementation.
   I've added this to the new property's description.
   I'd take care of migrating to the `java.time` classes in a separate Jira.
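
For reference, a `java.time` equivalent of the UTC formatting could look roughly like the sketch below. This is an assumption about one possible shape of that follow-up work, not code from this PR; the class and method names are hypothetical. Note that `DateTimeFormatter` and `SimpleDateFormat` pattern letters overlap but are not identical, so existing user-supplied patterns would need to be checked during a migration.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical sketch of a java.time based formatter; not part of the PR.
public class JavaTimeFormatSketch {

    // Formats a java.util.Date in UTC using a DateTimeFormatter pattern.
    static String formatUtc(String pattern, Date value) {
        DateTimeFormatter formatter = DateTimeFormatter
                .ofPattern(pattern, Locale.ROOT)
                .withZone(ZoneOffset.UTC); // the zone is required to format an Instant
        return formatter.format(value.toInstant());
    }

    public static void main(String[] args) {
        System.out.println(formatUtc("yyyy-MM-dd HH:mm:ss.SSSZ", new Date(123L))); // 1970-01-01 00:00:00.123+0000
    }
}
```

Unlike `SimpleDateFormat`, a `DateTimeFormatter` instance is immutable and thread-safe, so it could be created once per configured pattern instead of once per value.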







[GitHub] [nifi] turcsanyip commented on pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#issuecomment-681994766


   Merging to main...





[GitHub] [nifi] mattyb149 commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
mattyb149 commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r468208253



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/test/java/org/apache/nifi/processors/cassandra/QueryCassandraTest.java
##########
@@ -368,6 +380,42 @@ public void testConvertToJSONStream() throws Exception {
         assertEquals(2, numberOfRows);
     }
 
+    @Test
+    public void testDefaultDateFormatInConvertToJSONStream() throws Exception {
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        DateFormat df = new SimpleDateFormat(QueryCassandra.DATE_FORMAT_PATTERN.getDefaultValue());
+        df.setTimeZone(TimeZone.getTimeZone("UTC"));
+
+        long numberOfRows = QueryCassandra.convertToJsonStream(Optional.of(testRunner.getProcessContext()), rs, baos,
+            StandardCharsets.UTF_8, 0, null);
+        assertEquals(1, numberOfRows);
+
+        Map<String, List<Map<String, String>>> map = new ObjectMapper().readValue(baos.toByteArray(), HashMap.class);
+        String date = map.get("results").get(0).get("date");
+        assertEquals(df.format(CassandraQueryTestUtil.TEST_DATE), date);
+    }
+
+    @Test
+    public void testCustomDateFormatInConvertToJSONStream() throws Exception {
+        MockProcessContext context = (MockProcessContext) testRunner.getProcessContext();
+        ResultSet rs = CassandraQueryTestUtil.createMockDateResultSet();
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+
+        final String customDateFormat = "yyyy-MM-dd HH:mm:ss.SSSZ";

Review comment:
       I might be reading this incorrectly, but shouldn't we try a non-default value here, such as a timezone -1 hour from UTC?

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,23 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()

Review comment:
       TL;DR I think this should be called "Timestamp Format Pattern for JSON output":
   
   I think (for now) we have to be pretty specific about the fields this property works upon. There is a separate bug (that should be written up as a Jira) where the CQL `DATE` type is not fully supported: their `DATE` type returns a Cassandra-specific `LocalDate` class, which cannot currently be translated to an Avro schema, and in the current code (PR included) it falls through the JSON `instanceof Date` clause and is emitted as `value.toString()`. A `java.util.Date` value therefore corresponds to a Cassandra `TIMESTAMP` column, which is reflected in the unit tests. It's odd to me that they return a `java.util.Date` when (from [other sources](http://itdoc.hitachi.co.jp/manuals/3020/30203V0300e/BV030040.HTM), a vendor, not the community) it appears they have a full-fledged `java.sql.Timestamp` object under the hood, yet we can only access a Date object so we can't get things like nanoseconds.
   
   We should revisit the `DATE` and `TIME` datatypes under a separate Jira but since this only seems to apply to the `TIMESTAMP` type, I'm thinking we should name it as such. I believe the JsonRecordSetWriter does something similar (i.e. has properties for date, time, timestamp formats)
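
The fall-through described above can be illustrated with a simplified sketch. It uses `java.time.LocalDate` purely as a stand-in for the driver's own `LocalDate` class (an assumption made so the example runs without the Cassandra driver), omits the `String` branch, and uses hypothetical class and method names. Only the branch structure mirrors `getJsonElement` in the diff.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; java.time.LocalDate stands in for the driver's LocalDate.
public class JsonElementFallThroughSketch {

    // Simplified copy of the branch structure in getJsonElement.
    static String jsonElement(String pattern, Object value) {
        if (value instanceof Number) {
            return value.toString();
        } else if (value instanceof Date) {
            // Only java.util.Date values are affected by the configured pattern.
            SimpleDateFormat dateFormat = new SimpleDateFormat(pattern, Locale.ROOT);
            dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
            return "\"" + dateFormat.format((Date) value) + "\"";
        } else {
            // Any other type, including a LocalDate, is emitted via toString().
            return "\"" + value.toString() + "\"";
        }
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd HH:mm:ss.SSSZ";
        System.out.println(jsonElement(pattern, new Date(123L)));          // "1970-01-01 00:00:00.123+0000"
        System.out.println(jsonElement(pattern, LocalDate.of(2020, 8, 7))); // "2020-08-07"
    }
}
```

The second call ignores the pattern entirely, which is why naming the property after timestamps rather than dates is the more accurate choice.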

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -467,19 +494,30 @@ public static long convertToJsonStream(final ResultSet rs, final OutputStream ou
     }
 
     protected static String getJsonElement(Object value) {
+        return getJsonElement(Optional.empty(), value);
+    }
+
+    protected static String getJsonElement(final Optional<ProcessContext> context, Object value) {
         if (value instanceof Number) {
             return value.toString();
         } else if (value instanceof Date) {
-            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
-            dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
-            return "\"" + dateFormat.format((Date) value) + "\"";
+            return "\"" + getFormattedDate(context, (Date) value) + "\"";
         } else if (value instanceof String) {
             return "\"" + StringEscapeUtils.escapeJson((String) value) + "\"";
         } else {
             return "\"" + value.toString() + "\"";
         }
     }
 
+    private static String getFormattedDate(final Optional<ProcessContext> context, Date value) {
+        final String dateFormatPattern = context
+                .map(_context -> _context.getProperty(DATE_FORMAT_PATTERN).getValue())
+                .orElse(DATE_FORMAT_PATTERN.getDefaultValue());
+        SimpleDateFormat dateFormat = new SimpleDateFormat(dateFormatPattern);
+        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

Review comment:
       Can/do we need to get the timezone from the specified format? In any case let's make sure the doc is clear on what is output. Also should we migrate from SimpleDateFormat to the newer `java.time` classes? It can definitely be a pain but I wonder if it is more accommodating in the long run.







[GitHub] [nifi] simonbence commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r475533064



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,24 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()
+            .name("timestamp-format-pattern")

Review comment:
       Minor: I think there is only a small chance of having a different "timestamp format pattern" property other than the one for JSON, but since property names should not change later, I would suggest keeping the name consistent with the display name and including "json" in it.







[GitHub] [nifi] asfgit closed pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4463:
URL: https://github.com/apache/nifi/pull/4463


   





[GitHub] [nifi] mattyb149 commented on pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
mattyb149 commented on pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#issuecomment-681988642


   +1 LGTM





[GitHub] [nifi] adenes commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
adenes commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r477255009



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -379,7 +406,7 @@ public static long convertToAvroStream(final ResultSet rs, final OutputStream ou
      * @throws TimeoutException     If a result set fetch has taken longer than the specified timeout
      * @throws ExecutionException   If any error occurs during the result set fetch
      */
-    public static long convertToJsonStream(final ResultSet rs, final OutputStream outStream,
+    public static long convertToJsonStream(final Optional<ProcessContext> context, final ResultSet rs, final OutputStream outStream,

Review comment:
       Thanks @turcsanyip, I've updated the code according to your comment.







[GitHub] [nifi] tpalfy commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
tpalfy commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r467916778



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,23 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()
+            .name("Date Format Pattern for JSON output")

Review comment:
       ```suggestion
               .name("date-format-pattern")
               .displayName("Date Format Pattern for JSON output")
   ```

##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -467,19 +493,33 @@ public static long convertToJsonStream(final ResultSet rs, final OutputStream ou
     }
 
     protected static String getJsonElement(Object value) {
+        return getJsonElement(Optional.empty(), value);
+    }
+
+    protected static String getJsonElement(final Optional<ProcessContext> context, Object value) {
         if (value instanceof Number) {
             return value.toString();
         } else if (value instanceof Date) {
-            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ");
-            dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
-            return "\"" + dateFormat.format((Date) value) + "\"";
+            return "\"" + getFormattedDate(context, (Date) value) + "\"";
         } else if (value instanceof String) {
             return "\"" + StringEscapeUtils.escapeJson((String) value) + "\"";
         } else {
             return "\"" + value.toString() + "\"";
         }
     }
 
+    private static String getFormattedDate(final Optional<ProcessContext> context, Date value) {
+        final String dateFormatPattern;
+        if (context.isPresent()) {
+            dateFormatPattern = context.get().getProperty(DATE_FORMAT_PATTERN).getValue();
+        } else {
+            dateFormatPattern = DATE_FORMAT_PATTERN.getDefaultValue();
+        }

Review comment:
       ```suggestion
           final String dateFormatPattern = context
               .map(_context -> _context.getProperty(DATE_FORMAT_PATTERN).getValue())
               .orElse(DATE_FORMAT_PATTERN.getDefaultValue());
   ```
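
The two forms can be compared in isolation with the sketch below. It assumes a plain `Map<String, String>` as a stand-in for `ProcessContext`, and the class name, key, and default value are hypothetical placeholders rather than values from the PR.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; a Map stands in for ProcessContext.
public class OptionalPatternLookupSketch {

    static final String KEY = "pattern-key";             // placeholder property name
    static final String DEFAULT_PATTERN = "default-pattern"; // placeholder default value

    // Explicit isPresent()/get() form, as in the original diff.
    static String withIfElse(Optional<Map<String, String>> context) {
        final String pattern;
        if (context.isPresent()) {
            pattern = context.get().get(KEY);
        } else {
            pattern = DEFAULT_PATTERN;
        }
        return pattern;
    }

    // map()/orElse() form, as in the suggestion.
    static String withMapOrElse(Optional<Map<String, String>> context) {
        return context
                .map(ctx -> ctx.get(KEY))
                .orElse(DEFAULT_PATTERN);
    }

    public static void main(String[] args) {
        Optional<Map<String, String>> configured =
                Optional.of(Collections.singletonMap(KEY, "yyyy-MM-dd"));
        System.out.println(withIfElse(configured));          // yyyy-MM-dd
        System.out.println(withMapOrElse(configured));       // yyyy-MM-dd
        System.out.println(withMapOrElse(Optional.empty())); // default-pattern
    }
}
```

One behavioral difference is worth keeping in mind: if the lookup returns `null` for a present context, `Optional.map` yields an empty `Optional` and the `orElse` default is used, whereas the `if`/`else` form returns `null`.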







[GitHub] [nifi] adenes commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
adenes commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r476356114



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -130,6 +132,24 @@
             .defaultValue(AVRO_FORMAT)
             .build();
 
+    public static final PropertyDescriptor DATE_FORMAT_PATTERN = new PropertyDescriptor.Builder()
+            .name("timestamp-format-pattern")

Review comment:
       Thanks @simonbence, I have updated the PR according to your suggestion.







[GitHub] [nifi] turcsanyip commented on a change in pull request #4463: NIFI-7714: QueryCassandra loses precision when converting timestamps to JSON

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #4463:
URL: https://github.com/apache/nifi/pull/4463#discussion_r477208816



##########
File path: nifi-nar-bundles/nifi-cassandra-bundle/nifi-cassandra-processors/src/main/java/org/apache/nifi/processors/cassandra/QueryCassandra.java
##########
@@ -379,7 +406,7 @@ public static long convertToAvroStream(final ResultSet rs, final OutputStream ou
      * @throws TimeoutException     If a result set fetch has taken longer than the specified timeout
      * @throws ExecutionException   If any error occurs during the result set fetch
      */
-    public static long convertToJsonStream(final ResultSet rs, final OutputStream outStream,
+    public static long convertToJsonStream(final Optional<ProcessContext> context, final ResultSet rs, final OutputStream outStream,

Review comment:
       In my opinion we should not expose this method as `public`. It is unnecessary in general and I believe the signature will change when the SimpleDateFormat code is migrated to `java.time`.
   The javadoc should be moved to the original method (otherwise the new parameter should be added here).



