You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "kgyrtkirk (via GitHub)" <gi...@apache.org> on 2023/10/13 05:58:49 UTC

[PR] Fix json inputs for drill windowing tests (druid)

kgyrtkirk opened a new pull request, #15148:
URL: https://github.com/apache/druid/pull/15148

   Apparently Drill had some issue with parquet files up to 1.10 ; they have added a feature to fix those corruptions.
   
   Unfortunately - we are working with parquet files which do had that issue...
   
   This PR:
   * adds a flag to `JsonToParquet` to do the fix during conversion 
   * updates the json files to more correct conents
   * some resultset mismatches were fixed by this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on code in PR #15148:
URL: https://github.com/apache/druid/pull/15148#discussion_r1361726925


##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -492,10 +510,12 @@ private static BigDecimal convertBinaryToDecimal(Binary value, int precision, in
   }
 
   private final boolean binaryAsString;
+  private final boolean convertCorruptDates;
 
-  public ParquetGroupConverter(boolean binaryAsString)
+  public ParquetGroupConverter(boolean binaryAsString, boolean convertCorruptDates)
   {
     this.binaryAsString = binaryAsString;

Review Comment:
   yes; it should be used around [here](https://github.com/apache/druid/blob/c3bab1685db50d03809e33d249c47aad594ef043/extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java#L436)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on PR #15148:
URL: https://github.com/apache/druid/pull/15148#issuecomment-1765044552

   @imply-cheddar , @rohangarg could you please take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on code in PR #15148:
URL: https://github.com/apache/druid/pull/15148#discussion_r1361750791


##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -52,10 +52,14 @@ public class ParquetGroupConverter
   private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);
 
   /**
-   * See {@link ParquetGroupConverter#convertField(Group, String)}
+   * Convert a parquet group field as though it were a map. Logical types of 'list' and 'map' will be transformed
+   * into java lists and maps respectively ({@link ParquetGroupConverter#convertLogicalList} and
+   * {@link ParquetGroupConverter#convertLogicalMap}), repeated fields will also be translated to lists, and
+   * primitive types will be extracted into an ingestion friendly state (e.g. 'int' and 'long'). Finally,
+   * if a field is not present, this method will return null.
    */
   @Nullable
-  private static Object convertField(Group g, String fieldName, boolean binaryAsString)

Review Comment:
   `binaryAsString` was passed around in `private static` methods in a class on which an `instance method` was called first....so I've choosen to remove them (and use the implict class access - to get it where needed) ; instead of adding another boolean to every static method
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "imply-cheddar (via GitHub)" <gi...@apache.org>.
imply-cheddar commented on code in PR #15148:
URL: https://github.com/apache/druid/pull/15148#discussion_r1361687034


##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -52,10 +52,14 @@ public class ParquetGroupConverter
   private static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);
 
   /**
-   * See {@link ParquetGroupConverter#convertField(Group, String)}
+   * Convert a parquet group field as though it were a map. Logical types of 'list' and 'map' will be transformed
+   * into java lists and maps respectively ({@link ParquetGroupConverter#convertLogicalList} and
+   * {@link ParquetGroupConverter#convertLogicalMap}), repeated fields will also be translated to lists, and
+   * primitive types will be extracted into an ingestion friendly state (e.g. 'int' and 'long'). Finally,
+   * if a field is not present, this method will return null.
    */
   @Nullable
-  private static Object convertField(Group g, String fieldName, boolean binaryAsString)

Review Comment:
   Just double checking, this `binaryAsString` argument.  That was being passed around but never actually used?
   
   Any changes you want to make to `ParquetToJson` are good, that code is just for our own dev purposes, but these changes in `ParquetGroupConverter` is changing the code that our current parquet-based file ingestions end up using.  So, I just want to double check that this cleanup is truly a redundant argument and not just something that wasn't needed for `ParquetToJson` but used for the other production code paths.



##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########


Review Comment:
   This is a design nit: with the change to remove the `static` label from the methods, I would expect them to start coming after the constructor.  That is, we tend to follow the code flow of static first, then constructor then class methods.  I noticed that it was inverted because I kept searching for the constructor at the top of the file and didn't see it, and then realized it was at the bottom and that's because the methods used to be static but now are not.
   
   The current structure reads really nicely for the diff though, I hope that the whitespace change that I'm nit picking doesn't screw up the diff...



##########
extensions-core/parquet-extensions/src/main/java/org/apache/druid/data/input/parquet/simple/ParquetGroupConverter.java:
##########
@@ -492,10 +510,12 @@ private static BigDecimal convertBinaryToDecimal(Binary value, int precision, in
   }
 
   private final boolean binaryAsString;
+  private final boolean convertCorruptDates;
 
-  public ParquetGroupConverter(boolean binaryAsString)
+  public ParquetGroupConverter(boolean binaryAsString, boolean convertCorruptDates)
   {
     this.binaryAsString = binaryAsString;

Review Comment:
   is `binaryAsString` still used somewhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "kgyrtkirk (via GitHub)" <gi...@apache.org>.
kgyrtkirk commented on PR #15148:
URL: https://github.com/apache/druid/pull/15148#issuecomment-1770343624

   Thank you @imply-cheddar and @abhishekagarwal87  for reviewing and merging the changes!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fix json inputs for drill windowing tests (druid)

Posted by "abhishekagarwal87 (via GitHub)" <gi...@apache.org>.
abhishekagarwal87 merged PR #15148:
URL: https://github.com/apache/druid/pull/15148


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org