Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2020/08/03 12:45:06 UTC

[GitHub] [hive] HunterL opened a new pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

HunterL opened a new pull request #1313:
URL: https://github.com/apache/hive/pull/1313


   Updated the LazySimple SerDe so that it no longer attempts to auto-detect whether Binary columns are Base64-encoded and instead relies on a table property. The previous auto-detection was expensive and did not correctly check whether the values were valid Base64, which in niche cases could result in statistics being computed incorrectly.
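
To illustrate the failure mode described above, here is a minimal Java sketch. It is not the actual Hive code: the class and method names (`BinaryDecodeSketch`, `decodeWithAutoDetect`, `decodeWithFlag`) are hypothetical, and the try-to-decode heuristic is only an assumed stand-in for the old detection logic. It shows how a raw binary value that happens to look like Base64 changes length under auto-detection, which is the kind of discrepancy that could skew length-based column statistics.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryDecodeSketch {

  // Hypothetical heuristic (an assumption, not Hive's old code):
  // treat the field as Base64 whenever it happens to decode cleanly.
  static byte[] decodeWithAutoDetect(byte[] field) {
    try {
      return Base64.getDecoder().decode(field);
    } catch (IllegalArgumentException e) {
      return field; // not valid Base64, so keep the raw bytes
    }
  }

  // Explicit approach: a caller-supplied flag decides, with no guessing.
  // Throws IllegalArgumentException if the flag is true and the field is not valid Base64.
  static byte[] decodeWithFlag(byte[] field, boolean decodeAsBase64) {
    return decodeAsBase64 ? Base64.getDecoder().decode(field) : field;
  }

  public static void main(String[] args) {
    // Four raw bytes that are also a valid Base64 quantum.
    byte[] raw = "abcd".getBytes(StandardCharsets.US_ASCII);
    System.out.println(decodeWithAutoDetect(raw).length);  // prints 3
    System.out.println(decodeWithFlag(raw, false).length); // prints 4
  }
}
```

With the four raw bytes `abcd`, the heuristic silently returns three bytes, while the flag-driven path with `false` returns the original four.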


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] HunterL closed pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
HunterL closed pull request #1313:
URL: https://github.com/apache/hive/pull/1313


   




[GitHub] [hive] belugabehr commented on pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-680044199


   @HunterL Really great stuff.  Need one test with `hive.serialization.decode.binary.as.base64` set to `true`.




[GitHub] [hive] belugabehr closed pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr closed pull request #1313:
URL: https://github.com/apache/hive/pull/1313


   




[GitHub] [hive] belugabehr merged pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr merged pull request #1313:
URL: https://github.com/apache/hive/pull/1313


   




[GitHub] [hive] github-actions[bot] commented on pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-716076826


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




[GitHub] [hive] belugabehr edited a comment on pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr edited a comment on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-680044199


   @HunterL Really great stuff.  Need one test with `hive.serialization.decode.binary.as.base64` set to `true`.
   
   Edit: The default is `true`, so presumably some tests have this flag enabled.  Are there any examples of this being exercised (i.e., doing Base64 conversion on a data set)?
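
For reference, a minimal sketch of how a flag like this might be read from table properties with a default of `true`. The class `DecodeFlagSketch` and the helper `shouldDecodeAsBase64` are hypothetical names used only for illustration; how `LazySerDeParameters` actually reads the property is not shown in this thread.

```java
import java.util.Properties;

final class DecodeFlagSketch {

  // Property name as quoted in the review comment above.
  static final String DECODE_BINARY_AS_BASE64 =
      "hive.serialization.decode.binary.as.base64";

  // Hypothetical helper: falls back to "true" when the table does not set the property.
  static boolean shouldDecodeAsBase64(Properties tbl) {
    return Boolean.parseBoolean(tbl.getProperty(DECODE_BINARY_AS_BASE64, "true"));
  }

  public static void main(String[] args) {
    Properties tbl = new Properties();
    System.out.println(shouldDecodeAsBase64(tbl)); // prints true (default)

    tbl.setProperty(DECODE_BINARY_AS_BASE64, "false");
    System.out.println(shouldDecodeAsBase64(tbl)); // prints false
  }
}
```

A test for the `true` case would then only need a table that leaves the property unset, or sets it to `true` explicitly, with Base64-encoded values in the binary column.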




[GitHub] [hive] belugabehr commented on pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr commented on pull request #1313:
URL: https://github.com/apache/hive/pull/1313#issuecomment-717934374


   @HunterL Merged to master.  Thanks!




[GitHub] [hive] belugabehr commented on a change in pull request #1313: HIVE-23829: Compute Stats Incorrect for Binary Columns

Posted by GitBox <gi...@apache.org>.
belugabehr commented on a change in pull request #1313:
URL: https://github.com/apache/hive/pull/1313#discussion_r461115069



##########
File path: serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySerDeParameters.java
##########
@@ -53,7 +53,9 @@
   	= "hive.serialization.extend.nesting.levels";
   public static final String SERIALIZATION_EXTEND_ADDITIONAL_NESTING_LEVELS
 	= "hive.serialization.extend.additional.nesting.levels";
-
+  public static final String SERIALIZATION_DECODE_BINARY_AS_BASE64
+  	= "hive.serialization.decode.binary.as.base.64";

Review comment:
       Just make this `base64` (one word)

##########
File path: serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySerDeParameters.java
##########
@@ -116,7 +121,7 @@ public LazySerDeParameters(Configuration job, Properties tbl, String serdeName)
 
     extendedBooleanLiteral = (job == null ? false :
         job.getBoolean(ConfVars.HIVE_LAZYSIMPLE_EXTENDED_BOOLEAN_LITERAL.varname, false));
-    
+

Review comment:
       Remove empty space change



