You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by "VenuReddy2103 (via GitHub)" <gi...@apache.org> on 2023/02/10 10:13:41 UTC

[GitHub] [hive] VenuReddy2103 commented on pull request #4050: HIVE-27065: Exception in partition column statistics update with SQL Server db when histogram statistics is not enabled

VenuReddy2103 commented on PR #4050:
URL: https://github.com/apache/hive/pull/4050#issuecomment-1425563880

   > @VenuReddy2103 thanks for your PR, I have a couple of questions: did you try with other backend DBs as well? Did you encounter a similar error? I am curious because this part of the code does not seem to be DB specific, so I'd rather confirm quickly if that's an issue only with MSSQL or if it's more general, and that we don't regress on other databases by using `setBytes()` in place of `setObject()`.
   > 
   > Have you tried the same code when histogram is actually computed? Did it fail? I am asking because it might be an artifact of the uninitialized `histogram` field.
   > 
   > Since the `bitvector` field is totally analogous (same data type, `byte array`, and written with `setObject()`), apart from the fact that it is always computed, I am wondering if that's not the real difference, as it's successfully written to MSSQL with the `.setObject()` call.
   > 
   > Can you update here with the actual return value of `mPartitionColumnStatistics.getHistogram()` and of `mPartitionColumnStatistics.getBitVector()` at the time of the call by putting a breakpoint in the `insertIntoPartColStatTable` function? That would help shed some light on what's actually going on there.
   > 
   > Thanks!
   
   @asolimando Thanks for reviewing PR. Please find reply for queries.
   1. I've tried with Mysql, Postgresql, Oracle and Derby as well. This issue is not observed with them. It is observed only with MSSql. When histogram statistics is disabled, `histogram` field in `MPartitionColumnStatistics` is initialized default with empty byte array. And `mPartitionColumnStatistics.getHistogram()` returns null if `histogram.length` is 0. With `preparedStatement.setObject(18, mPartitionColumnStatistics.getHistogram())` invocation, since the object is null, `SQLServerPreparedStatement.setObject()` inferred the jdbc type as `JDBCType.CHAR`. Hence this issue. Whereas with `preparedStatement.setBytes()`, though null is passed, it don't try to infer type again. Have verified this change with MSSql, MySql, PostgreSql, Oracle and Derby.
   Have found a similar issue fix here - https://github.com/trinodb/trino/pull/4846/files#:~:text=statement.setBytes(index%2C%20null)%3B
   
   2. When the histogram is computed, it didn't fail because value is not null and driver could infer the type correctly.
   3. Yes this issue is not seen with `bitvector` as it is always computed.
   4. Debugger screenshots for MPartitionColumnStatistics.getBitVector() and MPartitionColumnStatistics.getHistogram()
   <img width="974" alt="image" src="https://user-images.githubusercontent.com/35334869/218059642-40495bcf-a547-48cb-bc5d-0a21697e3c07.png">
   <img width="800" alt="image" src="https://user-images.githubusercontent.com/35334869/218060043-960bcad7-0578-4b9b-8183-068f9672eaf7.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org