You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/05/10 07:00:44 UTC

[GitHub] [hive] zhangbutao opened a new pull request, #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

zhangbutao opened a new pull request, #3276:
URL: https://github.com/apache/hive/pull/3276

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   MySQl'bit type should be mapped to a suitable datatype in hive.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=dataconnector_mysql_bit_datatype.q


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r877690343


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return 56 which visually doesn't represent how it is originally stored
   
   I think users migiht be able to convert the result **56** to bit **"111000"** themselves instead hive internally.  :)
   
   I will research how to do convert result using hive functions or UDFs as you suggsted. or maybe we can rewrite query sql to "select bin(col)" after hive compile.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on PR #3276:
URL: https://github.com/apache/hive/pull/3276#issuecomment-1127373486

   @nrg4878 Could you please take a look? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r950656597


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @nrg4878 I agree option 1 is the best and native way to handle datasouce specific datatype. But there is need to explore more info about JdbcStorageHandler, e.g. Jdbc pushdown or sth else.
   Anyway, I will try to research and implement option 1.  I will update here if there is any progress. Thx.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957951934


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > option 1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns?
   
   @nrg4878 I have tried set the hive.sql.query to include query like` select bin(id) from testmysqlbit,` and it worked as expected. That is to say, hive.sql.query can push down MySQL's native query to remote MySQL datasouce, and we can get MySQL's bit datatype values using a visible hive's bigint type or hive's string type.
   
   Step to test:
   1. create jdbc-mapping table with **hive.sql.query** in hive:
   `   CREATE EXTERNAL TABLE jdbc_testmysqlbit_with_query`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "root",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1",`
   `"hive.sql.query" = "select bin(id) from testmysqlbit"`
   `);`
   
   2. `select * from jdbc_testmysqlbit_with_query;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187336443-51942986-1284-4fc2-90b8-eae02db3beb2.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] closed pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive
URL: https://github.com/apache/hive/pull/3276


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #3276:
URL: https://github.com/apache/hive/pull/3276#issuecomment-1211432971

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957963875


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @nrg4878 Another alternative is the one I mentioned earlier, that is we use hive's internal udf bin() to convert the bigint values to bit type. This one needs users to be aware of hive's bin() udf:
   
   Step to test:
   1. Create jdbc-mapping table in hive. Use hive's bigint to map MySQL's bit datatype:
    ` CREATE EXTERNAL TABLE jdbc_testmysqlbit_use_bigint`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "user",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1"`
   `);`
    
   2. `select * from jdbc_testmysqlbit_use_bigint;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187339777-16cc48f7-af07-4e31-8095-660f8e8e0c58.png)
   
   3. `select bin(id) from jdbc_testmysqlbit_use_bigint;` using hive beeline:
     
   ![image](https://user-images.githubusercontent.com/9760681/187339967-bd33c930-85de-4da5-ae4c-8a3f13d7763d.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] nrg4878 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r876134671


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   so bit datatype in MYSQL stores and array of bits. for any column, bit-type, with size > 1, the proposal is to use bigint. If I understand this correctly, b'111000' will be read as '111000' (numerical). Isnt this incorrect as these are not equivalent? But also to be able to read this datatype, should we read as "select bin(col)" while reading from remote table? Can you give me an example of how you tested this and what the results were? Thank you



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r876791141


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   for bit-type with size <= 1 of mysql column, users usually use this column to express false or true, so it make sense to be recognized boolean type in hive.
   
   for bit-type with size >1 of mysql column, users may want to express different semantics, eg weekly working calendar, bit(7) , and in this case we can not simply convert bit(7) to boolean type. I convert this bit(n) to bigint in hive, eg **b'111000' will be read as 56**.
   
   in this pr, i  added qtest to select bit datatype, but HIVE-26192 blocked me, i hope you can give advice about HIVE-26192。
   
   >  But also to be able to read this datatype, should we read as "select bin(col)" while reading from remote table?
   I think it is diffcult and not worth to read as "select bin(col)".  Becaue if we want do this, we may need to mak type check when executing query and then convert column type to bit.  Also, hive can not recognize bit type, and i think hive will compile failed with bit type before the query is submited to mysql.  That's just my basic thought. 
    
   In addtion, i have checked  the presto and spark code. presto will convert all bit type to bolean. Spark have the similar ideas with this pr. I think spark idea is more reasonable.
   https://github.com/trinodb/trino/blob/543ae143cadeb47ab03af4197dae9d00ff5baf7c/plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java#L377-L379
   
   https://github.com/apache/spark/blob/7309e76d8b95e306d6f3d2f611316b748949e9cf/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L64-#L71
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r876791141


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   for bit-type with size <= 1 of mysql column, users usually use this column to express false or true, so it make sense to be recognized boolean type in hive.
   
   for bit-type with size >1 of mysql column, users may want to express different semantics, eg weekly working calendar, bit(7) , and in this case we can not simply convert bit(7) to boolean type. I convert this bit(n) to bigint in hive, eg **b'111000' will be read as 56**.
   
   in this pr, i  added qtest to select bit datatype, but HIVE-26192 blocked me, i hope you can give advice about HIVE-26192。
   
   > But also to be able to read this datatype, should we read as "select bin(col)" while reading from remote  table?
   
   I think it is diffcult and not worth to read as "select bin(col)".  Becaue if we want do this, we may need to mak type check when executing query and then convert column type to bit.  Also, hive can not recognize bit type, and i think hive will compile failed with bit type before the query is submited to mysql.  That's just my basic thought. 
   
    
   In addtion, i have checked  the presto and spark code. presto will convert all bit type to bolean. Spark have the similar ideas with this pr. I think spark idea is more reasonable.
   https://github.com/trinodb/trino/blob/543ae143cadeb47ab03af4197dae9d00ff5baf7c/plugin/trino-mysql/src/main/java/io/trino/plugin/mysql/MySqlClient.java#L377-L379
   
   https://github.com/apache/spark/blob/7309e76d8b95e306d6f3d2f611316b748949e9cf/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala#L64-#L71
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957951934


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > option 1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns?
   
   @nrg4878 I have tried set the hive.sql.query to include query like` select bin(id) from testmysqlbit,` and it worked as expected. That is to say, hive.sql.query can push down MySQL's native query to remote MySQL datasouce, and we can get MySQL's bit datatype values using a visible hive's bigint type or hive's string type.
   
   Step to test:
   1. create jdbc-mapping table with **hive.sql.query** in hive:
   `   CREATE EXTERNAL TABLE jdbc_testmysqlbit_with_query`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "user",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.dbcp.maxActive" = "1",`
   `"hive.sql.query" = "select bin(id) from testmysqlbit"`
   `);`
   
   2. `select * from jdbc_testmysqlbit_with_query;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187336443-51942986-1284-4fc2-90b8-eae02db3beb2.png)
   
   
   
   However, one concern is that the _**hive.sql.query**_ is a fixed values and can not be rewrited with more optimized pushdown computation on top of the table. As the doc descripted:
   https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler#JDBCStorageHandler-ComputationPushdown
   
   > Computation pushdown will only happen when the jdbc table is defined by “hive.sql.table”. Hive will rewrite the data source with a “hive.sql.query” property with more computation on top of the table. In the above example, mysql will run the query and retrieve the join result, rather than fetch both tables and do the join in Hive.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on PR #3276:
URL: https://github.com/apache/hive/pull/3276#issuecomment-1125617331

   @zabetak thx. I'll run a full test and try to fix this test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r895032467


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @nrg4878 Sorry for not updating the issue for a long time. 
   In my opinion, we have two ways to get origina MySQL BITs values, that is b'111000' in MySQL will be read as '111000'(binary number) in hive.
   
   First way:
   In MySQL, we can get bit values using 'select bin(days) from work_calendar'. So i was thinking how we use MySQL native language to get bit values in hive. After some research, I found we could use jdbc table computation pushdown(hive.sql.query) to rewrite query sql. That is to say, when we query a bit type table using 'select days from work_calendar' in hive, we can rewrite the pushdown query to 'select bin(days) from work_calendar'. In this way, we may need check original MySQL data type and rewrite bit type query when pushdown. Maybe we should do some change in JdbcSerde.java.
   https://cwiki.apache.org/confluence/display/Hive/JDBC+Storage+Handler#JDBCStorageHandler-ComputationPushdown
   
   Second way:
   I was also thinking using udf to to cast valus to BITs as you suggested. That is, 56(bigint) is cast to '111000'(binary number) in hive internal. Fortunately, I found hive already has this udf bin:
   https://github.com/apache/hive/blob/f29cb2245c97102975ea0dd73783049eaa0947a0/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFBin.java#L28-L37
   So, based on this pr, we can get bit values using hive query 'select bin(days) from work_calendar'. And we no need change current jdbc code(JdbcSerDe.java or others). I like this simple way as we can reuse bin udf to acheive our goal.
   
   wdyt? if you have better ways, feel free to come up with. thank you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r959360941


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   Converting to bigint may have problems in joining or pushing predicate down on the bit column. I wondering if we take the trino way to deal with such column: converts the bit(1) to boolean type, and for those greater than 1, throws MetaException indicating that Hive is unable to map that remote table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r959386377


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   Not sure if this way(to bigint) affects the joining or pushing predicate down in hive. I have checked Spark related code earlier and that's what spark did.
   
   The Trino's way is simple and straightforward, but maybe lose the real meaning of MySQL's bit(n>1). However, i also think this is an optional solution because maybe we don't want  spare much time to deal with the special and not commonly used data type. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] nrg4878 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r946175305


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @zhangbutao Sorry for a delayed response. 
   So with option #1, HMS would replace the bit type columns for MySQL with bin(column_name). I kinda like this option in that it simplifies the logic and pushes the onus on the JDBC datasource natively. 
   Option #2 uses hive UDF which maybe different from the native implementation. And it also involves computation on the hive side.
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r955715950


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   Can we convert the bit(n>1) to hive binary type?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957963875


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @nrg4878 Another alternative is the one I mentioned earlier, that is we use hive's interal udf bin() to convert the bigint values to bit type. This one needs users to be aware of hive's bin() udf:
   
   Step to test:
   1. Create jdbc-mapping table in hive. Use hive's bigint to map MySQL's bit datatype:
    ` CREATE EXTERNAL TABLE jdbc_testmysqlbit_use_bigint`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "user",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1"`
   `);`
    
   2. `select * from jdbc_testmysqlbit_use_bigint;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187339777-16cc48f7-af07-4e31-8095-660f8e8e0c58.png)
   
   3. `select bin(id) from jdbc_testmysqlbit_use_bigint;` using hive beeline:
     
   ![image](https://user-images.githubusercontent.com/9760681/187339967-bd33c930-85de-4da5-ae4c-8a3f13d7763d.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957954979


##########
ql/src/test/queries/clientpositive/dataconnector_mysql_bit_datatype.q:
##########
@@ -0,0 +1,24 @@
+--!qt:database:mysql:q_test_mysql_datatype_mapping.sql
+-- CREATE with comment
+CREATE CONNECTOR mysql_data_type_dc
+TYPE 'mysql'
+URL 'jdbc:mysql://localhost:3306/qtestDB'
+COMMENT 'test connector'
+WITH DCPROPERTIES (
+"hive.sql.dbcp.username"="root",
+"hive.sql.dbcp.password"="qtestpassword");
+
+CREATE REMOTE DATABASE mysql_data_type_db USING mysql_data_type_dc with DBPROPERTIES("connector.remoteDbName"="qtestDB");
+
+USE mysql_data_type_db;
+SHOW TABLES;
+
+SHOW CREATE TABLE work_calendar;
+-- SELECT * FROM work_calendar;  //Wait for HIVE-26192 to be fixed

Review Comment:
   Yes, i will uncomment this once we reach an agreement how to deal with this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on PR #3276:
URL: https://github.com/apache/hive/pull/3276#issuecomment-1124603589

   This seems like a weird failed test. I can ran it successfully in my local env with command:
   `mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=dataconnector_mysql_bit_datatype.q`
   
   But in the pr integration test it failed with error:
   `fname=dataconnector_mysql_bit_datatype.q
   See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ for specific test cases logs.
    org.apache.hadoop.hive.ql.parse.SemanticException: Unable to fetch table work_calendar. Error retrieving remote table:java.sql.SQLNonTransientConnectionException: (conn=4) Connection is closed
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1866)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1851)
    at org.apache.hadoop.hive.ql.ddl.table.create.show.ShowCreateTableAnalyzer.analyzeInternal(ShowCreateTableAnalyzer.java:53)`
   
   I think the integration test env may has something wrong. It may be a database driver compatibility problem or something others.  
   Do you have any suggestions ? @nrg4878 @zabetak @kgyrtkirk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] dengzhhu653 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
dengzhhu653 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r955718045


##########
ql/src/test/queries/clientpositive/dataconnector_mysql_bit_datatype.q:
##########
@@ -0,0 +1,24 @@
+--!qt:database:mysql:q_test_mysql_datatype_mapping.sql
+-- CREATE with comment
+CREATE CONNECTOR mysql_data_type_dc
+TYPE 'mysql'
+URL 'jdbc:mysql://localhost:3306/qtestDB'
+COMMENT 'test connector'
+WITH DCPROPERTIES (
+"hive.sql.dbcp.username"="root",
+"hive.sql.dbcp.password"="qtestpassword");
+
+CREATE REMOTE DATABASE mysql_data_type_db USING mysql_data_type_dc with DBPROPERTIES("connector.remoteDbName"="qtestDB");
+
+USE mysql_data_type_db;
+SHOW TABLES;
+
+SHOW CREATE TABLE work_calendar;
+-- SELECT * FROM work_calendar;  //Wait for HIVE-26192 to be fixed

Review Comment:
   Please uncomment the query and add "explain select * ... ", adding a predicate on the bit column would be good.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] nrg4878 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r956125849


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @zhangbutao is option #1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns?
   
   mapping to hive binary type might work. One concern is that doesn't hive expect binary values to be within ''. So something like '0111' instead of 0111. Can you please try it out to see how this works end to end? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] nrg4878 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r894103501


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @zhangbutao Any luck with the above item related to bit datatype



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] nrg4878 commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
nrg4878 commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r877230118


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   absolutely agreed on the BIT(1) that the intent is to store a boolean.
   for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return **56**  which visually doesn't represent how it is originally stored. Just thinking out loud here, in this case are we better off representing these BITS as a string in hive. So we see something like this **"111000"** on a select.
   
   I think in both cases, we can apply functions (or custom UDFs) to cast this value back to BITs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zabetak commented on PR #3276:
URL: https://github.com/apache/hive/pull/3276#issuecomment-1125046939

   @zhangbutao Instead of running the test individually try to run the whole split (`mvn -Pitests -Pqsplits test -Dtest=org.apache.hadoop.hive.cli.split6.TestMiniLlapLocalCliDriver`) locally. I suspect there is some kind of interference with other tests (possibly `dataconnector_mysql.q`).
   
   From the error message I suppose we are trying to use an "old" connection that is no longer open. Possibly we are closing a connection that I shouldn't or something along these lines.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r877690343


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > for BIT(n), if we represent the value as bigint(decimal) in hive, will the original intent have been lost? a select would return 56 which visually doesn't represent how it is originally stored
   
   I think users migiht be able to convert the result **56** to bit **"111000"** themselves instead of hive internally.  :)
   
   I will research how to convert result using hive functions or UDFs as you suggsted. or maybe we can rewrite query sql to "select bin(col)" after hive compile.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957943881


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   @dengzhhu653 @nrg4878 Thanks for your suggestion.
   Actually, i have tried to convert MySQL's bit datatype to hive's binary, but did not get expected restult. In this case, hive stores MySQL's  bit values(b'111000') as binary type, and this binary values can not dispaly to users, that is to say users only can see a blank result.
   
   Step to test:
   1. create mysql table with bit data type in mysql:
   ![image](https://user-images.githubusercontent.com/9760681/187333226-f23705a5-50b6-40bf-b1e1-8fcecc7a065c.png)
   
   2. create jdbc-mapping table in hive:
   `   CREATE EXTERNAL TABLE jdbc_testmysqlbit`
   `(`
   `  id binary`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "user",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1"`
   `);`
   
   3.`select * from jdbc_testmysqlbit;`  using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187333975-e7d6270c-e922-4088-948f-46024cbcb6e5.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957951934


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > option 1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns?
   
   @nrg4878 I have tried set the hive.sql.query to include query like` select bin(id) from testmysqlbit,` and it worked as expected. That is to say, hive.sql.query can push down MySQL's native query to remote MySQL datasouce, and we can get MySQL's bit datatype values using a visible hive's bigint type or hive's string type.
   
   Step to test:
   1. create jdbc-mapping table with **hive.sql.query** in hive:
   `   CREATE EXTERNAL TABLE jdbc_testmysqlbit_with_query`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "user",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1",`
   `"hive.sql.query" = "select bin(id) from testmysqlbit"`
   `);`
   
   2. `select * from jdbc_testmysqlbit_with_query;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187336443-51942986-1284-4fc2-90b8-eae02db3beb2.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zhangbutao commented on a diff in pull request #3276: HIVE-26046: MySQL's bit datatype is default to void datatype in hive

Posted by GitBox <gi...@apache.org>.
zhangbutao commented on code in PR #3276:
URL: https://github.com/apache/hive/pull/3276#discussion_r957951934


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/dataconnector/jdbc/MySQLConnectorProvider.java:
##########
@@ -90,10 +90,20 @@ protected String getDataType(String dbDataType, int size) {
     // map any db specific types here.
     switch (dbDataType.toLowerCase())
     {
+    case "bit":
+      return toHiveBitType(size);
     default:
       mappedType = ColumnType.VOID_TYPE_NAME;
       break;
     }
     return mappedType;
   }
+
+  private String toHiveBitType(int size) {
+    if (size <= 1) {
+      return ColumnType.BOOLEAN_TYPE_NAME;
+    } else {
+      return ColumnType.BIGINT_TYPE_NAME;

Review Comment:
   > option 1 above not possible where we set the hive.sql.query to include like a select bin(column_name) ... on the table containing bit type columns?
   
   @nrg4878 I have tried set the hive.sql.query to include query like` select bin(id) from testmysqlbit,` and it worked as expected. That is to say, hive.sql.query can push down MySQL's native query to remote MySQL datasouce, and we can get MySQL's bit datatype values using a visible hive's bitint type or hive's string type.
   
   Step to test:
   1. create jdbc-mapping table with **hive.sql.query** in hive:
   `   CREATE EXTERNAL TABLE jdbc_testmysqlbit_with_query`
   `(`
   `  id bigint`
   `)`
   `STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'`
   `TBLPROPERTIES (`
   `"hive.sql.database.type" = "MYSQL",`
   `"hive.sql.jdbc.driver" = "com.mysql.jdbc.Driver",`
   `"hive.sql.jdbc.url" = "jdbc:mysql://localhost:3306/testdb",`
   `"hive.sql.dbcp.username" = "root",`
   `"hive.sql.dbcp.password" = "passwd",`
   `"hive.sql.table" = "testmysqlbit",`
   `"hive.sql.dbcp.maxActive" = "1",`
   `"hive.sql.query" = "select bin(id) from testmysqlbit"`
   `);`
   
   2. `select * from jdbc_testmysqlbit_with_query;` using hive beeline:
   ![image](https://user-images.githubusercontent.com/9760681/187336443-51942986-1284-4fc2-90b8-eae02db3beb2.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org