You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "yaooqinn (via GitHub)" <gi...@apache.org> on 2024/03/22 10:51:26 UTC

[PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

yaooqinn opened a new pull request, #45666:
URL: https://github.com/apache/spark/pull/45666

   
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Read MySQL FLOAT as `FloatType` to keep consistent with the write side. In `getJDBCType`, we map `FloatType` to `JdbcType("FLOAT", java.sql.Types.FLOAT)`. So, in `getCatalystType`, we shall keep it consistent.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   read/write roundtrip for MySQL float
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   yes, please refer to the migration doc
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->new tests
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1535814994


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala:
##########
@@ -123,6 +123,8 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHelper {
         // Signed values in [-8388608, 8388607] and unsigned values in [0, 16777215],
         // both of them fit IntegerType
         Some(IntegerType)
+      case Types.REAL | Types.FLOAT =>
+        if (md.build().getBoolean("isSigned")) Some(FloatType) else Some(DoubleType)

Review Comment:
   Just a question. We are safe even if MySQL community remove UNSIGNED FLOAT , right?
   - https://dev.mysql.com/doc/refman/8.3/en/numeric-type-syntax.html
   > The UNSIGNED attribute is deprecated for columns of type [FLOAT](https://dev.mysql.com/doc/refman/8.3/en/floating-point-types.html) (and any synonyms) and you should expect support for it to be removed in a future version of MySQL.
   
   Although it's a separate question, do you think we need to warn from our side, @yaooqinn ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45666:
URL: https://github.com/apache/spark/pull/45666#issuecomment-2015412507

   BTW, thank you, @yaooqinn . Thanks to your PR, I'm refreshing my knowledge in this domain.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45666:
URL: https://github.com/apache/spark/pull/45666#issuecomment-2015660943

   Opps, I added more tests for float(p) but didn't get a chance to upload it.:)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45666:
URL: https://github.com/apache/spark/pull/45666#issuecomment-2015278289

   Thank you @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1535823668


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -938,9 +938,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
       Some(BooleanType))
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ==
       Some(ByteType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(FloatType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(FloatType))
     metadata.putBoolean("isSigned", value = false)
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ===
       Some(ShortType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(DoubleType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(DoubleType))

Review Comment:
   Just for my understanding, if we use the following, do you know when `DOUBLE` is required?
   
   https://dev.mysql.com/doc/refman/8.3/en/floating-point-types.html
   > FLOAT[(M,D)] [UNSIGNED] [ZEROFILL]
   
   - A small (single-precision) floating-point number.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45666:
URL: https://github.com/apache/spark/pull/45666#issuecomment-2015654046

   Merged to master for Apache Spark 4.0.0~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1535998312


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -938,9 +938,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
       Some(BooleanType))
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ==
       Some(ByteType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(FloatType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(FloatType))
     metadata.putBoolean("isSigned", value = false)
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ===
       Some(ShortType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(DoubleType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(DoubleType))

Review Comment:
   Hi @dongjoon-hyun, this is a good point.
   
   A `float(p)` can represent a single-precision float when p <= 23 and double-precision when p > 23 in MySQL. This is already handled by mysql-connect/j, so we get java.sql.Types.FLOAT/REAL for the former and java.sql.Types.DOUBLE for the latter. We do not need to worry about the case that we use our `FloatType` to get a MySQL Float(p>23).
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1536012041


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -938,9 +938,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
       Some(BooleanType))
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ==
       Some(ByteType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(FloatType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(FloatType))
     metadata.putBoolean("isSigned", value = false)
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ===
       Some(ShortType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(DoubleType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(DoubleType))

Review Comment:
   You are welcome @dongjoon-hyun 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #45666: [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side
URL: https://github.com/apache/spark/pull/45666


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45666:
URL: https://github.com/apache/spark/pull/45666#issuecomment-2015671826

   Thank you @dongjoon-hyun. I sent #45672 a follow-up to improve the test coverage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1535823668


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -938,9 +938,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
       Some(BooleanType))
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ==
       Some(ByteType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(FloatType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(FloatType))
     metadata.putBoolean("isSigned", value = false)
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ===
       Some(ShortType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(DoubleType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(DoubleType))

Review Comment:
   If we use the following, do you know when `DOUBLE` is required?
   
   https://dev.mysql.com/doc/refman/8.3/en/floating-point-types.html
   > FLOAT[(M,D)] [UNSIGNED] [ZEROFILL]
   
   - A small (single-precision) floating-point number.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1536002052


##########
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala:
##########
@@ -938,9 +938,17 @@ class JDBCSuite extends QueryTest with SharedSparkSession {
       Some(BooleanType))
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ==
       Some(ByteType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(FloatType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(FloatType))
     metadata.putBoolean("isSigned", value = false)
     assert(mySqlDialect.getCatalystType(java.sql.Types.TINYINT, "TINYINT", 1, metadata) ===
       Some(ShortType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.REAL, "FLOAT", 1, metadata) ===
+      Some(DoubleType))
+    assert(mySqlDialect.getCatalystType(java.sql.Types.FLOAT, "FLOAT", 1, metadata) ===
+      Some(DoubleType))

Review Comment:
   Thank you for the clarification.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1536005042


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala:
##########
@@ -123,6 +123,8 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHelper {
         // Signed values in [-8388608, 8388607] and unsigned values in [0, 16777215],
         // both of them fit IntegerType
         Some(IntegerType)
+      case Types.REAL | Types.FLOAT =>
+        if (md.build().getBoolean("isSigned")) Some(FloatType) else Some(DoubleType)

Review Comment:
   > do you think we need to warn from our side
   
   Oh, I missed this. This won't be necessary, it looks better to be warned when users use DDLs against MySQL DB. In this part, we actually act as a reader



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47522][SQL] Read MySQL FLOAT as FloatType to keep consistent with the write side [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on code in PR #45666:
URL: https://github.com/apache/spark/pull/45666#discussion_r1536002553


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala:
##########
@@ -123,6 +123,8 @@ private case class MySQLDialect() extends JdbcDialect with SQLConfHelper {
         // Signed values in [-8388608, 8388607] and unsigned values in [0, 16777215],
         // both of them fit IntegerType
         Some(IntegerType)
+      case Types.REAL | Types.FLOAT =>
+        if (md.build().getBoolean("isSigned")) Some(FloatType) else Some(DoubleType)

Review Comment:
   Hi @dongjoon-hyun. Yes, it's safe, we get `isSigned` from the JDBC standard API. It's OK for the DB side to remove this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org