You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "phanhuyn (via GitHub)" <gi...@apache.org> on 2023/12/22 08:13:05 UTC

[PR] Do not convert array type string retrieved from jdbc driver [spark]

phanhuyn opened a new pull request, #44459:
URL: https://github.com/apache/spark/pull/44459

   Hi, thanks for checking the PR. This is a small bug fix to make Scala Spark works with Clickhouse's array type. Let me know if this would cause problem on other DBs. Also let me know if I should add more details to the PR. 
   
   ### Why are the changes needed?
   
   The PR is to fix issue describe at: https://github.com/ClickHouse/clickhouse-java/issues/1505
   
   When using spark to write an array of string to Clickhouse, the Clickhouse JDBC driver throws `java.lang.IllegalArgumentException: Unknown data type: string` exception.
   
   The exception was due to Spark JDBC utils passing an invalid type value `string` (should be `String`). The original type values retrieved from Clickhouse JDBC is correct, but Spark JDBC utils attempts to convert to type string to lower case:
   
   https://github.com/apache/spark/blob/6b931530d75cb4f00236f9c6283de8ef450963ad/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L639
   
   ### What changes were proposed in this pull request?
   
   - Remove the lowercase cast. The string value retrieved from JDBC driver implementation should be passed as-is, Spark shouldn't try to modify the value.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   - Follow the reproduce step at: https://github.com/ClickHouse/clickhouse-java/issues/1505
     - Create ClickHouse table with array string type
     - Write scala Spark job to write to clickhouse
   - Verify the change fixes the issue
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "phanhuyn (via GitHub)" <gi...@apache.org>.
phanhuyn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1893099614

   > Hi @phanhuyn, Could you enable the GitHub action?
   
   I enabled GitHub action and triggered a few builds. It's failing at "Run documentation build" step. I'm not sure why this change would cause the documentation build to fail, could you help to take a look? Thank you.
   
   The action url: https://github.com/phanhuyn/spark/actions/runs/7522708428/job/20479020431
   
   ![Screenshot 2024-01-16 at 4 34 24 PM](https://github.com/apache/spark/assets/7282697/68c16042-9a50-46c0-882a-e68165c807e4)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn closed pull request #44459: [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver
URL: https://github.com/apache/spark/pull/44459


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1888402779

   Hi @phanhuyn, Could you enable the GitHub action?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1869181617

   Thanks for the PR. Mind creating a JIRA please? (see also https://spark.apache.org/contributing.html).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1894870313

   Oops, I didn't pay much attention to the authorship resolution step in the merge script, the `Lead-authored-by:` shall be `Nguyen Phan Huy`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1893119206

   This build error shall be fixed, you can rebase the master branch and try again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "phanhuyn (via GitHub)" <gi...@apache.org>.
phanhuyn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1880206832

   > Thanks for the PR. Mind creating a JIRA please? (see also https://spark.apache.org/contributing.html).
   
   Thanks for the reply @HyukjinKwon. I've applied for a JIRA account at https://selfserve.apache.org/jira-account.html, will be creating the story when the account is approved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1894859917

   Thank you @phanhuyn. Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "phanhuyn (via GitHub)" <gi...@apache.org>.
phanhuyn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1894878079

   > Oops, I didn't pay much attention to the authorship resolution step in the merge script, the `Lead-authored-by:` shall be `Nguyen Phan Huy`
   
   All good, no problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

Posted by "phanhuyn (via GitHub)" <gi...@apache.org>.
phanhuyn commented on PR #44459:
URL: https://github.com/apache/spark/pull/44459#issuecomment-1894867519

   thanks @yaooqinn 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org