You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "xleoken (via GitHub)" <gi...@apache.org> on 2024/03/20 08:31:48 UTC

[PR] [SPARK-47482] Add HiveDialect to sql module [spark]

xleoken opened a new pull request, #45609:
URL: https://github.com/apache/spark/pull/45609

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'common/utils/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Add HiveDialect to sql module
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   In scenarios with multiple hive catalogs, throw `ParseException`
   
   SQL
   ```
   bin/spark-sql \
     --conf "spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data" \
     --conf "spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver" \
     --conf "spark.sql.catalog.bbb=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.bbb.url=jdbc:hive2://172.16.10.13:10000/data" \
     --conf "spark.sql.catalog.bbb.driver=org.apache.hive.jdbc.HiveDriver"
   
   select count(1) from aaa.data.data_part;
   ```
   
   Exception
   ```
   24/03/19 21:58:25 INFO HiveSessionImpl: Operation log session directory is created: /tmp/root/operation_logs/f15a5434-6356-455b-aa8e-4ce9903c1b81
   24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Submitting query 'SELECT * FROM "data"."data_part" WHERE 1=0' with a7459d6d-2a5c-4b56-945c-3159e58d12fd
   24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Running query with a7459d6d-2a5c-4b56-945c-3159e58d12fd
   24/03/19 21:58:25 INFO DAGScheduler: Asked to cancel job group a7459d6d-2a5c-4b56-945c-3159e58d12fd
   24/03/19 21:58:25 ERROR SparkExecuteStatementOperation: Error executing query with a7459d6d-2a5c-4b56-945c-3159e58d12fd, currentState RUNNING, 
   org.apache.spark.sql.catalyst.parser.ParseException: 
   Syntax error at or near '"data"'(line 1, pos 14)
   
   == SQL ==
   SELECT * FROM "data"."data_part" WHERE 1=0
   --------------^^^
   
   	at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
   	at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
   	at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   no
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   local test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2012154538

   Just FYI, SPARK-47496 makes loading a custom dialect much easier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2012528573

   Thank you, @xleoken and all.
   
   Let me close this PR first to prevent accidental merging. We can continue to discuss on this PR (or old PRs, https://github.com/apache/spark/pull/19238 and https://github.com/apache/spark/pull/4015).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011349301

   For the record, it was rejected at https://github.com/apache/spark/pull/19238 and https://github.com/apache/spark/pull/4015


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/c48c5006-d51f-4eeb-9779-5b30fc24cde5)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533099686


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   hi @dongjoon-hyun, thanks for your review. 
   Your considerations are correct, but this patch is applicable to both `Hive Thrift Server` and `Spart Thrift Server`.
   
   > You want to achieve to introduce Hive-specific syntax via this HiveDialect instead of Spark Thrift Server, right?
   
   Actually, it's not. I used `sbin/start-thriftserver.sh` in the production environment.
   
   > I'm not sure HiveDialect is a valid name in Apache Spark community
   
   OK, `HiveDialect` seems better for `jdbc:hive2`. In the future, if encountering Hive-specific syntax or SparkSQL-specific syntax issue, we can distinguish between Hive and Spark in specific methods.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011074139

   > Thanks for pinging me @dongjoon-hyun.
   > 
   > I know that it's technically feasible. But we have a much more efficient and direct way to access hive tables. I don't see the necessity for adding it as a built-in dialect.
   
   Welcome @yaooqinn, can you explain in detail? The key to this patch is to override `quoteIdentifier` method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2010187142

   cc @yaooqinn , too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #45609: [SPARK-47482] Add HiveDialect to sql module
URL: https://github.com/apache/spark/pull/45609


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533099686


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   hi @dongjoon-hyun, thanks for your review. 
   Your considerations are correct, but this patch is applicable to both `Hive Thrift Server` and `Spart Thrift Server`.
   
   > You want to achieve to introduce Hive-specific syntax via this HiveDialect instead of Spark Thrift Server, right?
   Actually, it's not. I used `sbin/start-thriftserver.sh` in the production environment.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011724349

   I think Spark already supported visit Hive. It's the mainstream approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011733604

   > I think Spark already supported visit Hive. It's the mainstream approach.
   
   Hi @beliefer,  we want to query data from two independent data centers, so we use multiple jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/730041ad-caf2-4aef-aab7-03e69aca6345)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011740762

   @xleoken I think you can implements the catalog plugin and register two custom hive jdbc dialects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011372388

   > Not all Drivers can to be built-in support at Apache Spark. It can be provided as a thridparty library
   
   But we know, spark thrift server is based on hive thrift server, so `HiveDialect` should be built-in support at Apache Spark.
   
   By the way, how to provided as a thridparty library? Put only one `HiveDialect` scala file? it seems not friendly.
   
   https://github.com/apache/spark/pull/19238 https://github.com/apache/spark/pull/19238 meets the same issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011766349

   > @xleoken I think you can implements the catalog plugin and register two custom hive jdbc dialects.
   
   It seems too heavy. 
   
   What I'm curious about is why such people encounter the same problem, but the community doesn't solve it.
   
   As Daniel Fernandez said, only two functions should be overriden. in https://issues.apache.org/jira/browse/SPARK-22016
   
   https://issues.apache.org/jira/browse/SPARK-21063
   https://issues.apache.org/jira/browse/SPARK-22016
   https://issues.apache.org/jira/browse/SPARK-31457


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/0f71419b-380d-444a-bbaf-b4ee82cc30fc)
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533265715


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   If you are trying to use this for Spark Thrift Server, this should be `SparkDialect` in Spark community. However, in that case, it will look very weird because Apache Spark **needs** a direct to access itself. That's the meaning why we don't want to add any `SparkDialect` or `HiveDialect`.
   > Actually, it's not. I used sbin/start-thriftserver.sh in the production environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.
yaooqinn commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011049532

   Thanks for pinging me @dongjoon-hyun.
   
   I know that it's technically feasible. But we have a much more efficient and direct way to access hive tables. I don't see the necessity for adding it as a built-in dialect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011377844

   @HyukjinKwon how about change `JdbcDialects#quoteIdentifier`, let it return s"`$colName`" instead of s""""$colName"""".
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/056fc087-c6a3-4155-a9e8-ea495e02860d)
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011360014

   > For the record, it was rejected at #19238 and #4015
   
   Hi @HyukjinKwon, I can not understand why it was rejected.
   
   The following sql will run failed without `HiveDialect`.
   ```
   bin/spark-sql \
     --conf "spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data" \
     --conf "spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver" \
     --conf "spark.sql.catalog.bbb=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.bbb.url=jdbc:hive2://172.16.10.13:10000/data" \
     --conf "spark.sql.catalog.bbb.driver=org.apache.hive.jdbc.HiveDriver"
   
   select count(1) from aaa.data.data_part;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011362919

   Not all Drivers can't to be built-in support at Apache Spark. It can be provided as a thridparty library


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.
xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011798247

   cc @MrDLontheway @danielfx90, can you share your idea, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1532521015


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Although I understand your proposal, I'm not sure `HiveDialect` is a valid name in Apache Spark community because Apache Spark ThriftServer also uses `jdbc:hive2`. You want to achieve to introduce Hive-specific syntax via this `HiveDialect` instead of `Spark Thrift Server`, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org