You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "xleoken (via GitHub)" <gi...@apache.org> on 2024/03/20 08:31:48 UTC

[PR] [SPARK-47482] Add HiveDialect to sql module [spark]

xleoken opened a new pull request, #45609:
URL: https://github.com/apache/spark/pull/45609

### What changes were proposed in this pull request?

Add HiveDialect to sql module

### Why are the changes needed?

In scenarios with multiple hive catalogs, throw `ParseException`

SQL
```
bin/spark-sql \
--conf "spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
--conf "spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data" \
--conf "spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver" \
--conf "spark.sql.catalog.bbb=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
--conf "spark.sql.catalog.bbb.url=jdbc:hive2://172.16.10.13:10000/data" \
--conf "spark.sql.catalog.bbb.driver=org.apache.hive.jdbc.HiveDriver"

select count(1) from aaa.data.data_part;
```

Exception
```
24/03/19 21:58:25 INFO HiveSessionImpl: Operation log session directory is created: /tmp/root/operation_logs/f15a5434-6356-455b-aa8e-4ce9903c1b81
24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Submitting query 'SELECT * FROM "data"."data_part" WHERE 1=0' with a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 INFO SparkExecuteStatementOperation: Running query with a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 INFO DAGScheduler: Asked to cancel job group a7459d6d-2a5c-4b56-945c-3159e58d12fd
24/03/19 21:58:25 ERROR SparkExecuteStatementOperation: Error executing query with a7459d6d-2a5c-4b56-945c-3159e58d12fd, currentState RUNNING,
org.apache.spark.sql.catalyst.parser.ParseException:
Syntax error at or near '"data"'(line 1, pos 14)

== SQL ==
SELECT * FROM "data"."data_part" WHERE 1=0
--------------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:89)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:620)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:620)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
```

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

local test

### Was this patch authored or co-authored using generative AI tooling?

no

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2012154538

   Just FYI, SPARK-47496 makes loading a custom dialect much easier.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2012528573

   Thank you, @xleoken and all.
   
   Let me close this PR first to prevent accidental merging. We can continue to discuss on this PR (or old PRs, https://github.com/apache/spark/pull/19238 and https://github.com/apache/spark/pull/4015).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011349301

   For the record, it was rejected at https://github.com/apache/spark/pull/19238 and https://github.com/apache/spark/pull/4015


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/c48c5006-d51f-4eeb-9779-5b30fc24cde5)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533099686


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   hi @dongjoon-hyun, thanks for your review. 
   Your considerations are correct, but this patch is applicable to both `Hive Thrift Server` and `Spart Thrift Server`.
   
   > You want to achieve to introduce Hive-specific syntax via this HiveDialect instead of Spark Thrift Server, right?
   
   Actually, it's not. I used `sbin/start-thriftserver.sh` in the production environment.
   
   > I'm not sure HiveDialect is a valid name in Apache Spark community
   
   OK, `HiveDialect` seems better for `jdbc:hive2`. In the future, if encountering Hive-specific syntax or SparkSQL-specific syntax issue, we can distinguish between Hive and Spark in specific methods.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011074139

   > Thanks for pinging me @dongjoon-hyun.
   > 
   > I know that it's technically feasible. But we have a much more efficient and direct way to access hive tables. I don't see the necessity for adding it as a built-in dialect.
   
   Welcome @yaooqinn, can you explain in detail? The key to this patch is to override `quoteIdentifier` method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2010187142

   cc @yaooqinn , too


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun closed pull request #45609: [SPARK-47482] Add HiveDialect to sql module
URL: https://github.com/apache/spark/pull/45609


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533099686


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   hi @dongjoon-hyun, thanks for your review. 
   Your considerations are correct, but this patch is applicable to both `Hive Thrift Server` and `Spart Thrift Server`.
   
   > You want to achieve to introduce Hive-specific syntax via this HiveDialect instead of Spark Thrift Server, right?
   Actually, it's not. I used `sbin/start-thriftserver.sh` in the production environment.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011724349

   I think Spark already supported visit Hive. It's the mainstream approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011733604

   > I think Spark already supported visit Hive. It's the mainstream approach.
   
   Hi @beliefer,  we want to query data from two independent data centers, so we use multiple jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/730041ad-caf2-4aef-aab7-03e69aca6345)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011740762

   @xleoken I think you can implements the catalog plugin and register two custom hive jdbc dialects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011372388

   > Not all Drivers can to be built-in support at Apache Spark. It can be provided as a thridparty library
   
   But we know, spark thrift server is based on hive thrift server, so `HiveDialect` should be built-in support at Apache Spark.
   
   By the way, how to provided as a thridparty library? Put only one `HiveDialect` scala file? it seems not friendly.
   
   https://github.com/apache/spark/pull/19238 https://github.com/apache/spark/pull/19238 meets the same issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011766349

   > @xleoken I think you can implements the catalog plugin and register two custom hive jdbc dialects.
   
   It seems too heavy. 
   
   What I'm curious about is why such people encounter the same problem, but the community doesn't solve it.
   
   As Daniel Fernandez said, only two functions should be overriden. in https://issues.apache.org/jira/browse/SPARK-22016
   
   https://issues.apache.org/jira/browse/SPARK-21063
   https://issues.apache.org/jira/browse/SPARK-22016
   https://issues.apache.org/jira/browse/SPARK-31457


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/0f71419b-380d-444a-bbaf-b4ee82cc30fc)
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533265715


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   If you are trying to use this for Spark Thrift Server, this should be `SparkDialect` in Spark community. However, in that case, it will look very weird because Apache Spark **needs** a direct to access itself. That's the meaning why we don't want to add any `SparkDialect` or `HiveDialect`.
   > Actually, it's not. I used sbin/start-thriftserver.sh in the production environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011049532

   Thanks for pinging me @dongjoon-hyun.
   
   I know that it's technically feasible. But we have a much more efficient and direct way to access hive tables. I don't see the necessity for adding it as a built-in dialect.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011377844

   @HyukjinKwon how about change `JdbcDialects#quoteIdentifier`, let it return s"`$colName`" instead of s""""$colName"""".
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1533369753


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Hi @dongjoon-hyun, we want to query data from two independent data centers, so we use multiple spark jdbc catalogs.
   
   ![image](https://github.com/apache/spark/assets/95013770/056fc087-c6a3-4155-a9e8-ea495e02860d)
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011360014

   > For the record, it was rejected at #19238 and #4015
   
   Hi @HyukjinKwon, I can not understand why it was rejected.
   
   The following sql will run failed without `HiveDialect`.
   ```
   bin/spark-sql \
     --conf "spark.sql.catalog.aaa=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.aaa.url=jdbc:hive2://172.16.10.12:10000/data" \
     --conf "spark.sql.catalog.aaa.driver=org.apache.hive.jdbc.HiveDriver" \
     --conf "spark.sql.catalog.bbb=org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog" \
     --conf "spark.sql.catalog.bbb.url=jdbc:hive2://172.16.10.13:10000/data" \
     --conf "spark.sql.catalog.bbb.driver=org.apache.hive.jdbc.HiveDriver"
   
   select count(1) from aaa.data.data_part;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.

HyukjinKwon commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011362919

   Not all Drivers can't to be built-in support at Apache Spark. It can be provided as a thridparty library


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "xleoken (via GitHub)" <gi...@apache.org>.

xleoken commented on PR #45609:
URL: https://github.com/apache/spark/pull/45609#issuecomment-2011798247

   cc @MrDLontheway @danielfx90, can you share your idea, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on code in PR #45609:
URL: https://github.com/apache/spark/pull/45609#discussion_r1532521015


##########
sql/core/src/main/scala/org/apache/spark/sql/jdbc/HiveDialect.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.jdbc
+
+import java.util.Locale
+
+import org.apache.spark.sql.catalyst.SQLConfHelper
+
+private case object HiveDialect extends JdbcDialect with SQLConfHelper {
+
+  override def canHandle(url : String): Boolean =
+    url.toLowerCase(Locale.ROOT).startsWith("jdbc:hive2")

Review Comment:
   Although I understand your proposal, I'm not sure `HiveDialect` is a valid name in Apache Spark community because Apache Spark ThriftServer also uses `jdbc:hive2`. You want to achieve to introduce Hive-specific syntax via this `HiveDialect` instead of `Spark Thrift Server`, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org