Posted to notifications@kyuubi.apache.org by "Yikf (via GitHub)" <gi...@apache.org> on 2023/02/17 14:32:52 UTC

[GitHub] [kyuubi] Yikf opened a new pull request, #4359: Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

Yikf opened a new pull request, #4359:
URL: https://github.com/apache/kyuubi/pull/4359

   
   ### _Why are the changes needed?_
   [SPARK-41448](https://issues.apache.org/jira/browse/SPARK-41448) makes MR job IDs consistent between FileBatchWriter and FileFormatWriter in Apache Spark 3.3.2, but it introduces a serialization issue: `FileWriterFactory` now holds a Hadoop `JobID`, which is not serializable.
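   To see why a non-serializable field breaks a writer factory, here is a minimal, Spark-free sketch. It assumes, as the PR states, that the job ID object is not `java.io.Serializable`; `FakeJobId` and `EagerFactory` are hypothetical stand-ins for Hadoop's `JobID` and a factory that stores one in a field, not the actual Spark or Kyuubi classes. Plain Java serialization of such a factory, as happens when it is shipped to executors, fails with `NotSerializableException`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream, Serializable}

// Hypothetical stand-in for Hadoop's org.apache.hadoop.mapred.JobID.
// Like the real class (per the PR), it does NOT implement java.io.Serializable.
final class FakeJobId(val trackerId: String, val id: Int) {
  override def toString: String = f"job_${trackerId}_$id%04d"
}

// Hypothetical, simplified factory that eagerly stores a job ID in a field.
// The factory itself is Serializable, but that field is not.
final class EagerFactory(trackerId: String) extends Serializable {
  val jobId: FakeJobId = new FakeJobId(trackerId, 0)
}

object SerializationCheck {
  /** Returns true if `obj` can be written with plain Java serialization. */
  def isJavaSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(new EagerFactory("20230217143252"))) // prints false
    println(isJavaSerializable("a plain String"))                    // prints true
  }
}
```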
   
   This PR rewrites `FileWriterFactory` in the Kyuubi Hive connector to work around the problem by keeping a serializable job tracker ID instead of a non-serializable `JobID`.
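   The idea in the PR title, keeping a serializable job tracker ID and deriving the job ID from it, can be sketched with the same kind of hypothetical stand-ins (`FakeJobId`, `TrackerIdFactory`). This sketch assumes a `@transient lazy val`, a common Scala pattern for values that must not be serialized; it illustrates the technique only, and the actual `FileWriterFactory` change in the PR may be structured differently. Calling `describeJob` before serializing shows that the cached ID is simply skipped rather than causing a failure.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream,
  ObjectOutputStream, ObjectStreamClass, Serializable}

// Hypothetical stand-in for Hadoop's JobID, which is not java.io.Serializable.
final class FakeJobId(val trackerId: String, val id: Int) {
  override def toString: String = f"job_${trackerId}_$id%04d"
}

// Hypothetical factory: only the String tracker ID is part of the serialized form.
final class TrackerIdFactory(val jobTrackerId: String) extends Serializable {
  // @transient keeps the cached object out of the serialized form; `lazy`
  // rebuilds it from jobTrackerId on first access after deserialization.
  @transient private lazy val jobId: FakeJobId = new FakeJobId(jobTrackerId, 0)

  def describeJob: String = jobId.toString
}

object RoundTrip {
  /** Serializes and deserializes `obj` with plain Java serialization. */
  def roundTrip[T <: AnyRef](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    // Resolve classes with the loader that defined `obj` (helps in script/REPL setups).
    val loader = obj.getClass.getClassLoader
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray)) {
      override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val factory = new TrackerIdFactory("20230217143252")
    factory.describeJob // initialize the lazy val before serializing
    val copy = roundTrip(factory)
    println(copy.describeJob) // prints job_20230217143252_0000
  }
}
```

   In this sketch every deserialized copy rebuilds an equal job ID from the same tracker ID string, so copies stay consistent with each other; `lazy` only defers constructing the object until it is needed.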
   
   ### _How was this patch tested?_
   - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
   
   - [ ] Add screenshots for manual tests if appropriate
   
   - [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@kyuubi.apache.org
For additional commands, e-mail: notifications-help@kyuubi.apache.org


[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4359: Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4359:
URL: https://github.com/apache/kyuubi/pull/4359#discussion_r1110978119


##########
extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/FileWriterFactory.scala:
##########
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kyuubi.spark.connector.hive.write
+
+import java.util.Date
+
+import org.apache.hadoop.mapred.JobID
+import org.apache.hadoop.mapreduce.{TaskAttemptID, TaskID, TaskType}
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.spark.internal.io.FileCommitProtocol
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.write.{DataWriter, DataWriterFactory}
+import org.apache.spark.sql.execution.datasources.{DynamicPartitionDataSingleWriter, SingleDirectoryDataWriter, WriteJobDescription}
+import org.apache.spark.sql.hive.kyuubi.connector.HiveBridgeHelper.sparkHadoopWriterUtils
+
+/**
+ * TODO:
+ * 1. This class is rewrite because SPARK-42478.
+ * 2. If a later version of spark 3.3.x is released and SPARK-42478 is fixed,
+ *    we can remove this class

Review Comment:
   Actually, it may be better to keep this class, assuming Spark 3.3.2 will be widely adopted.





[GitHub] [kyuubi] Yikf commented on pull request #4359: Workaround for SPARK-41448 to keep FileWriterFactory serializable

Posted by "Yikf (via GitHub)" <gi...@apache.org>.
Yikf commented on PR #4359:
URL: https://github.com/apache/kyuubi/pull/4359#issuecomment-1435692669

   Thanks @pan3793 




[GitHub] [kyuubi] pan3793 closed pull request #4359: Workaround for SPARK-41448 to keep FileWriterFactory serializable

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 closed pull request #4359: Workaround for SPARK-41448 to keep FileWriterFactory serializable
URL: https://github.com/apache/kyuubi/pull/4359




[GitHub] [kyuubi] pan3793 commented on pull request #4359: Workaround for SPARK-41448 to keep FileWriterFactory serializable

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on PR #4359:
URL: https://github.com/apache/kyuubi/pull/4359#issuecomment-1435680808

   Thanks, merged to master/1.7




[GitHub] [kyuubi] codecov-commenter commented on pull request #4359: Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #4359:
URL: https://github.com/apache/kyuubi/pull/4359#issuecomment-1434866132

   # [Codecov](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#4359](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2452ae0) into [master](https://codecov.io/gh/apache/kyuubi/commit/0be3cbff6e35c8e86635bfe6d856d0dfa148247d?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0be3cbf) will **decrease** coverage by `0.09%`.
   > The diff coverage is `n/a`.
   
   > :exclamation: Current head 2452ae0 differs from pull request most recent head 1e5164e. Consider uploading reports for the commit 1e5164e to get more accurate results
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #4359      +/-   ##
   ============================================
   - Coverage     53.71%   53.63%   -0.09%     
     Complexity       13       13              
   ============================================
     Files           562      562              
     Lines         30786    30786              
     Branches       4149     4149              
   ============================================
   - Hits          16537    16511      -26     
   - Misses        12697    12723      +26     
     Partials       1552     1552              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...e/kyuubi/jdbc/hive/ClosedOrCancelledException.java](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLWhpdmUtamRiYy9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUva3l1dWJpL2pkYmMvaGl2ZS9DbG9zZWRPckNhbmNlbGxlZEV4Y2VwdGlvbi5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...kyuubi/server/trino/api/v1/StatementResource.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXJ2ZXIvdHJpbm8vYXBpL3YxL1N0YXRlbWVudFJlc291cmNlLnNjYWxh) | `50.66% <0.00%> (-8.00%)` | :arrow_down: |
   | [.../kyuubi/server/mysql/constant/MySQLErrorCode.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXJ2ZXIvbXlzcWwvY29uc3RhbnQvTXlTUUxFcnJvckNvZGUuc2NhbGE=) | `13.84% <0.00%> (-6.16%)` | :arrow_down: |
   | [...ache/kyuubi/server/mysql/MySQLCommandHandler.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXJ2ZXIvbXlzcWwvTXlTUUxDb21tYW5kSGFuZGxlci5zY2FsYQ==) | `77.77% <0.00%> (-4.05%)` | :arrow_down: |
   | [...in/spark/authz/ranger/SparkRangerAdminPlugin.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZXh0ZW5zaW9ucy9zcGFyay9reXV1Ymktc3BhcmstYXV0aHovc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9reXV1YmkvcGx1Z2luL3NwYXJrL2F1dGh6L3Jhbmdlci9TcGFya1JhbmdlckFkbWluUGx1Z2luLnNjYWxh) | `64.47% <0.00%> (-2.64%)` | :arrow_down: |
   | [...g/apache/kyuubi/operation/BatchJobSubmission.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9vcGVyYXRpb24vQmF0Y2hKb2JTdWJtaXNzaW9uLnNjYWxh) | `75.27% <0.00%> (-2.20%)` | :arrow_down: |
   | [...ache/kyuubi/server/mysql/MySQLGenericPackets.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9zZXJ2ZXIvbXlzcWwvTXlTUUxHZW5lcmljUGFja2V0cy5zY2FsYQ==) | `76.59% <0.00%> (-2.13%)` | :arrow_down: |
   | [...ain/scala/org/apache/kyuubi/engine/EngineRef.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLXNlcnZlci9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9lbmdpbmUvRW5naW5lUmVmLnNjYWxh) | `70.96% <0.00%> (-0.81%)` | :arrow_down: |
   | [...he/kyuubi/ha/client/etcd/EtcdDiscoveryClient.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLWhhL3NyYy9tYWluL3NjYWxhL29yZy9hcGFjaGUva3l1dWJpL2hhL2NsaWVudC9ldGNkL0V0Y2REaXNjb3ZlcnlDbGllbnQuc2NhbGE=) | `69.02% <0.00%> (-0.55%)` | :arrow_down: |
   | [...in/scala/org/apache/kyuubi/config/KyuubiConf.scala](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-a3l1dWJpLWNvbW1vbi9zcmMvbWFpbi9zY2FsYS9vcmcvYXBhY2hlL2t5dXViaS9jb25maWcvS3l1dWJpQ29uZi5zY2FsYQ==) | `97.44% <0.00%> (-0.07%)` | :arrow_down: |
   | ... and [1 more](https://codecov.io/gh/apache/kyuubi/pull/4359?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   




[GitHub] [kyuubi] pan3793 commented on a diff in pull request #4359: Workaround for SPARK-41448 to keep FileWriterFactory serializable

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #4359:
URL: https://github.com/apache/kyuubi/pull/4359#discussion_r1110979834


##########
extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/write/FileWriterFactory.scala:
##########
@@ -0,0 +1,81 @@
+/**
+ * TODO:
+ * 1. This class is rewrite because SPARK-42478.
+ * 2. If a later version of spark 3.3.x is released and SPARK-42478 is fixed,
+ *    we can remove this class

Review Comment:
   ```suggestion
    * This class is rewritten because of SPARK-42478, which affects Spark 3.3.2
   ```


