Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/11/09 19:25:00 UTC

[GitHub] [flink-table-store] LadyForest opened a new pull request, #369: [FLINK-29964] Support Spark/Hive with OSS

LadyForest opened a new pull request, #369:
URL: https://github.com/apache/flink-table-store/pull/369

   Currently, we rely on Flink's plugin mechanism to initialize the `FileSystem`, which does not apply to other engines such as Spark/Hive. This PR:
   - Invokes `FileSystem#initialize` to configure the target filesystem (including jar dependencies and conf parameters)
   - For Spark, converts case-insensitive conf to case-sensitive conf (with special handling for OSS)
   - For Hive, allows the AK/endpoint to be configured via the `SET` command
   - Introduces a submodule `flink-table-store-filesystem` to shade fs-oss (and fs-s3 in the next PR). Testing on Spark 3.3 and Hive 3.1 revealed various class conflicts, so OSS is shaded to ease use
   - Adds a readme
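
   To illustrate the Spark bullet above: Spark's data source options are case-insensitive (keys typically arrive lowercased), whereas Hadoop-style OSS keys such as `fs.oss.accessKeyId` are camelCase. Below is a minimal sketch of restoring the canonical case, assuming a fixed list of known keys. The class and method names are hypothetical and are not taken from this PR.

   ```java
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Locale;
   import java.util.Map;

   /** Hypothetical helper: restores the canonical case of known conf keys. */
   class CaseSensitiveConfSketch {

       // Assumed list of case-sensitive keys; extend as needed.
       static final List<String> CANONICAL_KEYS =
               Arrays.asList("fs.oss.endpoint", "fs.oss.accessKeyId", "fs.oss.accessKeySecret");

       static Map<String, String> restoreCase(Map<String, String> lowerCased) {
           // Index the canonical keys by their lowercase form.
           Map<String, String> byLower = new HashMap<>();
           for (String key : CANONICAL_KEYS) {
               byLower.put(key.toLowerCase(Locale.ROOT), key);
           }
           Map<String, String> result = new HashMap<>();
           for (Map.Entry<String, String> e : lowerCased.entrySet()) {
               String lower = e.getKey().toLowerCase(Locale.ROOT);
               // Keys without a known canonical form are passed through unchanged.
               result.put(byLower.getOrDefault(lower, e.getKey()), e.getValue());
           }
           return result;
       }
   }
   ```

   With this sketch, an incoming `fs.oss.accesskeyid` entry is emitted as `fs.oss.accessKeyId`, while unrelated keys keep whatever case they arrived in.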
   
   Regarding the E2E test, I'm unsure whether we can put the AK info in directly, since the OSS server is hard to mock. (Maybe GitHub CI can support passing some secret values as environment variables; then we could use a credential class instead of AK text.)
   The next PR, which supports S3, will include an E2E test that uses a MinIO Docker container as the S3 server.
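
   The idea of passing secrets as environment variables could look roughly like the sketch below. The variable names and the `OssTestCredentialsSketch` class are assumptions made for illustration; nothing here is part of this PR. The helper returns an empty result when any variable is missing, so a test could skip itself instead of failing.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Optional;

   /** Hypothetical helper: builds OSS conf from environment variables, if present. */
   class OssTestCredentialsSketch {

       // Assumed variable names; a CI system would inject them as secrets.
       static final String ENV_ENDPOINT = "OSS_ENDPOINT";
       static final String ENV_KEY_ID = "OSS_ACCESS_KEY_ID";
       static final String ENV_KEY_SECRET = "OSS_ACCESS_KEY_SECRET";

       /** Returns the OSS conf, or empty if any variable is missing. */
       static Optional<Map<String, String>> fromEnv(Map<String, String> env) {
           String endpoint = env.get(ENV_ENDPOINT);
           String keyId = env.get(ENV_KEY_ID);
           String keySecret = env.get(ENV_KEY_SECRET);
           if (endpoint == null || keyId == null || keySecret == null) {
               return Optional.empty();
           }
           Map<String, String> conf = new HashMap<>();
           conf.put("fs.oss.endpoint", endpoint);
           conf.put("fs.oss.accessKeyId", keyId);
           conf.put("fs.oss.accessKeySecret", keySecret);
           return Optional.of(conf);
       }

       public static void main(String[] args) {
           // Never print the secret itself; only report whether credentials were found.
           System.out.println(
                   "OSS credentials present: " + fromEnv(System.getenv()).isPresent());
       }
   }
   ```

   Taking the environment as a `Map` parameter keeps the lookup testable without touching the real process environment.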


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#discussion_r1033308142


##########
flink-table-store-hive/flink-table-store-hive-catalog/pom.xml:
##########
@@ -445,6 +445,28 @@ under the License.
                     <groupId>org.pentaho</groupId>
                     <artifactId>pentaho-aggdesigner-algorithm</artifactId>
                 </exclusion>
+                <!-- Exclude the older hadoop-common-->
+                <exclusion>
+                    <groupId>org.apache.hadoop</groupId>
+                    <artifactId>hadoop-common</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>

Review Comment:
   What causes the change here?





[GitHub] [flink-table-store] LadyForest commented on pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
LadyForest commented on PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#issuecomment-1311287152

   > Github supports encrypted secrets for repository (https://docs.github.com/en/actions/security-guides/encrypted-secrets). You might still want to talk with access key providers to make sure that this is safe.
   
   Thanks for the information. However, I think I need someone with admin access to set this up.
   ![image](https://user-images.githubusercontent.com/55568005/201278351-4684a8be-e27e-43df-8f6f-eeaef7c69d90.png)
   




[GitHub] [flink-table-store] JingsongLi commented on pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#issuecomment-1311702365

   https://issues.apache.org/jira/browse/FLINK-30000
   We can evaluate the difficulty of this solution. If it can be solved, it will be much more elegant than calling `FileSystem.initialize`, and it will not cause exceptions in some processes due to missing calls to `FileSystem.initialize`.
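
   The failure mode mentioned here, a code path that forgets the explicit initialization call, can be illustrated with a small sketch. It is not the FLINK-30000 design and does not use Flink's API; `LazyInitSketch` and its members are hypothetical. It only shows that initialization tied to first access cannot be skipped by a caller.

   ```java
   import java.util.function.Supplier;

   /** Hypothetical holder that runs its initializer on first access, at most once. */
   class LazyInitSketch<T> {

       private final Supplier<T> initializer;
       private T value;
       private boolean initialized;

       LazyInitSketch(Supplier<T> initializer) {
           this.initializer = initializer;
       }

       /** Callers never invoke an explicit initialize(); the first get() does it. */
       synchronized T get() {
           if (!initialized) {
               value = initializer.get();
               initialized = true;
           }
           return value;
       }
   }
   ```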




[GitHub] [flink-table-store] JingsongLi merged pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
JingsongLi merged PR #369:
URL: https://github.com/apache/flink-table-store/pull/369




[GitHub] [flink-table-store] JingsongLi commented on pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#issuecomment-1328782275

   And please rebase onto the latest master.




[GitHub] [flink-table-store] JingsongLi commented on pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#issuecomment-1311698953

   > 
   
   We don't have `admin` access either. There is no `settings`...
   And I don't think we should add a dependency on the Aliyun remote service for our testing... Other maintainers would be at a loss.




[GitHub] [flink-table-store] tsreaper commented on a diff in pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
tsreaper commented on code in PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#discussion_r1019776118


##########
docs/content/docs/filesystem/overview.md:
##########
@@ -0,0 +1,91 @@
+---
+title: "Overview"
+weight: 1
+type: docs
+aliases:
+- /filesystem/overview.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+# File Systems for Unified Engine
+
+Apache Flink Table Store utilizes the same pluggable file systems as Apache Flink. Users can follow the [standard plugin mechanism](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/filesystems/plugins/) to configure the

Review Comment:
   Use https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/plugins/ instead; don't link to a specific version.



##########
flink-table-store-spark2/src/main/java/org/apache/flink/table/store/spark/SparkSource.java:
##########
@@ -19,21 +19,25 @@
 package org.apache.flink.table.store.spark;
 
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileSystem;
 import org.apache.flink.table.store.table.FileStoreTableFactory;
 
 import org.apache.spark.sql.sources.DataSourceRegister;
 import org.apache.spark.sql.sources.v2.DataSourceOptions;
 import org.apache.spark.sql.sources.v2.ReadSupport;
+import org.apache.spark.sql.sources.v2.SessionConfigSupport;
 import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
 import org.apache.spark.sql.types.StructType;
 
 /** The Spark source for table store. */
-public class SparkSource implements DataSourceRegister, ReadSupport {
+public class SparkSource implements DataSourceRegister, ReadSupport, SessionConfigSupport {
+
+    private static final String SHORT_NAME = "tablestore";
 
     @Override
     public String shortName() {
         // Not use 'table-store' here, the '-' is not allowed in SQL
-        return "tablestore";
+        return SHORT_NAME;

Review Comment:
   Same as for the other `SparkSource`: move the comment above `private static final`.



##########
flink-table-store-spark/src/main/java/org/apache/flink/table/store/spark/SparkSource.java:
##########
@@ -31,12 +32,14 @@
 import java.util.Map;
 
 /** The spark source for table store. */
-public class SparkSource implements DataSourceRegister, TableProvider {
+public class SparkSource implements DataSourceRegister, SessionConfigSupport {
+
+    private static final String SHORT_NAME = "tablestore";
 
     @Override
     public String shortName() {
         // Not use 'table-store' here, the '-' is not allowed in SQL

Review Comment:
   Move the comment above the `private static final` declaration.





[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #369: [FLINK-29964] Support Spark/Hive with OSS

Posted by GitBox <gi...@apache.org>.
LadyForest commented on code in PR #369:
URL: https://github.com/apache/flink-table-store/pull/369#discussion_r1033374714


##########
flink-table-store-hive/flink-table-store-hive-catalog/pom.xml:
##########
@@ -445,6 +445,28 @@ under the License.
                     <groupId>org.pentaho</groupId>
                     <artifactId>pentaho-aggdesigner-algorithm</artifactId>
                 </exclusion>
+                <!-- Exclude the older hadoop-common-->
+                <exclusion>
+                    <groupId>org.apache.hadoop</groupId>
+                    <artifactId>hadoop-common</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+        <dependency>

Review Comment:
   Because the Hive E2E test failed earlier, I debugged it and found that `Configuration#getPropsWithPrefix` was introduced in hadoop-common 2.8, while
   `mvn dependency:tree -Dincludes=org.apache.hadoop:hadoop-common`
   reveals that we depend on hadoop-common 2.7.2:
   ```
   [INFO] org.apache.flink:flink-table-store-hive-catalog:jar:0.3-SNAPSHOT
   [INFO] \- org.apache.hive.hcatalog:hive-webhcat-java-client:jar:2.3.9:test
   [INFO]    \- org.apache.hadoop:hadoop-common:jar:2.7.2:compile
   ```
   
   So I excluded it at first. Later, I realized it would be better not to call `Configuration#getPropsWithPrefix` at all, but I forgot to revert this change.
   
   I'll revert it.
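
   For illustration only, since the thread does not show what replaced the call: one way to avoid depending on hadoop-common 2.8+ is to iterate over the configuration entries directly. Hadoop's `Configuration` is an `Iterable<Map.Entry<String, String>>`, so a helper like the hypothetical `propsWithPrefix` below would also work with older versions. Like `getPropsWithPrefix`, the sketch strips the prefix from the returned keys.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Hypothetical stand-in for prefix lookup over any iterable of conf entries. */
   class PropsWithPrefixSketch {

       /** Collects entries whose key starts with the prefix; the prefix is stripped from the keys. */
       static Map<String, String> propsWithPrefix(
               Iterable<Map.Entry<String, String>> conf, String prefix) {
           Map<String, String> result = new HashMap<>();
           for (Map.Entry<String, String> entry : conf) {
               if (entry.getKey().startsWith(prefix)) {
                   result.put(entry.getKey().substring(prefix.length()), entry.getValue());
               }
           }
           return result;
       }
   }
   ```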


