You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/01 08:50:00 UTC
[jira] [Commented] (FLINK-9366) Distribute Cache only works for
client-accessible files
[ https://issues.apache.org/jira/browse/FLINK-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497750#comment-16497750 ]
ASF GitHub Bot commented on FLINK-9366:
---------------------------------------
Github user dawidwys commented on a diff in the pull request:
https://github.com/apache/flink/pull/6107#discussion_r192334113
--- Diff: flink-fs-tests/src/test/java/org/apache/flink/hdfstests/DistributedCacheDfsTest.java ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.hdfstests;
+
+import org.apache.flink.api.common.functions.RichMapFunction;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.test.util.MiniClusterResource;
+import org.apache.flink.util.NetUtils;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests for distributing files with {@link org.apache.flink.api.common.cache.DistributedCache} via HDFS.
+ */
+public class DistributedCacheDfsTest {
+
+ private static final String testFileContent = "Goethe - Faust: Der Tragoedie erster Teil\n" + "Prolog im Himmel.\n"
+ + "Der Herr. Die himmlischen Heerscharen. Nachher Mephistopheles. Die drei\n" + "Erzengel treten vor.\n"
+ + "RAPHAEL: Die Sonne toent, nach alter Weise, In Brudersphaeren Wettgesang,\n"
+ + "Und ihre vorgeschriebne Reise Vollendet sie mit Donnergang. Ihr Anblick\n"
+ + "gibt den Engeln Staerke, Wenn keiner Sie ergruenden mag; die unbegreiflich\n"
+ + "hohen Werke Sind herrlich wie am ersten Tag.\n"
+ + "GABRIEL: Und schnell und unbegreiflich schnelle Dreht sich umher der Erde\n"
+ + "Pracht; Es wechselt Paradieseshelle Mit tiefer, schauervoller Nacht. Es\n"
+ + "schaeumt das Meer in breiten Fluessen Am tiefen Grund der Felsen auf, Und\n"
+ + "Fels und Meer wird fortgerissen Im ewig schnellem Sphaerenlauf.\n"
+ + "MICHAEL: Und Stuerme brausen um die Wette Vom Meer aufs Land, vom Land\n"
+ + "aufs Meer, und bilden wuetend eine Kette Der tiefsten Wirkung rings umher.\n"
+ + "Da flammt ein blitzendes Verheeren Dem Pfade vor des Donnerschlags. Doch\n"
+ + "deine Boten, Herr, verehren Das sanfte Wandeln deines Tags.";
+
+ @ClassRule
+ public static TemporaryFolder tempFolder = new TemporaryFolder();
+
+ private static MiniClusterResource miniClusterResource;
+ private static MiniDFSCluster hdfsCluster;
+ private static Configuration conf = new Configuration();
+
+ private static Path testFile;
+ private static Path testDir;
+
+ @BeforeClass
+ public static void setup() throws Exception {
+ File dataDir = tempFolder.newFolder();
+
+ conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, dataDir.getAbsolutePath());
+ MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
+ hdfsCluster = builder.build();
+
+ String hdfsURI = "hdfs://"
+ + NetUtils.hostAndPortToUrlString(hdfsCluster.getURI().getHost(), hdfsCluster.getNameNodePort())
+ + "/";
+
+ miniClusterResource = new MiniClusterResource(
--- End diff --
It is not necessary. Changed it.
> Distribute Cache only works for client-accessible files
> -------------------------------------------------------
>
> Key: FLINK-9366
> URL: https://issues.apache.org/jira/browse/FLINK-9366
> Project: Flink
> Issue Type: Bug
> Components: Client, Local Runtime
> Affects Versions: 1.6.0
> Reporter: Chesnay Schepler
> Assignee: Dawid Wysakowicz
> Priority: Blocker
> Fix For: 1.6.0
>
>
> In FLINK-8620 the distributed cache was modified to the distribute files via the blob store, instead of downloading them from a distributed filesystem.
> Previously, taskmanagers would download requested files from the DFS. Now, they retrieve it form the blob store. This requires the client to preemptively upload all files used with distributed cache.
> As a result it is no longer possible to use the distributed cache for files that reside in a cluster-internal DFS, as the client cannot download it. This is a regression from the previous behavior and may break existing setups.
> [~aljoscha] [~dawidwys]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)