Posted to issues@flink.apache.org by rmetzger <gi...@git.apache.org> on 2015/08/21 12:39:19 UTC

[GitHub] flink pull request: [FLINK-2555] Properly pass security credential...

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/1038

    [FLINK-2555] Properly pass security credentials in the Hadoop Input/Output format wrappers

    This is needed because the Hadoop IF/OFs use Hadoop's FileSystem stack, which relies on the security credentials passed in via the JobConf / Job class in the getSplits() method.
    
    Note that accessing a secured Hadoop 1.x cluster through the Hadoop IF/OFs is still not possible with this change. This limitation is due to methods missing from the old APIs.
    
    I've also updated the "de.javakaffee:kryo-serializers" dependency from 0.27 to 0.36 because a user on the ML recently needed a specific Kryo serializer that was not available in the old version.
    
    For the Java and Scala APIs, I renamed the first argument: `readHadoopFile(org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V> mapreduceInputFormat, Class<K> key, Class<V> value, String inputPath, Job job)`
    
    This makes it easier to distinguish between the mapreduce and the mapred variants in IDE completions. (Before, the argument was always called `mapredInputFormat`; now we have the `mapreduceInputFormat` name where applicable.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink flink2555

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1038.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1038
    
----
commit bac21bf5d77c8e15c608ecbf006d29e7af1dd68a
Author: Aljoscha Krettek <al...@gmail.com>
Date:   2015-07-23T13:12:38Z

    [FLINK-2398][api-breaking] Introduce StreamGraphGenerator
    
    This decouples the building of the StreamGraph from the API methods.
    Before, the API methods would build the StreamGraph as they went. Now the
    API methods build a hierarchy of StreamTransformation nodes, from which a
    StreamGraph is generated upon execution.
    
    This also introduces some API breaking changes:
    
     - The result of methods that create sinks is now DataStreamSink instead
       of DataStream
     - Iterations cannot have feedback edges with differing parallelism
     - "Preserve partitioning" is not the default for feedback edges. The
       previous option for this is removed.
     - You can close an iteration several times, no need for a union.
     - Strict checking of whether partitioning and parallelism work
       together, i.e. if upstream and downstream parallelism don't match, it
       is no longer legal to use Forward partitioning. The old behavior was
       not very transparent: when going from low to high parallelism, some
       downstream operators would never get any input; when going from high
       to low parallelism, the downstream operators would be skewed, because
       all elements that would be forwarded to an operator that is not
       "there" go to another operator instead. This requires inserting
       global() or rebalance() in some places, for example after most
       sources, which have parallelism one.
    
    This also makes StreamExecutionEnvironment.execute() behave consistently
    across different execution environments (local, remote, ...): the list of
    operators to be executed is cleared after execute() is called.
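The skew described in the commit message above can be made concrete with a toy simulation. This is not Flink's actual channel-selection code; the class and method names are hypothetical, and the routing rule is a simplification: "forward" keeps each element on its upstream subtask's index (collapsing surplus subtasks onto the last target), while "rebalance" distributes round-robin.

```java
// Toy model of why Forward partitioning is now rejected on a parallelism
// mismatch: with forward, some downstream subtasks get nothing (or get a
// pile-up), while rebalance() spreads elements evenly. Hypothetical names;
// not Flink API.
class ForwardVsRebalance {

    static int[] forward(int upstream, int downstream, int elementsPerSubtask) {
        int[] counts = new int[downstream];
        for (int sub = 0; sub < upstream; sub++) {
            // forward keeps elements on the "same" subtask index; with a
            // mismatch, surplus upstream subtasks collapse onto one target
            counts[Math.min(sub, downstream - 1)] += elementsPerSubtask;
        }
        return counts;
    }

    static int[] rebalance(int upstream, int downstream, int elementsPerSubtask) {
        int[] counts = new int[downstream];
        int next = 0;
        for (int sub = 0; sub < upstream; sub++) {
            for (int e = 0; e < elementsPerSubtask; e++) {
                counts[next]++;                      // round-robin over downstream subtasks
                next = (next + 1) % downstream;
            }
        }
        return counts;
    }
}
```

With parallelism 1 upstream and 3 downstream, `forward` leaves two downstream subtasks with no input at all, which is exactly the case where inserting `rebalance()` after the source is required.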

commit e4b72e6d0148d071a97d2dab5c3bd97b81ee97a5
Author: Robert Metzger <rm...@apache.org>
Date:   2015-08-20T16:43:04Z

    [FLINK-2555] Properly pass security credentials in the Hadoop Input/Output format wrappers
    
    This is needed because the Hadoop IF/OF's are using Hadoop's FileSystem stack, which is using
    the security credentials passed in the JobConf / Job class in the getSplits() method.
    
    Note that access to secured Hadoop 1.x using Hadoop IF/OF's is not possible with this change.
    This limitation is due to missing methods in the old APIs.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---


Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134563009
  
    It would be great if we implemented a test case against the MiniKDC server.




Posted by uce <gi...@git.apache.org>.
Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1038#discussion_r37976251
  
    --- Diff: flink-java/src/main/java/org/apache/flink/api/java/hadoop/common/HadoopInputFormatCommonBase.java ---
    @@ -0,0 +1,79 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.flink.api.java.hadoop.common;
    +
    +import org.apache.flink.api.common.io.RichInputFormat;
    +import org.apache.flink.core.io.InputSplit;
    +import org.apache.hadoop.security.Credentials;
    +import org.apache.hadoop.security.UserGroupInformation;
    +
    +import java.io.IOException;
    +import java.io.ObjectInputStream;
    +import java.io.ObjectOutputStream;
    +import java.lang.reflect.InvocationTargetException;
    +import java.lang.reflect.Method;
    +
    +/**
    + * A common base for both "mapred" and "mapreduce" Hadoop input formats.
    + */
    +public abstract class HadoopInputFormatCommonBase<T, SPLITTYPE extends InputSplit> extends RichInputFormat<T, SPLITTYPE> {
    +	protected transient Credentials credentials;
    +
    +	protected HadoopInputFormatCommonBase(Credentials creds) {
    +		this.credentials = creds;
    +	}
    +
    +	protected void write(ObjectOutputStream out) throws IOException {
    +		this.credentials.write(out);
    +	}
    +
    +	public void read(ObjectInputStream in) throws IOException {
    +		this.credentials = new Credentials();
    +		credentials.readFields(in);
    +	}
    +
    --- End diff --
    
    whitespace
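The `write`/`read` helpers in the diff above exist because Hadoop's `Credentials` is not Java-serializable: the field is `transient` and is written and restored explicitly from the serialization hooks. A self-contained sketch of that same pattern, using a plain map as a stand-in for `Credentials` (all names here are hypothetical, not the Flink classes from the diff):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Sketch of the transient-field + custom-serialization pattern: the
// credentials are captured on the client, carried through Java
// serialization, and restored on the cluster side.
class CarriesCredentials implements Serializable {
    private transient Map<String, byte[]> credentials;   // not serialized by default

    CarriesCredentials(Map<String, byte[]> creds) {
        this.credentials = creds;                        // captured at construction time
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(new HashMap<>(credentials));     // analogous to credentials.write(out)
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        credentials = (Map<String, byte[]>) in.readObject();  // analogous to credentials.readFields(in)
    }

    boolean hasCredentials() {
        return credentials != null && !credentials.isEmpty();
    }

    // Helper: serialize and deserialize an instance, as shipping it to a
    // cluster would.
    static CarriesCredentials roundTrip(CarriesCredentials original) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(original);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (CarriesCredentials) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Without the explicit `writeObject`/`readObject` pair, the transient field would simply arrive as `null` after deserialization, which is the failure mode the PR fixes.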




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134112685
  
    Okay, that makes sense.
    I'll add some comments to the classes.




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger closed the pull request at:

    https://github.com/apache/flink/pull/1038




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-135343326
  
    I'm manually closing this pull request. It has been merged by @uce.




Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-133505584
  
    The bases exist because there is a Java-specific and a Scala-specific version of each HadoopInputFormat.




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-133483445
  
    There might actually be a way of testing against a secured cluster: https://issues.apache.org/jira/browse/HADOOP-9848 / https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-minikdc/src/main/java/org/apache/hadoop/minikdc/MiniKdc.java
    This seems to be available since Hadoop 2.3.0.





Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134531698
  
    @mxm: I removed the comment.




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134584589
  
    I agree. Let's file a JIRA and do it separately, as this is probably a bigger task.




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-133479533
  
    I actually think that there is no need for the `HadoopInputFormatBase`s to exist. 
    There are two implementations and two bases for mapred and mapreduce, but they have nothing in common.
    
    There are some tests for the non secure case in `org.apache.flink.test.hadoop`.




Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134991397
  
    I'll address my trivial comment and merge this. Thanks!




Posted by mxm <gi...@git.apache.org>.
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-134632899
  
    I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573




Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-135013779
  
    Thanks a lot!




Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1038#issuecomment-133478724
  
    Looks good. The HadoopFormatBase and similar classes could use a line or two more of comments, but otherwise this looks fine.
    
    Any way to test this? There do not seem to be any tests for the format wrappers yet...

