Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/10/19 14:58:50 UTC

[GitHub] [flink] rkhachatryan commented on a change in pull request #13648: [FLINK-19632] Introduce a new ResultPartitionType for Approximate Local Recovery

rkhachatryan commented on a change in pull request #13648:
URL: https://github.com/apache/flink/pull/13648#discussion_r507745730



##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartitionType.java
##########
@@ -71,7 +71,17 @@
 	 * <p>For batch jobs, it will be best to keep this unlimited ({@link #PIPELINED}) since there are
 	 * no checkpoint barriers.
 	 */
-	PIPELINED_BOUNDED(true, true, true, false);
+	PIPELINED_BOUNDED(true, true, true, false),
+
+	/**
+	 * Pipelined partitions with a bounded (local) buffer pool to support downstream task to
+	 * continue consuming data after reconnection in Approximate Local-Recovery.
+	 *
+	 * <p>Pipelined results can be consumed only once by a single consumer at one time.
+	 * {@link #PIPELINED_APPROXIMATE} is different from {@link #PIPELINED_BOUNDED} in that
+	 * {@link #PIPELINED_APPROXIMATE} is not decomposed automatically after consumption.
+	 */
+	PIPELINED_APPROXIMATE(true, true, true, true);

Review comment:
       Can you please explain why this partition type is bounded?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",

Review comment:
       I think this message should mention that a new view is being created.

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+					parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+				releaseView();
+			}
+
+			LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+				parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+			readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+		}
+
+		return readView;
+	}
+
+	@Override
+	Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+		if (isPartialBuffer) {
+			isPartialBuffer = !buffer.cleanupPartialRecord();
+		}
+
+		return buffer.build();
+	}
+
+	void releaseView() {
+		LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+		readView = null;

Review comment:
       The writes in this method should be done under a lock, right?
   But I'm not sure that all execution paths acquire this lock.
   Should we add `synchronized (buffers)` or `checkState(Thread.holdsLock(buffers))`?
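   
   A minimal sketch of the two options, using a hypothetical stand-in class rather than the real subpartition (`LockedViewHolder` and its methods are assumptions for illustration; only `Thread.holdsLock` is a real JDK call):
   ```java
   // Hypothetical stand-in for the subpartition; not Flink code.
   class LockedViewHolder {
       // Plays the role of the `buffers` monitor in PipelinedSubpartition.
       private final Object buffers = new Object();
       private Object readView = new Object();

       // Option A: the method acquires the lock itself.
       void releaseViewSynchronized() {
           synchronized (buffers) {
               readView = null;
           }
       }

       // Option B: the method only asserts that the caller already holds the lock.
       void releaseViewChecked() {
           if (!Thread.holdsLock(buffers)) {
               throw new IllegalStateException("caller must hold the buffers lock");
           }
           readView = null;
       }

       // A caller that takes the lock before delegating to option B.
       void releaseUnderLock() {
           synchronized (buffers) {
               releaseViewChecked();
           }
       }

       boolean hasView() {
           synchronized (buffers) {
               return readView != null;
           }
       }
   }
   ```
   Option B fails fast on any execution path that forgot to take the lock, which is why it may be the more useful of the two while the call sites are still being audited.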
   

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+					parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+				releaseView();
+			}
+
+			LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+				parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+			readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+		}
+
+		return readView;
+	}
+
+	@Override
+	Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+		if (isPartialBuffer) {
+			isPartialBuffer = !buffer.cleanupPartialRecord();
+		}
+
+		return buffer.build();
+	}
+
+	void releaseView() {
+		LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+		readView = null;
+		isPartialBuffer = true;
+		isBlockedByCheckpoint = false;
+		sequenceNumber = 0;
+	}
+
+	@Override
+	public String toString() {

Review comment:
       I couldn't find any differences from `super.toString` other than class name.
   Can we just replace the hard-coded `"PipelinedSubpartition"` in super with `getClass().getSimpleName()` instead of overriding?
   
   ditto: view
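   
   Roughly what I have in mind, as a sketch with hypothetical classes (`BaseSubpartition`, `ApproximateSubpartition` and `index()` are made up for illustration and do not mirror the real fields):
   ```java
   // Hypothetical base class; the real toString prints more state.
   class BaseSubpartition {
       int index() {
           return 0;
       }

       @Override
       public String toString() {
           // getClass() resolves to the runtime class, so subclasses get
           // their own name without overriding toString.
           return String.format("%s#%d", getClass().getSimpleName(), index());
       }
   }

   // Hypothetical subclass that inherits toString unchanged.
   class ApproximateSubpartition extends BaseSubpartition {
   }
   ```
   With this, the subclass needs no `toString` override unless it has extra state to print.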

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+					parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+				releaseView();
+			}
+
+			LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+				parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+			readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+		}
+
+		return readView;
+	}
+
+	@Override
+	Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+		if (isPartialBuffer) {
+			isPartialBuffer = !buffer.cleanupPartialRecord();
+		}
+
+		return buffer.build();

Review comment:
       nit: `super.buildSliceBuffer` ?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartitionView.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * View over a pipelined in-memory only subpartition allowing reconnecting.
+ */
+public class PipelinedApproximateSubpartitionView extends PipelinedSubpartitionView {
+
+	PipelinedApproximateSubpartitionView(PipelinedApproximateSubpartition parent, BufferAvailabilityListener listener) {
+		super(parent, listener);
+	}
+
+	@Override
+	public void releaseAllResources() {

Review comment:
       I think this method is called not only upon downstream RPC, but also on task shutdown and other cases.
   If so, completely skipping `super.releaseAllResources` can lead to resource leaks in those cases.
   WDYT?

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedSubpartition.java
##########
@@ -259,9 +260,10 @@ BufferAndBacklog pollBuffer() {
 			}
 
 			while (!buffers.isEmpty()) {
-				BufferConsumer bufferConsumer = buffers.peek().getBufferConsumer();
+				BufferConsumerWithPartialRecordLength bufferConsumerWithPartialRecordLength = buffers.peek();
+				BufferConsumer bufferConsumer = requireNonNull(bufferConsumerWithPartialRecordLength).getBufferConsumer();

Review comment:
       I think there is no point in adding explicit `requireNonNull` just before dereferencing it.
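   
   A small standalone illustration (the class `NullCheckDemo` is hypothetical; `ArrayDeque.peek` and `Objects.requireNonNull` are standard JDK calls). Both variants end in a `NullPointerException` when the head is missing, so the explicit check changes nothing:
   ```java
   // Hypothetical demo class; not part of the PR.
   class NullCheckDemo {
       // Explicit requireNonNull immediately followed by a dereference.
       static String viaRequireNonNull(java.util.ArrayDeque<String> queue) {
           try {
               return java.util.Objects.requireNonNull(queue.peek()).trim();
           } catch (NullPointerException e) {
               return "NPE";
           }
       }

       // Plain dereference: peek() returns null on an empty deque,
       // so calling a method on the result throws NullPointerException.
       static String viaPlainDereference(java.util.ArrayDeque<String> queue) {
           try {
               return queue.peek().trim();
           } catch (NullPointerException e) {
               return "NPE";
           }
       }
   }
   ```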

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+					parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+				releaseView();
+			}
+
+			LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+				parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+			readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+		}
+
+		return readView;
+	}
+
+	@Override
+	Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+		if (isPartialBuffer) {
+			isPartialBuffer = !buffer.cleanupPartialRecord();
+		}
+
+		return buffer.build();
+	}
+
+	void releaseView() {
+		LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+		readView = null;

Review comment:
       I'm concerned about a potential race condition here (even with `synchronized` added).
   
   Consider a case:
   Thread1: call `subpartition.createReadView()` - create `view1`
   Thread2: obtain a reference to `view1`
   Thread1: call `subpartition.createReadView()` - create `view2`
   Thread2: call `view1.releaseAllResources` <-- nulls out subpartition.readView; `view2` is now corrupt?
   
   WDYT?
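   
   One possible guard, purely as a sketch: let the release take effect only if the caller is still the current view. `ViewOwner` and `View` are hypothetical names, not the classes in this PR, and this is not necessarily the right fix here.
   ```java
   // Hypothetical stand-in for the subpartition.
   class ViewOwner {
       private final Object lock = new Object();
       private View current;

       View createView() {
           synchronized (lock) {
               // Replaces any previous view, as createReadView does in the PR.
               current = new View(this);
               return current;
           }
       }

       // Clears the field only when the caller is still the current view,
       // so a late release from a stale view cannot null out its successor.
       void release(View caller) {
           synchronized (lock) {
               if (current == caller) {
                   current = null;
               }
           }
       }

       View currentView() {
           synchronized (lock) {
               return current;
           }
       }
   }

   // Hypothetical stand-in for the subpartition view.
   class View {
       private final ViewOwner owner;

       View(ViewOwner owner) {
           this.owner = owner;
       }

       void releaseAllResources() {
           owner.release(this);
       }
   }
   ```
   In the interleaving above, the late `view1.releaseAllResources` would then be a no-op and `view2` would stay registered.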

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/PipelinedApproximateSubpartition.java
##########
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferConsumerWithPartialRecordLength;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * A pipelined in-memory only subpartition, which allows to reconnecting after failure.
+ * Only one view is allowed at a time to read teh subpartition.
+ */
+public class PipelinedApproximateSubpartition extends PipelinedSubpartition {
+
+	private static final Logger LOG = LoggerFactory.getLogger(PipelinedApproximateSubpartition.class);
+
+	private boolean isPartialBuffer = false;
+
+	PipelinedApproximateSubpartition(int index, ResultPartition parent) {
+		super(index, parent);
+	}
+
+	@Override
+	public PipelinedSubpartitionView createReadView(BufferAvailabilityListener availabilityListener) {
+		synchronized (buffers) {
+			checkState(!isReleased);
+
+			// if the view is not released yet
+			if (readView != null) {
+				LOG.info("{} ReadView for Subpartition {} of {} has not been released!",
+					parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+				releaseView();
+			}
+
+			LOG.debug("{}: Creating read view for subpartition {} of partition {}.",
+				parent.getOwningTaskName(), getSubPartitionIndex(), parent.getPartitionId());
+
+			readView = new PipelinedApproximateSubpartitionView(this, availabilityListener);
+		}
+
+		return readView;
+	}
+
+	@Override
+	Buffer buildSliceBuffer(BufferConsumerWithPartialRecordLength buffer) {
+		if (isPartialBuffer) {
+			isPartialBuffer = !buffer.cleanupPartialRecord();
+		}
+
+		return buffer.build();
+	}
+
+	void releaseView() {
+		LOG.info("Releasing view of subpartition {} of {}.", getSubPartitionIndex(), parent.getPartitionId());
+		readView = null;
+		isPartialBuffer = true;

Review comment:
       The name `isPartialBuffer` is a bit misleading to me because it implies that a partial buffer was emitted.
   But in fact, this field reflects that the view was released.
   How about `isPartialBufferCleanupRequired`?
   

##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/ResultPartitionFactory.java
##########
@@ -130,8 +132,15 @@ public ResultPartition create(
 				bufferCompressor,
 				bufferPoolFactory);
 
+			BiFunction<Integer, PipelinedResultPartition, PipelinedSubpartition> factory;
+			if (type == ResultPartitionType.PIPELINED_APPROXIMATE) {
+				factory = PipelinedApproximateSubpartition::new;
+			} else {
+				factory = PipelinedSubpartition::new;
+			}
+

Review comment:
       nit: I'd prefer a simple ternary inside the loop:
   ```
   for (int i = 0; i < subpartitions.length; i++) {
       subpartitions[i] = type == ResultPartitionType.PIPELINED_APPROXIMATE ?
           new PipelinedApproximateSubpartition(i, pipelinedPartition) :
           new PipelinedSubpartition(i, pipelinedPartition);
   }
   ```



