You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "pgaref (via GitHub)" <gi...@apache.org> on 2023/04/25 03:56:41 UTC

[GitHub] [flink] pgaref opened a new pull request, #22481: [FLINK-27805] bump orc version to 1.7.8

pgaref opened a new pull request, #22481:
URL: https://github.com/apache/flink/pull/22481

   Bump to orc 1.7.8  -- [Release Notes](https://orc.apache.org/news/releases/)
   ORC now supports writers with FSDataOutputStream (instead of just paths previously) so cleaning NoHivePhysicalWriterImpl and PhysicalWriterImpl


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536767645

   BTW, could you cast your +1 with that information on Apache ORC 1.7.9 vote thread?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184577015


##########
flink-formats/flink-orc-nohive/pom.xml:
##########
@@ -90,6 +90,37 @@ under the License.
 			</exclusions>
 		</dependency>
 
+		<dependency>
+			<groupId>org.apache.hadoop</groupId>
+			<artifactId>hadoop-common</artifactId>
+			<scope>provided</scope>
+			<exclusions>
+				<exclusion>
+					<groupId>ch.qos.reload4j</groupId>
+					<artifactId>reload4j</artifactId>
+				</exclusion>
+				<exclusion>
+					<groupId>org.slf4j</groupId>
+					<artifactId>slf4j-reload4j</artifactId>
+				</exclusion>
+			</exclusions>

Review Comment:
   Thats because we dont allow Reload4J dependencies due to their conflict with Log4j -- we use maven-enforcer [rules] for that (https://github.com/apache/flink/blob/85efa13e5d16f7962113d9b905efcceb9b99f8a3/pom.xml#L1773):



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] MartijnVisser commented on pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "MartijnVisser (via GitHub)" <gi...@apache.org>.
MartijnVisser commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1521507651

   @lirui-apache @JingsongLi Does one of you want to review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, its the same functionality -- looks like we need the stream open for snapshotting, that is then cleaned up as part of [snapshotContext.closeExceptionally](https://github.com/apache/flink/blob/f9438dd54fa6896563b152d50b7a4b3c47ad9ebf/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamOperatorStateHandler.java#L265) method
   
   I also replicated the `ClosedChannelException` issue described above when keeping the stream open in the existing [tests](https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296) so I believe we are good here
   
   PS: we also do the same for other formats like [Avro](https://github.com/apache/flink/blob/255d65613ad348f66b1043211f51ce7a56dfeac6/flink-core/src/main/java/org/apache/flink/util/CloseShieldOutputStream.java#L53)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1522510048

   Aslo cc @pnowojski  / @akalash  that might be interested


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, did no go deeper here as we are replicating the existing functionality. 
   I also replicated the `ClosedChannelException` issue described above when keeping the stream open in the existing tests, for example: https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184510094


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/PhysicalWriterImpl.java:
##########
@@ -1,398 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.flink.orc.writer;
-
-import org.apache.flink.annotation.Internal;
-import org.apache.flink.core.fs.FSDataOutputStream;
-
-import com.google.protobuf.CodedOutputStream;
-import org.apache.orc.CompressionCodec;
-import org.apache.orc.CompressionKind;
-import org.apache.orc.OrcFile;
-import org.apache.orc.OrcProto;
-import org.apache.orc.PhysicalWriter;
-import org.apache.orc.impl.HadoopShims;
-import org.apache.orc.impl.OrcCodecPool;
-import org.apache.orc.impl.OutStream;
-import org.apache.orc.impl.StreamName;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.io.OutputStream;
-import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
-import java.util.TreeMap;
-
-import static org.apache.orc.impl.WriterImpl.getEstimatedBufferSize;
-
-/**
- * A slightly customised clone of {@link org.apache.orc.impl.PhysicalFsWriter}.
- *
- * <p>Whereas PhysicalFsWriter implementation works on the basis of a Path, this implementation
- * leverages Flink's {@link FSDataOutputStream} to write the compressed data.
- *
- * <p>NOTE: If the ORC dependency version is updated, this file may have to be updated as well to be
- * in sync with the new version's PhysicalFsWriter.

Review Comment:
   Hey @tzulitai, thats correct the original (removed) `PhysicalWriterImpl` was a copy of the ORC PhysicalWriter with support for FSDataOutputStream. https://github.com/apache/orc/blob/a85b4c8852a894a701ddb73c15fb84ed1035abb9/java/core/src/java/org/apache/orc/impl/PhysicalFsWriter.java
   
   [ORC-1198 ](https://github.com/apache/orc/pull/1153) recently introduced a PhysicalFsWriter constructor with FSDataOutputStream as a parameter and there is no need to internally maintain this anymore 🥳 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] wgtmac commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535588804

   Thanks @dongjoon-hyun and @pgaref. I will keep an eye on it and prepare RC2 if required.
   
   BTW, it would be better to test 1.7.9-RC1 instead of 1.7.9-SNAPSHOT. But I think they have the same content except the version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, its the same functionality -- looks like we need the stream open for snapshotting, that is then cleaned up as part of [snapshotContext.closeExceptionally](https://github.com/apache/flink/blob/f9438dd54fa6896563b152d50b7a4b3c47ad9ebf/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamOperatorStateHandler.java#L265) method
   
   I also replicated the `ClosedChannelException` issue described above when keeping the stream open in the existing [tests](https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296) so I believe we are good here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, its the same functionality -- looks like we need the stream open for snapshotting, that is then cleaned up as part of [snapshotContext.closeExceptionally](https://github.com/apache/flink/blob/f9438dd54fa6896563b152d50b7a4b3c47ad9ebf/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamOperatorStateHandler.java#L265) method
   
   I also replicated the `ClosedChannelException` issue described above when keeping the stream open in the existing [tests](https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296) so I believe we are good here
   
   PS: we also do the same for other formats, e.g., [Avro](https://github.com/apache/flink/blob/255d65613ad348f66b1043211f51ce7a56dfeac6/flink-core/src/main/java/org/apache/flink/util/CloseShieldOutputStream.java#L53)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, its the same functionality -- looks like we need the stream open for snapshotting, that is then cleaned up as part of [snapshotContext.closeExceptionally method](https://github.com/apache/flink/blob/f9438dd54fa6896563b152d50b7a4b3c47ad9ebf/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/StreamOperatorStateHandler.java#L265)
   
   I also replicated the `ClosedChannelException` issue described above when keeping the stream open in the existing [tests](https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296) so I believe we are good here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535069827

   BTW, @pgaref . As you know, @wgtmac started Apache ORC 1.7.9 RC1. Can we test `RC1` in this PR?
   - https://github.com/apache/orc/issues/1437


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184577015


##########
flink-formats/flink-orc-nohive/pom.xml:
##########
@@ -90,6 +90,37 @@ under the License.
 			</exclusions>
 		</dependency>
 
+		<dependency>
+			<groupId>org.apache.hadoop</groupId>
+			<artifactId>hadoop-common</artifactId>
+			<scope>provided</scope>
+			<exclusions>
+				<exclusion>
+					<groupId>ch.qos.reload4j</groupId>
+					<artifactId>reload4j</artifactId>
+				</exclusion>
+				<exclusion>
+					<groupId>org.slf4j</groupId>
+					<artifactId>slf4j-reload4j</artifactId>
+				</exclusion>
+			</exclusions>

Review Comment:
   Thats because we dont allow Reload4J dependencies due to their conflict with Log4j -- we use maven-enforcer [rules] for that (https://github.com/apache/flink/blob/85efa13e5d16f7962113d9b905efcceb9b99f8a3/pom.xml#L1773):
   
   Failed run example:
   ```
   [ERROR] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message:
   Log4j 1 and Reload4J dependencies are not allowed because they conflict with Log4j 2. If the dependency absolutely requires the Log4j 1 API, use 'org.apache.logging.log4j:log4j-1.2-api'.
   Found Banned Dependency: ch.qos.reload4j:reload4j:jar:1.2.18.3
   Found Banned Dependency: org.slf4j:slf4j-reload4j:jar:1.7.36
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1186501105


##########
flink-formats/flink-orc-nohive/src/test/java/org/apache/flink/orc/nohive/OrcColumnarRowSplitReaderNoHiveTest.java:
##########
@@ -105,7 +105,9 @@ protected OrcColumnarRowSplitReader createReader(
             throws IOException {
         return OrcNoHiveSplitReaderUtil.genPartColumnarRowReader(
                 new Configuration(),
-                IntStream.range(0, fullTypes.length).mapToObj(i -> "f" + i).toArray(String[]::new),
+                IntStream.range(0, fullTypes.length)
+                        .mapToObj(i -> "_col" + i)
+                        .toArray(String[]::new),

Review Comment:
   Yes, this is needed for the schema evolution to work properly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536782582

   @tzulitai / @dmvk  can you please take another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536767114

   It's great! Thank you so much, @pgaref .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] flinkbot commented on pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "flinkbot (via GitHub)" <gi...@apache.org>.
flinkbot commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1521126055

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "8a31508bb40da5f27e122d679175d8d545e6b38e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8a31508bb40da5f27e122d679175d8d545e6b38e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8a31508bb40da5f27e122d679175d8d545e6b38e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536766494

   FYI @dongjoon-hyun  @wgtmac  we had a [green run for 1.7.9](https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=48695&view=results).
   
   Switching back to 1.7.8 to get this PR merged and I will create a new ticket for the 1.7.9 bump when its ready.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536768582

   Also, cc @williamhyun once more too..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1536767873

   
   
   
   
   > BTW, could you cast your +1 with that information on Apache ORC 1.7.9 vote thread?
   
   Sure, will do


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dmvk commented on pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "dmvk (via GitHub)" <gi...@apache.org>.
dmvk commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1532746581

   Please add @liujiawinds as a co-author [1] to the commit.
   
   [1] https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1533273430

   Thank you, @pgaref and @dmvk .
   
   cc @williamhyun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] tzulitai commented on a diff in pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "tzulitai (via GitHub)" <gi...@apache.org>.
tzulitai commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184398560


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   I have the same question - although it looks like the original customized `PhysicalWriterImpl` does the same thing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535056878

   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dmvk commented on a diff in pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "dmvk (via GitHub)" <gi...@apache.org>.
dmvk commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1183474482


##########
flink-formats/flink-orc-nohive/pom.xml:
##########
@@ -90,6 +90,37 @@ under the License.
 			</exclusions>
 		</dependency>
 
+		<dependency>
+			<groupId>org.apache.hadoop</groupId>
+			<artifactId>hadoop-common</artifactId>
+			<scope>provided</scope>
+			<exclusions>
+				<exclusion>
+					<groupId>ch.qos.reload4j</groupId>
+					<artifactId>reload4j</artifactId>
+				</exclusion>
+				<exclusion>
+					<groupId>org.slf4j</groupId>
+					<artifactId>slf4j-reload4j</artifactId>
+				</exclusion>
+			</exclusions>

Review Comment:
   Why do we need these excludes?



##########
flink-formats/flink-orc/src/test/java/org/apache/flink/orc/util/OrcBulkWriterTestUtil.java:
##########
@@ -43,50 +43,51 @@ public class OrcBulkWriterTestUtil {
     public static final String USER_METADATA_KEY = "userKey";
     public static final ByteBuffer USER_METADATA_VALUE = ByteBuffer.wrap("hello".getBytes());
 
-    public static void validate(File files, List<Record> expected) throws IOException {
-        final File[] buckets = files.listFiles();
-        assertThat(buckets).isNotNull();
-        assertThat(buckets).hasSize(1);
+    public static void validate(File files, List<Record> expected, CompressionKind compressionKind)
+            throws IOException {
+        assertThat(files).isNotNull();
+        assertThat(files.exists()).isTrue();
 
+        assertThat(expected).isNotNull();
+        assertThat(expected).isNotEmpty();
+        validateBucketAndFileSize(files, 1);
+
+        final File[] buckets = files.listFiles();
         final File[] partFiles = buckets[0].listFiles();
-        assertThat(partFiles).isNotNull();
 
         for (File partFile : partFiles) {
-            assertThat(partFile.length()).isGreaterThan(0);
+            final Reader reader = getOrcReader(partFile);

Review Comment:
   ```suggestion
               try (final Reader reader = getOrcReader(partFile)) {
   ```



##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   is this tested in any way? how do we close files then to avoid leaking resources?



##########
flink-formats/flink-orc/src/test/java/org/apache/flink/orc/util/OrcBulkWriterTestUtil.java:
##########
@@ -95,4 +96,21 @@ private static List<Record> getResults(Reader reader) throws IOException {
 
         return results;
     }
+
+    private static void validateBucketAndFileSize(File outputDir, int bucketCount) {
+        final File[] buckets = outputDir.listFiles();
+        assertThat(buckets).isNotNull();
+        assertThat(buckets.length).isEqualTo(bucketCount);
+
+        final File[] partFiles = buckets[0].listFiles();
+        assertThat(partFiles.length).isNotNull();
+    }
+
+    private static Reader getOrcReader(File orcFile) throws IOException {

Review Comment:
   nit
   ```suggestion
       private static Reader createOrcReader(File orcFile) throws IOException {
   ```



##########
flink-formats/flink-orc-nohive/src/test/java/org/apache/flink/orc/nohive/OrcColumnarRowSplitReaderNoHiveTest.java:
##########
@@ -105,7 +105,9 @@ protected OrcColumnarRowSplitReader createReader(
             throws IOException {
         return OrcNoHiveSplitReaderUtil.genPartColumnarRowReader(
                 new Configuration(),
-                IntStream.range(0, fullTypes.length).mapToObj(i -> "f" + i).toArray(String[]::new),
+                IntStream.range(0, fullTypes.length)
+                        .mapToObj(i -> "_col" + i)
+                        .toArray(String[]::new),

Review Comment:
   Why do we need this change? Does the test fail without it?



##########
flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcColumnarRowSplitReaderTest.java:
##########
@@ -154,11 +154,11 @@ void testReadFileWithSelectFields() throws IOException {
         long totalF0 = 0;
 
         Map<String, Object> partSpec = new HashMap<>();
-        partSpec.put("f1", 1);
-        partSpec.put("f3", 3L);
-        partSpec.put("f5", "f5");
-        partSpec.put("f8", BigDecimal.valueOf(5.333));
-        partSpec.put("f13", "f13");
+        partSpec.put("_col1", 1);

Review Comment:
   Do we need this change? There seem to be a lot of these renames that generate unnecessary noise ("hiding what's important in the PR").
   
   Consider removing them or at least moving them into a separate commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on a diff in pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184521758


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/NoCloseFSDataOutputStream.java:
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.orc.writer;
+
+import org.apache.flink.annotation.Internal;
+
+import org.apache.hadoop.fs.FSDataOutputStream;
+
+import java.io.IOException;
+import java.io.OutputStream;
+
+/**
+ * A slightly customised clone of {@link org.apache.hadoop.fs.FSDataOutputStream}.
+ *
+ * <p>This implementation does not close the underlying flink stream to avoid exceptions when
+ * checkpointing.
+ */
+@Internal
+public class NoCloseFSDataOutputStream extends FSDataOutputStream {
+
+    public NoCloseFSDataOutputStream(OutputStream out) throws IOException {
+        super(out);
+    }
+
+    @Override
+    public void close() throws IOException {
+        // Don't close the internal stream here to avoid
+        // Stream Closed or ClosedChannelException when Flink performs checkpoint.

Review Comment:
   Exactly, did no go deeper here as we are replicating the existing functionality. 
   I also observed the `ClosedChannelException` issue when keeping the stream open in the existing tests, for example: https://github.com/apache/flink/blob/master/flink-formats/flink-orc/src/test/java/org/apache/flink/orc/OrcFileSystemITCase.java#L296 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] dongjoon-hyun commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535071204

   If we need some patches in order to help this PR, Apache ORC community can cut RC2 accordingly to help Apache Flink community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] tzulitai commented on a diff in pull request #22481: [FLINK-27805] bump orc version to 1.7.8

Posted by "tzulitai (via GitHub)" <gi...@apache.org>.
tzulitai commented on code in PR #22481:
URL: https://github.com/apache/flink/pull/22481#discussion_r1184399791


##########
flink-formats/flink-orc/src/main/java/org/apache/flink/orc/writer/PhysicalWriterImpl.java:
##########
@@ -1,398 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.flink.orc.writer;
-
-import org.apache.flink.annotation.Internal;
-import org.apache.flink.core.fs.FSDataOutputStream;
-
-import com.google.protobuf.CodedOutputStream;
-import org.apache.orc.CompressionCodec;
-import org.apache.orc.CompressionKind;
-import org.apache.orc.OrcFile;
-import org.apache.orc.OrcProto;
-import org.apache.orc.PhysicalWriter;
-import org.apache.orc.impl.HadoopShims;
-import org.apache.orc.impl.OrcCodecPool;
-import org.apache.orc.impl.OutStream;
-import org.apache.orc.impl.StreamName;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.io.IOException;
-import java.io.OutputStream;
-import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.List;
-import java.util.Map;
-import java.util.TreeMap;
-
-import static org.apache.orc.impl.WriterImpl.getEstimatedBufferSize;
-
-/**
- * A slightly customised clone of {@link org.apache.orc.impl.PhysicalFsWriter}.
- *
- * <p>Whereas PhysicalFsWriter implementation works on the basis of a Path, this implementation
- * leverages Flink's {@link FSDataOutputStream} to write the compressed data.
- *
- * <p>NOTE: If the ORC dependency version is updated, this file may have to be updated as well to be
- * in sync with the new version's PhysicalFsWriter.

Review Comment:
   @pgaref quick question before I dive deeper:
   
   I assume that the new `PhysicalFsWriter` using provided `FSDataOutputStream` has the exact same functionality as what was implemented with the original custom `PhysicalWriterImpl` and `NoHivePhysicalWriterImpl`? I did not do a line-by-line cross check, but for example, this Javadoc in the original `PhysicalWriterImpl` has me wondering.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink] pgaref commented on pull request #22481: [FLINK-27805][Connectors/ORC] bump orc version to 1.7.8

Posted by "pgaref (via GitHub)" <gi...@apache.org>.
pgaref commented on PR #22481:
URL: https://github.com/apache/flink/pull/22481#issuecomment-1535349716

   hey @dongjoon-hyun  -- thanks for keeping an eye!
   Just triggered a run on[ 1.7.9-SNAPSHOT](https://github.com/apache/flink/pull/22481/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R184) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org