Posted to issues@nifi.apache.org by "markap14 (via GitHub)" <gi...@apache.org> on 2023/06/02 17:21:43 UTC

[GitHub] [nifi] markap14 opened a new pull request, #7334: NIFI-11636: Do not buffer Parquet content into memory unnecessarily

markap14 opened a new pull request, #7334:
URL: https://github.com/apache/nifi/pull/7334

   <!-- Licensed to the Apache Software Foundation (ASF) under one or more -->
   <!-- contributor license agreements.  See the NOTICE file distributed with -->
   <!-- this work for additional information regarding copyright ownership. -->
   <!-- The ASF licenses this file to You under the Apache License, Version 2.0 -->
   <!-- (the "License"); you may not use this file except in compliance with -->
   <!-- the License.  You may obtain a copy of the License at -->
   <!--     http://www.apache.org/licenses/LICENSE-2.0 -->
   <!-- Unless required by applicable law or agreed to in writing, software -->
   <!-- distributed under the License is distributed on an "AS IS" BASIS, -->
   <!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -->
   <!-- See the License for the specific language governing permissions and -->
   <!-- limitations under the License. -->
   
   # Summary
   
   [NIFI-00000](https://issues.apache.org/jira/browse/NIFI-00000)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
     - [ ] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] mattyb149 commented on a diff in pull request #7334: NIFI-11636: Do not buffer Parquet content into memory unnecessarily

Posted by "mattyb149 (via GitHub)" <gi...@apache.org>.
mattyb149 commented on code in PR #7334:
URL: https://github.com/apache/nifi/pull/7334#discussion_r1214647825


##########
nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-resources/src/main/resources/conf/logback.xml:
##########
@@ -119,7 +119,7 @@
     <logger name="org.apache.nifi.processors.standard.LogAttribute" level="INFO"/>
     <logger name="org.apache.nifi.processors.standard.LogMessage" level="INFO"/>
     <logger name="org.apache.nifi.controller.repository.StandardProcessSession" level="WARN" />
-
+    <logger name="org.apache.parquet.hadoop.InternalParquetRecordReader" level="WARN" />

Review Comment:
   Can we add a comment here as to why we're changing this level (i.e. highly unnecessary INFO-level logging done by this class)?



##########
nifi-nar-bundles/nifi-parquet-bundle/nifi-parquet-processors/src/main/java/org/apache/nifi/parquet/stream/NifiSeekableInputStream.java:
##########
@@ -29,11 +29,11 @@ public class NifiSeekableInputStream extends DelegatingSeekableInputStream {
     public NifiSeekableInputStream(final ByteCountingInputStream input) {
         super(input);
         this.input = input;
-        this.input.mark(Integer.MAX_VALUE);
+        this.input.mark(8192);

Review Comment:
   Maybe create a constant since it's used more than once
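
For illustration, a minimal, self-contained sketch of what extracting the read limit into a constant might look like. This is not the code from the PR: the class name `MarkLimitSketch`, the constant name `MARK_READ_LIMIT`, and the helper `readTwice` are hypothetical, and a plain `BufferedInputStream` stands in for the `ByteCountingInputStream` that `NifiSeekableInputStream` wraps. Only the value `8192` comes from the diff above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkLimitSketch {
    // Hypothetical constant name; the value 8192 is the one used in the diff.
    // A bounded read limit lets the stream discard the mark once more than
    // this many bytes have been read, instead of retaining everything.
    static final int MARK_READ_LIMIT = 8192;

    // Marks the stream, reads peekLength bytes, resets, marks again, and
    // reads the same bytes a second time. Both mark calls share the constant.
    static String readTwice(final byte[] content, final int peekLength) throws IOException {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(content))) {
            in.mark(MARK_READ_LIMIT);
            final byte[] first = in.readNBytes(peekLength);

            in.reset();
            in.mark(MARK_READ_LIMIT);
            final byte[] second = in.readNBytes(peekLength);

            return new String(first, StandardCharsets.US_ASCII)
                    + "|" + new String(second, StandardCharsets.US_ASCII);
        }
    }

    public static void main(final String[] args) throws IOException {
        // prints "PAR1|PAR1"
        System.out.println(readTwice("PAR1-example".getBytes(StandardCharsets.US_ASCII), 4));
    }
}
```

With a single named constant, every `mark(...)` call site stays in sync if the limit is tuned later, and the name documents why the value is bounded rather than `Integer.MAX_VALUE`.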




[GitHub] [nifi] mattyb149 closed pull request #7334: NIFI-11636: Do not buffer Parquet content into memory unnecessarily

Posted by "mattyb149 (via GitHub)" <gi...@apache.org>.
mattyb149 closed pull request #7334: NIFI-11636: Do not buffer Parquet content into memory unnecessarily
URL: https://github.com/apache/nifi/pull/7334



[GitHub] [nifi] mattyb149 commented on pull request #7334: NIFI-11636: Do not buffer Parquet content into memory unnecessarily

Posted by "mattyb149 (via GitHub)" <gi...@apache.org>.
mattyb149 commented on PR #7334:
URL: https://github.com/apache/nifi/pull/7334#issuecomment-1574219620

   +1 LGTM, thanks for the fix! Merging to 1.x and main

