You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/05/03 22:04:05 UTC

[GitHub] [nifi] Brzhk opened a new pull request, #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Brzhk opened a new pull request, #6012:
URL: https://github.com/apache/nifi/pull/6012

   <!-- Licensed to the Apache Software Foundation (ASF) under one or more -->
   <!-- contributor license agreements.  See the NOTICE file distributed with -->
   <!-- this work for additional information regarding copyright ownership. -->
   <!-- The ASF licenses this file to You under the Apache License, Version 2.0 -->
   <!-- (the "License"); you may not use this file except in compliance with -->
   <!-- the License.  You may obtain a copy of the License at -->
   <!--     http://www.apache.org/licenses/LICENSE-2.0 -->
   <!-- Unless required by applicable law or agreed to in writing, software -->
   <!-- distributed under the License is distributed on an "AS IS" BASIS, -->
   <!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -->
   <!-- See the License for the specific language governing permissions and -->
   <!-- limitations under the License. -->
   
   # Summary
   
   _[NIFI-9966](https://issues.apache.org/jira/browse/NIFI-9966) Fixes the registry ability to load flowfiles exceeding 50MB
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-9966`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-0000`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 8
     - [ ] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [x] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [x] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [x] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] Brzhk commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
Brzhk commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r865957279


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        final ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();
+        return IOUtils.toByteArray(objStream);

Review Comment:
   Here is the hard question, and to answer it, either you have a good friends at IBM, or time to read the JGIT code, because i am still looking for a user-docoumentation. And , in a few lines, while it's difficult to find a simple answer, the open() function that you used is, in fact, a one-shot, quick access to files function. the variant using the object reader will actually take care of the plumbing involved.  
   
   There's  A LOT to discuss if you need actual proof, gointg throught the stacktrace (i forgot to provide) has been very, very tedious to put into the right context,  since jgit has a lot of overriden  function  & processing, so here's a good example of red flags.  
   
   WE could go over there work If need be, [ But honorable members of the Jury, i have here a written confession!](https://github.com/dpursehouse/jgit/blob/8774f541904ca9afba1786b4da14c1aedf4dda78/org.eclipse.jgit/src/org/eclipse/jgit/lib/ObjectDatabase.java#L90)
   
   And the Stacktrace:
   ```2022-03-23 11:17:46,813 ERROR [NiFi Registry Web Server-16] o.a.n.r.web.mapper.ThrowableMapper An unexpected error has occurred: org.eclipse.jgit.errors.LargeObjectException: 5f577bf973b0ed8364de3a86e8d16ac0d9cc2228 exceeds size limit. Returning Internal Server Error response.
   org.eclipse.jgit.errors.LargeObjectException: 5f577bf973b0ed8364de3a86e8d16ac0d9cc2228 exceeds size limit
   at org.eclipse.jgit.internal.storage.file.UnpackedObject$LargeObject.getCachedBytes(UnpackedObject.java:365)  --> https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/f32b8612433e499090c76ded014dd5e94322b786/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java
   at org.eclipse.jgit.lib.ObjectLoader.getBytes(ObjectLoader.java:74)  --> https://git.eclipse.org/r/plugins/gitiles/jgit/jgit/+/f32b8612433e499090c76ded014dd5e94322b786/org.eclipse.jgit/src/org/eclipse/jgit/lib/ObjectLoader.java
   at org.apache.nifi.registry.provider.flow.git.GitFlowMetaData.getContent(GitFlowMetaData.java:525) --> https://github.com/apache/nifi/blob/5928d2048ee9e9f8b4572832a12816bde5419e23/nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java#L523
   at org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider.getFlowContent(GitFlowPersistenceProvider.java:205)
   at org.apache.nifi.registry.service.RegistryService.getVersionedFlowSnapshot(RegistryService.java:686)
   at org.apache.nifi.registry.service.RegistryService.getFlowSnapshot(RegistryService.java:674)
   at org.apache.nifi.registry.web.service.StandardServiceFacade.getFlowSnapshot(StandardServiceFacade.java:324)
   at org.apache.nifi.registry.web.service.StandardServiceFacade$$FastClassBySpringCGLIB$$8b3bf0a8.invoke(<generated>)
   at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
   at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
   at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
   at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:366)
   at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
   at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
   at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
   at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
   at org.apache.nifi.registry.web.service.StandardServiceFacade$$EnhancerBySpringCGLIB$$968ab588.getFlowSnapshot(<generated>)
   at org.apache.nifi.registry.web.api.BucketFlowResource.getFlowVersion(BucketFlowResource.java:419)
   at jdk.internal.reflect.GeneratedMethodAccessor264.invoke(Unknown Source)
   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
   at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
   at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
   at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
   at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
   at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
   at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
   at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
   at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
   at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
   at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
   at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
   at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
   at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232)
   at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
   at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:392)
   at org.glassfish.jersey.servlet.ServletContainer.serviceImpl(ServletContainer.java:385)
   at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:560)
   at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:501)
   at org.glassfish.jersey.servlet.ServletContainer.doFilter(ServletContainer.java:438)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
   at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
   at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:320)
   at org.apache.nifi.registry.web.security.authorization.ResourceAuthorizationFilter.forwardRequestWithoutAuthorizationCheck(ResourceAuthorizationFilter.java:151)
   at org.apache.nifi.registry.web.security.authorization.ResourceAuthorizationFilter.doFilter(ResourceAuthorizationFilter.java:113)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:126)
   at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:118)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.apache.nifi.registry.web.security.authentication.IdentityFilter.doFilter(IdentityFilter.java:70)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.apache.nifi.registry.web.security.authentication.IdentityFilter.doFilter(IdentityFilter.java:91)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:92)
   at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:77)
   at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
   at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
   at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
   at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
   at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:358)
   at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:271)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
   at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:128)
   at org.springframework.boot.web.servlet.support.ErrorPageFilter.access$000(ErrorPageFilter.java:66)
   at org.springframework.boot.web.servlet.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:103)
   at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
   at org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:121)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
   at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
   at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
   at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
   at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
   at org.eclipse.jetty.server.Server.handle(Server.java:505)
   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
   at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:427)
   at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:321)
   at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:159)
   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
   at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:781)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:917)
   at java.base/java.lang.Thread.run(Thread.java:834)```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] asfgit closed pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git
URL: https://github.com/apache/nifi/pull/6012


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] Brzhk commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
Brzhk commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r864971579


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();

Review Comment:
   Hmmm... May there be another validation to do? you marked this as resolved, but you still show up as pending review...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] Brzhk commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
Brzhk commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r864707635


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();

Review Comment:
   yes it could, Thank you. Well. I'm headed back to the contributor guide, to see what the git flow is on a PR. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] ChrisSamo632 commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
ChrisSamo632 commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r864699394


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();

Review Comment:
   `objectStream` could be declared `final` (to stay consistent with the line above)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] bbende commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
bbende commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r865896339


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        final ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();
+        return IOUtils.toByteArray(objStream);

Review Comment:
   Since both of these are still loading the whole content to a byte[], what is the actual difference with the previous `getBytes()` call vs. using `IOUtils.toByteArray` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] ChrisSamo632 commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
ChrisSamo632 commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r864975618


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();

Review Comment:
   Correct, I'm happy that this comment has been addressed, however I've not yet (had time to) completed a full review - ideally I'd like to build this locally from your branch and try the export/import with a large file



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] Brzhk commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
Brzhk commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r866071944


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        final ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();
+        return IOUtils.toByteArray(objStream);

Review Comment:
   So, about that. There is a limitation, either on calling knowing the type, because the ObjectLoader is of a subclass (small or large) that is created following rules that, i believe are not perfectly applied everytime, just as you stumbled upon.  My rationale for my first 11PR, was to keep a code impact to a minimum, and so i kept all of that concern away as well.
   
   However, it is a very good point that we havent tested, or i missed that part in the build, for performance impact.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] bbende commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
bbende commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r865970235


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        final ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();
+        return IOUtils.toByteArray(objStream);

Review Comment:
   Thanks for the info! I saw this comment in the code that you linked to
   ```
   @return true if this object is too large to obtain as a byte array.
   	 *         Objects over a certain threshold should be accessed only by their
   	 *         {@link #openStream()} to prevent overflowing the JVM heap.
   ```
   So seems like that is the right approach. We could possibly call `isLarge()` to determine which way to load the bytes, but I'm not sure if there is a real advantage to that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] Brzhk commented on a diff in pull request #6012: NIFI-9966 Corrects the registry loading of large flowfiles from git

Posted by GitBox <gi...@apache.org>.
Brzhk commented on code in PR #6012:
URL: https://github.com/apache/nifi/pull/6012#discussion_r866071944


##########
nifi-registry/nifi-registry-core/nifi-registry-framework/src/main/java/org/apache/nifi/registry/provider/flow/git/GitFlowMetaData.java:
##########
@@ -522,7 +524,8 @@ void commit(String author, String message, Bucket bucket, Flow.FlowPointer flowP
 
     byte[] getContent(String objectId) throws IOException {
         final ObjectId flowSnapshotObjectId = gitRepo.resolve(objectId);
-        return gitRepo.newObjectReader().open(flowSnapshotObjectId).getBytes();
+        final ObjectStream objStream = gitRepo.newObjectReader().open(flowSnapshotObjectId).openStream();
+        return IOUtils.toByteArray(objStream);

Review Comment:
   So, about that. There is a limitation, either on calling knowing the type, because the ObjectLoader is of a subclass (small or large) that is created following rules that, i believe are not perfectly applied everytime, just as you stumble upon. 
   
   
   As of now, we seem to be the very first whom got impacted. However, we have very feww pipelines indeed, so we are currently blind. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org