Posted to dev@nifi.apache.org by Andrew Lim <an...@gmail.com> on 2022/01/28 17:42:26 UTC

Re: Doco, PutSolrContentStream

Hi Dwane,

Thanks for finding and reporting those documentation errors. I filed a Jira [1] to fix those.

It looks like the default value of nifi.provenance.repository.rollover.time was changed in 1.12.0 [2]. As part of [1], I will see if we can improve the docs to give more context on why this was done.

-Drew


[1] https://issues.apache.org/jira/browse/NIFI-9642
[2] https://issues.apache.org/jira/browse/NIFI-7339
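
For anyone comparing an upgraded nifi.properties against the values discussed in this thread, a small check along these lines may help. It is a sketch only: the class and method names are hypothetical, the file contents are inlined as a string for illustration, and the two expected values are the 1.15.3 defaults as reported in the quoted message below, not values read from a NiFi distribution.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class NifiDefaultsCheck {

    // 1.15.3 defaults as reported in this thread (an assumption, not read from NiFi).
    static final Map<String, String> REPORTED_1_15_3 = new LinkedHashMap<>();
    static {
        REPORTED_1_15_3.put("nifi.flowfile.repository.checkpoint.interval", "20 secs");
        REPORTED_1_15_3.put("nifi.provenance.repository.rollover.time", "10 min");
    }

    // Returns property name -> configured value for each listed property
    // that is present in the text and differs from the reported default.
    static Map<String, String> differences(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> diffs = new LinkedHashMap<>();
        for (Map.Entry<String, String> expected : REPORTED_1_15_3.entrySet()) {
            String actual = props.getProperty(expected.getKey());
            if (actual != null && !actual.trim().equals(expected.getValue())) {
                diffs.put(expected.getKey(), actual.trim());
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Hypothetical file carried over from 1.11.4 with one old value left in place.
        String carriedOver =
                "nifi.flowfile.repository.checkpoint.interval=2 mins\n"
              + "nifi.provenance.repository.rollover.time=10 min\n";
        // prints {nifi.flowfile.repository.checkpoint.interval=2 mins}
        System.out.println(differences(carriedOver));
    }
}
```

A difference is not necessarily a problem; it only flags values that were carried over from an older install and may deserve a second look.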


> On Jan 28, 2022, at 7:56 AM, Nathan Gough <th...@apache.org> wrote:
> 
> Hi Dwane,
> 
> I've created a Jira issue to test and rectify the Solr + ZooKeeper issue: https://issues.apache.org/jira/browse/NIFI-9641
> 
> Thanks for the report!
> Nathan
> 
> On Fri, Jan 28, 2022 at 7:22 AM Dwane Hall <dwanehall@hotmail.com <ma...@hotmail.com>> wrote:
> Hey NiFi community, I hope all is well with everyone, wherever they may be. I recently upgraded our NiFi instances from 1.11.4 to 1.15.3 and made a few observations along the way that are worth mentioning.
>  
> Some minor documentation inconsistencies
> A couple of the default values in nifi.properties appear to have changed between versions (the old and new values are listed below, along with links to the documentation).
>  
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#write-ahead-flowfile-repository
> “The FlowFile Repository checkpoint interval. The default value is 2 mins.” [new default value is 20 secs]
> 1.11.4 nifi.flowfile.repository.checkpoint.interval=2 mins
> 1.15.3 nifi.flowfile.repository.checkpoint.interval=20 secs
>  
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
> “The amount of time to wait before rolling over the latest data provenance information so that it is available in the User Interface. The default value is 30 secs.”
> https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#system-properties
> “If processing a high volume of events, change nifi.provenance.repository.rollover.time from a default of 30 secs to 1 min and ...” [The new default value is 10 min].
> 1.11.4 nifi.provenance.repository.rollover.time=30 sec
> 1.15.3 nifi.provenance.repository.rollover.time=10 min
> This seems to be a significant change. Was there any reason for the new default setting? I was unable to find documentation referencing the increase.
>  
> PutSolrContentStream processor issues
>  
> Secondly, after a successful upgrade I noticed that our use of the PutSolrContentStream processor had broken. Looking through the processor code, there was an upgrade to the SolrJ client and a commit in March 2020 (referenced below) that appears to prevent nested zk chroot paths for SolrCloud connections (i.e. the ZooKeeper connection string is truncated).
>  
> SolrUtils.java: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/SolrUtils.java
> The commit of interest, which changed how a CloudSolrClient is created with SolrJ:
> https://github.com/apache/nifi/commit/9b4292024be6fae188cb1efa3a07dc9489e9a5b4#diff-13320e5b198f236cea296fb01cb7376755d65c444678e781fa0940c2a28db88b
>  
> For a nested Solr path "/solr/PROD", "/solr/DEV", "/solr/DR" … the string is truncated to the base path only i.e. “/solr” (this is only an issue for nested chroots)
>  
> The code of interest is here in the SolrUtils.java class
>  
> if (SOLR_TYPE_STANDARD.getValue().equals(context.getProperty(SOLR_TYPE).getValue())) {
> -            return new HttpSolrClient(solrLocation, httpClient);
> +            return new HttpSolrClient.Builder(solrLocation).withHttpClient(httpClient).build();
>         } else {
>             // CloudSolrClient.Builder now requires a List of ZK addresses and znode for solr as separate parameters
>             final String zk[] = solrLocation.split("/");
>             final List<String> zkList = Arrays.asList(zk[0].split(","));
>             String zkRoot = "/";
>             if (zk.length > 1 && ! zk[1].isEmpty()) {
>                 zkRoot += zk[1];
>             }
>   
> 
> I think the issue can be resolved by changing the following line of code, so that it captures the entire nested path and not just the base path at position zk[1] in the String array.
>  
> final String zk[] = solrLocation.split("/");
> To
> final String[] zk = solrLocation.split("/",2);
>  
> 
> Thanks,
> 
> Dwane
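
The truncation reported above can be reproduced with plain Java, without SolrJ or NiFi on the classpath. The sketch below is only an illustration: the class name, the method names, and the connection string zk1:2181,zk2:2181/solr/PROD are hypothetical, and the parsing is copied from the excerpt quoted in this thread rather than from the fix in the pull request.

```java
import java.util.Arrays;
import java.util.List;

public class ZkChrootSplitDemo {

    // Mirrors the quoted parsing: split on every "/" and read only zk[1].
    static String chrootUnlimitedSplit(String solrLocation) {
        final String[] zk = solrLocation.split("/");
        String zkRoot = "/";
        if (zk.length > 1 && !zk[1].isEmpty()) {
            zkRoot += zk[1];
        }
        return zkRoot;
    }

    // Suggested change: a limit of 2 keeps everything after the first "/" in zk[1].
    static String chrootLimitedSplit(String solrLocation) {
        final String[] zk = solrLocation.split("/", 2);
        String zkRoot = "/";
        if (zk.length > 1 && !zk[1].isEmpty()) {
            zkRoot += zk[1];
        }
        return zkRoot;
    }

    // The host list is the part before the first "/", separated by commas.
    static List<String> zkHosts(String solrLocation) {
        return Arrays.asList(solrLocation.split("/", 2)[0].split(","));
    }

    public static void main(String[] args) {
        String location = "zk1:2181,zk2:2181/solr/PROD"; // hypothetical value
        System.out.println(zkHosts(location));              // prints [zk1:2181, zk2:2181]
        System.out.println(chrootUnlimitedSplit(location)); // prints /solr
        System.out.println(chrootLimitedSplit(location));   // prints /solr/PROD
    }
}
```

With no limit, the nested segment ends up in zk[2], which the quoted code never reads. With a limit of 2, the remainder of the string stays in zk[1], and a location with no chroot, or with only a trailing "/", still yields "/".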


Re: Doco, PutSolrContentStream

Posted by Dwane Hall <dw...@hotmail.com>.
Hey Nathan, thanks for the quick response. I took a look at the pull request and tested the .nar bundle in our development environment. I'm happy to report everything worked as expected: I was able to push documents through to our Solr instances on nested zk chroot suffixes successfully. Let me know if you require any other testing, assistance, or confirmation from my end.

Cheers,
Dwane

________________________________
From: Nathan Gough <th...@apache.org>
Sent: Friday, 25 February 2022 5:10 AM
To: dev@nifi.apache.org <de...@nifi.apache.org>
Cc: users@nifi.apache.org <us...@nifi.apache.org>
Subject: Re: Doco, PutSolrContentStream

Hi Dwane,

If possible would you be able to test out the PR here and see if it fixes your issue? https://github.com/apache/nifi/pull/5727

You can check out the PR and build the code yourself in the nifi/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-nar directory, or you can back up your current nifi-solr-nar-1.16.0-SNAPSHOT.nar in the ./nifi/lib directory and replace it with the one I built, which you can download here: https://easyupload.io/74ujf4.

Thanks,
Nathan



Re: Doco, PutSolrContentStream

Posted by Nathan Gough <th...@apache.org>.
Hi Dwane,

If possible would you be able to test out the PR here and see if it fixes
your issue? https://github.com/apache/nifi/pull/5727

You can check out the PR and build the code yourself in the
nifi/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-nar directory, or you can
back up your current nifi-solr-nar-1.16.0-SNAPSHOT.nar in the ./nifi/lib
directory and replace it with the one I built, which you can download here:
https://easyupload.io/74ujf4.

Thanks,
Nathan

