Posted to users@nifi.apache.org by Kumiko Yada <Ku...@ds-iq.com> on 2016/09/09 17:05:46 UTC

Upgrade 0.7.0 to 1.0.0

Hello,

We upgraded from 0.7.0 to 1.0.0, and we encountered some issues.  We think that this upgrade brought a change to how the site-to-site calls are made for the remote process groups and is causing us to hit a max requests limit that was also introduced with v1.0. Is this a known issue?

The upgrade also changes the serialization version of the provenance files in a way that is not backward compatible (nifi-1.0 -> 1-9, nifi-0.7 -> 1-8).  Is there any workaround for this?

Thanks
Kumiko

Re: Upgrade 0.7.0 to 1.0.0

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Chien,

Thanks for providing the additional configuration info.

> nifi.remote.input.host and nifi.remote.input.socket.port are different for each node. If the remote process group URL doesn't have to point to the NCM anymore, is it recommended to use a VIP or DNS round-robin alias?

No, a VIP or DNS round-robin alias is not required. The URL specified
as the RPG's target URL is used to retrieve information about the
remote NiFi cluster (or standalone instance). Once the remote peer
information is retrieved, the Site-to-Site client uses a weighted
round-robin mechanism, based on the number of flow files queued in the
remote ports, to distribute the actual data transfer across the
cluster.
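
To illustrate the weighted selection described above, here is a
minimal Python sketch. It is not NiFi's actual implementation: the
weighting formula (a peer's weight is the share of the total queue it
does NOT hold), the 0.01 floor, and the names peer_weights and
pick_peer are all assumptions made for illustration.

```python
import random


def peer_weights(queued_counts):
    """Map each peer to a selection weight (hypothetical formula).

    queued_counts: dict of peer name -> number of queued flow files.
    Peers with fewer queued flow files get a larger weight.
    """
    total = sum(queued_counts.values())
    if total == 0:
        # No load information available: treat all peers equally.
        return {peer: 1.0 for peer in queued_counts}
    # Assumption: weight = share of the total queue NOT held by this peer.
    # The small floor keeps a heavily loaded peer selectable.
    return {
        peer: max(1.0 - count / total, 0.01)
        for peer, count in queued_counts.items()
    }


def pick_peer(queued_counts, rng=random):
    """Pick one peer at random, biased by the weights above."""
    weights = peer_weights(queued_counts)
    peers = list(weights)
    return rng.choices(peers, weights=[weights[p] for p in peers], k=1)[0]
```

With queue sizes of 0, 10 and 30 on three nodes, this sketch gives the
idle node the largest weight and the busiest node the smallest, so
most transfers go to the least loaded peer.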

If the remote node specified by the RPG's URL goes down after a client
has already established Site-to-Site communication and learned about
the other nodes in the remote cluster, the client can keep updating
the remote cluster topology by talking to one of those nodes.
A client also caches the remote peer information and persists it to a
local file so that it can be reloaded when the client node restarts.
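
The caching behaviour can be pictured with a small Python sketch. The
JSON layout, the file name, and the functions save_peers and
load_peers are hypothetical; NiFi's own on-disk format is not
described in this thread.

```python
import json
import os
import tempfile


def save_peers(path, peers):
    """Persist the known peer list, replacing the old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(peers, handle)
    # os.replace swaps the file in one step, so a crash never leaves
    # a half-written cache behind.
    os.replace(tmp_path, path)


def load_peers(path):
    """Load the cached peer list, or return [] if there is no usable cache."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return []
```

After a restart, a client written this way could call load_peers
first and contact any cached node, even if the node named in the RPG
URL is down.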

Currently, if 20 S2S client instances connect to a 20-node NiFi
cluster, the cluster coordinator node may receive 20 requests from
the 20 client nodes concurrently.
When a client sends a request to /nifi-api/site-to-site, the request
is replicated from the cluster coordinator node to all 20 nodes, and
the responses are then merged to complete the request. Because of
this aggregation, each request can take longer to complete, which in
turn can increase the number of concurrent requests.

If a data flow contains multiple RPGs targeting the same remote NiFi
cluster, the number of requests is multiplied accordingly. If you have
such RPGs, they should be consolidated into a single RPG.
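
As a rough back-of-the-envelope check of the numbers above, the
following Python sketch estimates the worst case in which every client
refreshes at the same moment. The function name and the
rpgs_per_client parameter are made up for illustration, and the model
deliberately ignores timing and retries.

```python
def site_to_site_load(clients, cluster_nodes, rpgs_per_client=1):
    """Estimate worst-case concurrent /nifi-api/site-to-site load.

    Returns (incoming, replicated):
      incoming   - requests arriving at the same time if every RPG on
                   every client refreshes simultaneously
      replicated - node-level calls, since each incoming request is
                   replicated to every node in the cluster
    """
    incoming = clients * rpgs_per_client
    replicated = incoming * cluster_nodes
    return incoming, replicated


# 20 clients, 20-node cluster, one RPG each.
print(site_to_site_load(20, 20))
# Same cluster, but five RPGs per client pointing at the same target.
print(site_to_site_load(20, 20, rpgs_per_client=5))
```

Under these assumptions, one RPG per client yields 20 concurrent
requests fanning out into 400 node-level calls, while five RPGs per
client yields 100 concurrent requests, which is the limit reported
earlier in this thread.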

Glad to know that this issue doesn't happen anymore, but if it does
recur, please take a thread dump and share it with us.

Thanks,
Koji

On Tue, Sep 13, 2016 at 4:59 AM, Chien Le <Ch...@ds-iq.com> wrote:
> [...]

RE: Upgrade 0.7.0 to 1.0.0

Posted by Chien Le <Ch...@ds-iq.com>.
Hi Koji,

The remote process group is configured as follows:
URL - http://host02.corp:10000/nifi
Transport Protocol - RAW
Communications Timeout - 30 sec
Yield Duration - 10 sec

Proxy settings are all blank. 

nifi.remote properties:
# Site to Site properties
nifi.remote.input.host=host02.corp
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.input.secure=false
nifi.remote.input.socket.port=12000

nifi.remote.input.host and nifi.remote.input.socket.port are different for each node. If the remote process group URL doesn't have to point to the NCM anymore, is it recommended to use a VIP or DNS round-robin alias?

We have 20 instances in this cluster, and the remote process group actually points to the same cluster, so it redirects to itself in an attempt to redistribute work across the cluster from primary-node processors.

The good news, though, is that I can't seem to replicate the problem anymore. However, that was after some revert attempts that were then undone because of the serialization changes (luckily this is in our dev environment). I'm going to have to chalk it up to some misconfiguration for now.

Thanks,
Chien

-----Original Message-----
From: Koji Kawamura [mailto:ijokarumawak@apache.org] 
Sent: Sunday, September 11, 2016 11:37 PM
To: users@nifi.apache.org
Cc: Chien Le <Ch...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>; Wei Zhang <We...@ds-iq.com>; Ki Kang <Ki...@ds-iq.com>
Subject: Re: Upgrade 0.7.0 to 1.0.0

[...]

Re: Upgrade 0.7.0 to 1.0.0

Posted by Koji Kawamura <ij...@apache.org>.
Hello Kumiko,

Sorry to hear that you're having issues with upgrading to 1.0.0.

1) The NiFi URL of any node in the remote cluster should work as the
target URL. If the remote cluster consists of node1, node2, and node3,
then "http://node<1, 2 or 3>:<port>/nifi" should work. With NiFi 1.0
Zero Master Clustering, every node is capable of handling web requests.

2) Having 100 outstanding requests (not yet finished, still being
replicated) suggests that something is wrong. How many Site-to-Site
clients do you have? Normally, each client sends a request to
/nifi-api/site-to-site once every 10 minutes to refresh its cached
remote site info. The actual data transfer uses a different endpoint.
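
To see why 100 outstanding requests looks abnormal at that refresh
rate, here is a small Python sketch based on Little's law
(outstanding = arrival rate x time per request). The client count of
20 is only a placeholder taken from the cluster size mentioned
elsewhere in this thread, and the function names are illustrative.

```python
def refresh_rate_per_second(clients, refresh_interval_s=600):
    """Steady-state arrival rate if each client refreshes once per interval."""
    return clients / refresh_interval_s


def latency_needed_for_backlog(outstanding, clients, refresh_interval_s=600):
    """Seconds each request would have to stay open to sustain a backlog.

    Little's law: outstanding = rate * latency, so
    latency = outstanding / rate = outstanding * interval / clients.
    """
    return outstanding * refresh_interval_s / clients


# 20 clients refreshing every 10 minutes, 99 requests outstanding.
print(refresh_rate_per_second(20))
print(latency_needed_for_backlog(99, 20))
```

With these placeholder numbers, each request would have to stay open
for 2970 seconds (about 50 minutes) to keep 99 of them outstanding,
which suggests stuck requests or far more frequent calls rather than
normal refresh traffic.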

Would you share the Remote Process Group configuration, such as the
Transport Protocol and Proxy settings? Also, please share a thread
dump. It can be generated by executing the "bin/nifi.sh dump" command;
the thread dump is then logged to "logs/nifi-bootstrap.log".
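
If it helps to script this, the sketch below wraps the same command
in Python. The command and log location are the ones given above; the
install path passed in (for example /opt/nifi) and the function names
are assumptions, so adjust them to your environment.

```python
import subprocess
from pathlib import Path


def dump_command(nifi_home):
    """Build the thread-dump command for a NiFi install directory."""
    return [str(Path(nifi_home) / "bin" / "nifi.sh"), "dump"]


def take_thread_dump(nifi_home):
    """Run the dump and return the log file the dump is written to."""
    # check=True raises CalledProcessError if nifi.sh exits non-zero.
    subprocess.run(dump_command(nifi_home), check=True)
    return Path(nifi_home) / "logs" / "nifi-bootstrap.log"
```

Calling take_thread_dump("/opt/nifi") on a node would run the dump
and return the path of the log file to attach to a reply.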

Thanks,
Koji

On Sat, Sep 10, 2016 at 7:45 AM, Kumiko Yada <Ku...@ds-iq.com> wrote:
> [...]

RE: Upgrade 0.7.0 to 1.0.0

Posted by Kumiko Yada <Ku...@ds-iq.com>.
These are two specific questions that we have.


1)      In the NiFi Summary / Remote Process Group view, what value needs to be used for the Target URI?  We were able to use a specific URI in NiFi 0.7.0 because we had to specify the master node; however, NiFi 1.0.0 uses zero master clustering and automatically assigns the master.

2)      We are getting the error "2016-09-08 16:00:01,246 ERROR [NiFi Web Server-6153] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Cannot replicate request GET /nifi-api/site-to-site because there are 100 outstanding HTTP Requests already. Request Counts Per URI = {/nifi-api/site-to-site=99, /nifi-api/flow/current-user=1}".


In https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-cluster/src/main/java/org/apache/nifi/cluster/coordination/http/replication/ThreadPoolRequestReplicator.java, the limit is hardcoded (MAX_CONCURRENT_REQUESTS = 100) in the ThreadPoolRequestReplicator class, which is completely new with version 1.0.  Are we getting this error because we hit this hardcoded value of 100?  If so, what is the best workaround? Would increasing this value be a good idea?
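
For anyone scanning logs for this error, the per-URI counts can be
pulled out with a short Python sketch like the one below. The regular
expression is based only on the single log line quoted above, and the
function name request_counts is made up for illustration; other NiFi
versions may format the message differently.

```python
import re

# The error line quoted above, joined into one string.
LINE = (
    "2016-09-08 16:00:01,246 ERROR [NiFi Web Server-6153] "
    "o.a.n.c.c.h.r.ThreadPoolRequestReplicator Cannot replicate request GET "
    "/nifi-api/site-to-site because there are 100 outstanding HTTP Requests "
    "already. Request Counts Per URI = {/nifi-api/site-to-site=99, "
    "/nifi-api/flow/current-user=1}"
)


def request_counts(line):
    """Return {uri: count} parsed from a 'Request Counts Per URI' log line."""
    match = re.search(r"Request Counts Per URI = \{([^}]*)\}", line)
    if not match:
        return {}
    counts = {}
    for entry in match.group(1).split(","):
        # Split on the last '=' so that a URI containing '=' still parses.
        uri, _, count = entry.strip().rpartition("=")
        if uri:
            counts[uri] = int(count)
    return counts


counts = request_counts(LINE)
print(counts, sum(counts.values()))
```

For the line above, the counts add up to 100, with 99 of them on
/nifi-api/site-to-site, which matches the limit named in the message.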

Thanks
Kumiko

From: Kumiko Yada [mailto:Kumiko.Yada@ds-iq.com]
Sent: Friday, September 9, 2016 10:06 AM
To: users@nifi.apache.org
Cc: Chien Le <Ch...@ds-iq.com>; Kevin Verhoeven <Ke...@ds-iq.com>; Wei Zhang <We...@ds-iq.com>
Subject: Upgrade 0.7.0 to 1.0.0

[...]