Posted to users@nifi.apache.org by Faisal Durrani <te...@gmail.com> on 2018/07/05 02:12:24 UTC

RPG S2S Error

Hi, I've got two questions.

1. We are using a Remote Process Group with the RAW transport protocol to
distribute data across a four-node cluster. I see the nifi-app log has a
lot of instances of the error below:


    o.a.nifi.remote.SocketRemoteSiteListener Unable to communicate
    with remote instance Peer[url=nifi://xxx-xxxxxx.prod.xx.:59528]
    (SocketFlowFileServerProtocol[CommsID=0bf887ed-acb3-4eea-94ac-5abf53ad0bf1])
    due to java.io.EOFException; closing connection

These errors do not show on the bulletin board, nor do I see any data
loss. I was curious to know if there is some bad configuration that is
causing this to happen.

2. The app log also has the error below:


    o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster
    URL=[http://xxx-xxxxxx.prod.xx.local:9090/nifi-api]]
    Peer[url=nifi://ins-btrananifi107z.prod.jp.local:5001] indicates that
    port 417e3d23-5b1a-1616-9728-9d9d1a462646's destination is full;
    penalizing peer

The data flow consumes a high volume of data, and there is back pressure
on almost all the connections, so that is probably what is causing it. I
guess there isn't much we can do here, and once the back pressure resolves,
the error goes away on its own. Please let me know your view.

Re: RPG S2S Error

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Faisal,

By adding a ControlRate processor before sending FlowFiles via the RPG,
you can throttle the rate at which data is sent; that should help reduce
the probability of the receiving side getting full.
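
As a minimal sketch, the ControlRate processor could be configured to cap
the data rate like this (the values are illustrative and should be tuned
to your flow):

    Rate Control Criteria: data rate
    Maximum Rate: 10 MB
    Time Duration: 1 min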

If the current overall throughput is acceptable for your use case, and
you don't see any data loss, then you should be able to ignore the
message.
You can filter messages by log level, configured in conf/logback.xml.
By adding the following line, you can filter out the EndpointConnectionPool
warning messages:
<logger name="org.apache.nifi.remote.client.socket.EndpointConnectionPool"
level="ERROR"/>

The SocketRemoteSiteListener log message you want to filter is logged at
ERROR level, so I think you need to write a custom log filter class to
filter it.
https://logback.qos.ch/manual/filters.html
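
As a minimal sketch, such a filter could look like the following (the
package and class names are illustrative; compile it, put the jar on
NiFi's classpath, e.g. the lib directory, and attach it to the app-log
appender in conf/logback.xml with
<filter class="com.example.nifi.logging.S2SEofFilter"/>):

package com.example.nifi.logging;

import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.filter.Filter;
import ch.qos.logback.core.spi.FilterReply;

// Drops the noisy "Unable to communicate with remote instance ... due to
// java.io.EOFException" events logged by SocketRemoteSiteListener, and
// lets every other event through untouched.
public class S2SEofFilter extends Filter<ILoggingEvent> {

    @Override
    public FilterReply decide(ILoggingEvent event) {
        final String logger = event.getLoggerName();
        final String message = event.getFormattedMessage();
        if (logger != null && logger.endsWith("SocketRemoteSiteListener")
                && message != null && message.contains("EOFException")) {
            return FilterReply.DENY;   // suppress this log event
        }
        return FilterReply.NEUTRAL;    // defer to level and other filters
    }
}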

Thanks,
Koji


Re: RPG S2S Error

Posted by Faisal Durrani <te...@gmail.com>.
Hi Joe/Koji,

I can't seem to figure out a way to reduce the back pressure or to find
the root cause of the errors:

1. Unable to communicate with remote instance Peer [xxxx] due to
java.io.EOFException; closing connection
2. indicates that port 37e64bd0-5326-3c3f-80f4-42a828dea1d5's destination
is full; penalizing peer

I have tried increasing the rate of delivery of the data by increasing the
concurrent tasks, increasing the back pressure thresholds, replacing the
PutHBaseJSON processor with PutHBaseRecord (the slowest part of our data
flow), etc. While I have seen some improvement, I can't seem to get rid of
the above errors. I also changed various settings in the NiFi config, such as:

nifi.cluster.node.protocol.threads=50
JVM = 4096
nifi.cluster.node.max.concurrent.requests=400
nifi.web.jetty.threads=400
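
('JVM = 4096' above presumably refers to the heap size; in NiFi the heap
is set in conf/bootstrap.conf, e.g.:

java.arg.2=-Xms4096m
java.arg.3=-Xmx4096m)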

Would it be safe to ignore these errors as they fill up the API logs, or
do I need to investigate further? If we can ignore them, is there any way
to stop them from appearing in the log file?




Re: RPG S2S Error

Posted by Joe Witt <jo...@gmail.com>.
You can allow for larger backlogs by increasing the backpressure
thresholds, OR you can add additional nodes, OR you can expire data.

The whole point of the backpressure and pressure-release features is to
let you be in control of how many resources are dedicated to buffering
data. However, in the most basic sense, if the rate of data arrival always
exceeds the rate of delivery, then delivery must be made faster or data
must be expired at some threshold age.
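
As an illustrative sketch, all three knobs live in each connection's
settings dialog (values are examples, not recommendations):

    Back Pressure Object Threshold: 10000
    Back Pressure Data Size Threshold: 1 GB
    FlowFile Expiration: 60 min

A non-zero FlowFile Expiration is what makes NiFi drop queued data once
it is older than that threshold.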

thanks


Re: RPG S2S Error

Posted by Faisal Durrani <te...@gmail.com>.
Hi Koji,

I moved onto another cluster of NiFi nodes, did the same configuration for
S2S there, and boom.. the same error messages all over the logs (nothing
on the bulletin board).

Could it be because of the back pressure? I also get the error
(indicates that port 8c77c1b0-0164-1000-0000-0000052fa54c's destination is
full; penalizing peer) at the same time I see the closing-connection error.
I don't see a way to resolve the back pressure, as we get a continuous
stream of data from Kafka which is then inserted into HBase (the slowest
part of the data flow), which eventually causes the back pressure.






Re: RPG S2S Error

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Faisal,

I think both error messages indicate the same thing: the network
communication is closed in the middle of a Site-to-Site transaction.
That can happen for many reasons, such as a flaky network, or manually
stopping the port or RPG while a transaction is being processed. I don't
think it is a configuration issue, because NiFi was able to initiate the
S2S communication.

Thanks,
Koji


Re: RPG S2S Error

Posted by Faisal Durrani <te...@gmail.com>.
Hi Koji,

In the subsequent tests the above error did not appear, but now we are
getting errors on the RPG:

RemoteGroupPort[name=1_pk_ip,targets=http://xxxxxx.prod.xx.local:9090/nifi/]
failed to communicate with remote NiFi instance due to
java.io.IOException: Failed to confirm transaction with
Peer[url=nifi://xxx-xxxxx.prod.xx.local:5001] due to
java.io.IOException: Connection reset by peer

The transport protocol is RAW, and the URL specified when setting up the
RPG is one of the nodes of the four-node cluster.

nifi.remote.input.socket.port=5001

nifi.remote.input.secure=false

nifi.remote.input.http.transaction.ttl=60 sec

nifi.remote.input.host=

Please let me know if there are any configuration changes that we need to make.






Re: RPG S2S Error

Posted by Faisal Durrani <te...@gmail.com>.
Hi Koji,

Thank you for your reply. I updated logback.xml and ran the test again.
I can see an additional error in the app.log, which is shown below.

o.a.nifi.remote.SocketRemoteSiteListener
java.io.EOFException: null
	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
	at java.io.DataInputStream.readUTF(DataInputStream.java:589)
	at java.io.DataInputStream.readUTF(DataInputStream.java:564)
	at org.apache.nifi.remote.protocol.RequestType.readRequestType(RequestType.java:36)
	at org.apache.nifi.remote.protocol.socket.SocketFlowFileServerProtocol.getRequestType(SocketFlowFileServerProtocol.java:147)
	at org.apache.nifi.remote.SocketRemoteSiteListener$1$1.run(SocketRemoteSiteListener.java:253)
	at java.lang.Thread.run(Thread.java:745)


I notice this error is reported against not just one node but different
nodes in the cluster. Would you be able to infer the root cause of the
issue from this information?

Thanks.


Re: RPG S2S Error

Posted by Koji Kawamura <ij...@gmail.com>.
Hello,

1. The error message sounds like the client disconnects in the middle
of Site-to-Site communication. Enabling debug logging would show more
information, by adding <logger name="org.apache.nifi.remote"
level="DEBUG"/> to conf/logback.xml.

2. I'd suggest checking whether your 4 nodes receive data evenly (well
distributed). Connection status history, 'Queued Count' per node, may
be useful to check. If data is not evenly distributed, I'd lower the
Remote Port batch settings on the sending side.
Then try to find a bottleneck in the downstream flow. Increasing
concurrent tasks at such a bottleneck processor can help increase
throughput in some cases. Adding more nodes will also help.
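
As a rough sketch, the batch settings on the RPG's remote port might look
like this (illustrative starting points, not recommendations):

    Batch Count: 100
    Batch Size: 1 MB
    Batch Duration: 500 ms

Smaller batches give the sender more frequent chances to pick a
less-loaded peer, at some cost in throughput.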

Thanks,
Koji
