Posted to users@qpid.apache.org by rammohan ganapavarapu <ra...@gmail.com> on 2018/11/02 15:59:06 UTC

Re: qpid-cpp-0.35 errors

Any help in understanding this error message would be appreciated.

Ram

On Wed, Oct 31, 2018 at 5:47 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Kim,
>
> Any idea about this error?
>
> Thanks,
> Ram
>
> On Tue, Oct 30, 2018, 2:13 PM Gordon Sim <gs...@redhat.com> wrote:
>
>> On 30/10/18 18:59, rammohan ganapavarapu wrote:
>> > There are two more errors from my original post. Can someone help me
>> > understand when qpid throws these errors?
>> >
>> >
>> >     1. 2018-10-22 08:05:30 [Broker] error Channel exception:
>> >     not-attached: Channel 0 is not attached
>> >
>>  (/builddir/build/BUILD/qpid-cpp-1.35.0/src/qpid/amqp_0_10/SessionHandler.cpp:39)
>>
>> The one above is common when you are sending asynchronously and a
>> previous message caused the session to be ended with an exception frame.
>> Any subsequent messages that were sent before the client received the
>> exception frame result in the above error.
>>
>> >     2. 2018-10-30 14:30:36 [Broker] error Connection exception:
>> >     framing-error: Queue ax-q-axgroup-001-consumer-group-001:
>> >     MessageStoreImpl::store() failed: jexception 0x0803 wmgr::enqueue() threw
>> >     JERR_WMGR_ENQDISCONT: Enqueued new dtok when previous enqueue returned
>> >     partly completed (state ENQ_PART). (This data_tok: id=1714315 state=NONE)
>> >     (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> >     3. 2018-10-30 14:30:36 [Protocol] error Connection
>> >     qpid.10.68.94.134:5672-10.68.94.127:39458 closed by error: Queue
>> >     ax-q-axgroup-001-consumer-group-001: MessageStoreImpl::store() failed:
>> >     jexception 0x0803 wmgr::enqueue() threw JERR_WMGR_ENQDISCONT: Enqueued new
>> >     dtok when previous enqueue returned partly completed (state ENQ_PART).
>> >     (This data_tok: id=1714315 state=NONE)
>> >     (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>
>> Not sure what the 'partly completed state' means here. Kim, any thoughts?
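
A minimal qpid::messaging C++ sketch of the synchronous-send approach Gordon
describes for the first error (illustrative only; the broker URL and queue
name are placeholders, not taken from this thread). With sync=true, the send
that triggered the session exception fails in the caller, instead of the
broker logging "Channel 0 is not attached" for messages already in flight:

#include <qpid/messaging/Connection.h>
#include <qpid/messaging/Message.h>
#include <qpid/messaging/Sender.h>
#include <qpid/messaging/Session.h>
#include <iostream>

int main() {
    qpid::messaging::Connection connection("localhost:5672");  // placeholder URL
    try {
        connection.open();
        qpid::messaging::Session session = connection.createSession();
        qpid::messaging::Sender sender = session.createSender("my-queue");  // placeholder queue
        // sync=true blocks until the broker completes this transfer, so a
        // session error surfaces here rather than on a later, unrelated send.
        sender.send(qpid::messaging::Message("payload"), true);
        connection.close();
    } catch (const std::exception& e) {
        std::cerr << "send failed: " << e.what() << std::endl;
        connection.close();
        return 1;
    }
    return 0;
}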

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
I am using Java client version 0.28 and qpid-cpp 1.35, and I see
this error on the client side (producer).

2018-11-02 00:00:46,376  IoSender - /1.2.3.4:5672 INFO  o.a.q.t.n.io.IoSender - Logger.info() : Exception in thread sending to '/1.2.3.4:5672': java.net.SocketException: Broken pipe (Write failed)
2018-11-02 00:00:46,377  IoReceiver - /1.2.3.4:5672 ERROR o.a.q.c.AMQConnectionDelegate_0_10 - AMQConnectionDelegate_0_10.exception() : previous exception
org.apache.qpid.transport.ConnectionException: java.net.SocketException: Broken pipe (Write failed)
at org.apache.qpid.transport.Connection.exception(Connection.java:546) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.exception(Assembler.java:107) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.InputHandler.exception(InputHandler.java:199) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:217) ~[qpid-common-0.28.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.apache.qpid.transport.SenderException: java.net.SocketException: Broken pipe (Write failed)
at org.apache.qpid.transport.network.io.IoSender.close(IoSender.java:229) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.io.IoSender.close(IoSender.java:199) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Disassembler.close(Disassembler.java:88) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionDelegate.sendConnectionCloseOkAndCloseSender(ConnectionDelegate.java:82) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionDelegate.connectionClose(ConnectionDelegate.java:74) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionDelegate.connectionClose(ConnectionDelegate.java:40) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionClose.dispatch(ConnectionClose.java:91) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionDelegate.control(ConnectionDelegate.java:49) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.ConnectionDelegate.control(ConnectionDelegate.java:40) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.Method.delegate(Method.java:163) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.Connection.received(Connection.java:392) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.Connection.received(Connection.java:62) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:183) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Frame.delegate(Frame.java:128) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.received(Assembler.java:102) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.Assembler.received(Assembler.java:44) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:189) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:105) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:44) ~[qpid-common-0.28.jar:na]
at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:161) ~[qpid-common-0.28.jar:na]
... 1 common frames omitted
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_121]
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) ~[na:1.8.0_121]
at java.net.SocketOutputStream.write(SocketOutputStream.java:155) ~[na:1.8.0_121]
at org.apache.qpid.transport.network.io.IoSender.run(IoSender.java:308) ~[qpid-common-0.28.jar:na]
... 1 common frames omitted

Thanks,
Ram


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Hi,

Has anyone else seen this error before? After this error the broker stops
taking any messages; I am not sure what is causing it.

Thanks,
Ram

On Fri, Nov 2, 2018 at 4:24 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Kim/Gordon,
>
> After this message the broker is not accepting any more messages and it
> keeps throwing this message.
>
> Thanks,
> Ram

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

Are there any tools to read or dump messages from the journal files?

Thanks,
Ram


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

We have only one main queue and one dead-letter queue, with 10 producers
and 12 consumers; the producers pump 1k messages/sec. Below is the
qpid-stat -q output from when it stopped taking any more messages.

bash-4.1# qpid-stat -q
Queues
  queue                                     dur  autoDel  excl  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ======================================================================================================================
  ax-q-axgroup-001-consumer-group-001       Y                   0    114k   114k    0      5.88g    5.88g     11    2
  ax-q-axgroup-001-consumer-group-001-dl    Y                   0    0      0       0      0        0         0     2
  d72e183e-f0df-457c-89a7-81a2cff509c8:0.0       Y        Y     0    0      0       0      0        0         1     2


Thanks,
Ram





Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
This looks like a bug to me, and that is why I am keen to see a 
reproducer if you can find one. How many queues are there? How many 
producers and consumers are there for each queue? How are the consumers 
working? Are they configured as listeners, or do they poll for new 
messages? How frequently? How long does it take under these conditions 
for the error to occur typically? If I can get some kind of idea what 
the runtime conditions are, it will give me some idea where to look.

If you set the broker to use INFO+ logging (log-enable=info+), then you 
should see some detail about the starting and recovery of the store when 
the broker starts, which should include this info. The store settings in 
the config file are global, so when you set a particular buffer 
configuration, all queues will use this. It should be reported during 
startup when using INFO+ level logging. Watch your log size, however, as 
using this level will make the logs big.
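
A qpidd.conf fragment matching this suggestion (the log file path is an
illustrative assumption; only log-enable=info+ comes from the thread):

# qpidd.conf
log-enable=info+                      # INFO and above; store start/recovery details appear at startup
log-to-file=/var/log/qpid/qpidd.log   # watch the size of this file at INFO+ level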


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

We have set wcache-page-size=128 in qpidd.conf, restarted the broker, and
let the client recreate the queues fresh, but we are still getting this
error. How do we verify that the queues created by the client actually have
this wcache-page-size=128?

2018-12-05 21:18:16 [Protocol] error Connection
qpid.<server>:5672-<client>:17769 closed by error: Queue <queue-name>:
MessageStoreImpl::store() failed: jexception 0x0803 wmgr::enqueue() threw
JERR_WMGR_ENQDISCONT: Enqueued new dtok when previous enqueue returned
partly completed (state ENQ_PART). (This data_tok: id=456535 state=NONE)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)

Thanks,
Ram




Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

Thank you, I will play with that setting. Please let me know if any other
tunings would help.

Ram


Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
The answer to your first question depends on what is more important to 
you - low latency or high throughput. Messages to be persisted will 
accumulate in a buffer page until it is full or until a timer is 
triggered, then it will be written to disk. It is not until this happens 
that the message will be acknowledged by the broker. If low latency is 
important, then having smaller but more numerous buffer pages will mean 
the messages will not wait for very long before being written to disk 
and acknowledged as received. However this occurs at the cost of some 
efficiency, which can affect throughput. If you have large volumes of 
messages and the throughput is more important, then using fewer but 
larger buffer pages will help you.
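
For example, a qpidd.conf sketch of the two profiles (values are illustrative
assumptions, not recommendations; wcache-page-size is taken to be in KiB, as
in the store options listed further down this thread):

# Low latency: smaller but more numerous buffer pages
wcache-page-size=4
wcache-num-pages=32

# High throughput: fewer but larger buffer pages
wcache-page-size=128
wcache-num-pages=8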

Be aware, however, that the product of the size and number of pages is 
the total memory that will be consumed and held by the broker for 
buffering *per queue*. If you have a very large number of queues, then 
you must watch out that you don't over-size your write buffers or else 
you will run out of memory.
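
As a worked example under the same illustrative assumptions:
wcache-page-size=128 (KiB) with wcache-num-pages=16 holds 128 KiB x 16 = 2 MiB
per queue, so a broker with 1,000 such queues would hold roughly 2 GiB in
write buffers alone.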

While I cannot give you specific answers, as these depend on your 
performance priorities, I suggest some trial-and-error if you want to 
adjust these values.

The Transaction Prepared List (TPL) is a special global queue for 
persisting transaction boundaries. As this info is usually small and 
relatively infrequent, the tpl-* settings apply to this queue only and 
the user has the option to use different values than the regular queues. 
If you don't use transactions, then this can be ignored. It is not a 
queue that can be written to directly, but the store creates its own 
data that is saved in this queue. Adjusting the tpl-* settings depends 
only on the frequency of transactions in the user's application or use-case.

Hope that helps,

Kim van der Riet

On 11/27/18 4:44 PM, rammohan ganapavarapu wrote:
> Kim,
>
> 1. My message size is around 80kb, so what would be suggested values for
> the below properties?
>
>
> wcache-page-size
> wcache-num-pages
> tpl-wcache-num-pages
> tpl-wcache-page-size
>
> right now i have all defaults, so i am trying to see if i can tune these
> values for my messages size to avoid those AIO busy cases.  I have try to
> define those properties/options in qpidd.conf file but when i run
> qpid-config queues its not showing those values on my queues created by
> client application, do i have to define those options when i create queue
> instead of keep them in qpidd.conf?
>
> 2. What is difference b/w tpl-wcache-page-size and wcache-page-size
>
> Thanks,
> Ram
>
> On Fri, Nov 16, 2018 at 9:26 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> There is little documentation on linearstore. Certainly, the Apache docs
>> don't contain much. I think this is an oversight, but it won't get fixed
>> anytime soon.
>>
>> Kim
>>
>> On 11/16/18 12:11 PM, rammohan ganapavarapu wrote:
>>> Any one point me to the doc where i can read internals about how
>>> linearstore works and how qpid uses it?
>>>
>>> Thanks,
>>> Ram
>>>
>>> On Mon, Nov 12, 2018 at 8:43 AM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> Kim,
>>>>
>>>> Thanks for clearing that up for me, does it support SAN storage blocks.
>>>> Where can i read more about linearstore if i want to know the low level
>>>> internals?
>>>>
>>>> Ram
>>>>
>>>> On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
>>>> wrote:
>>>>
>>>>> The linearstore relies on using libaio for its async disk writes. The
>>>>> O_DIRECT flag is used, and this requires a block of aligned memory to
>>>>> serve as a memory buffer for disk write operations. To my knowledge,
>>>>> this technique only works with local disks and controllers. NFS does
>> not
>>>>> allow for DMA memory writes to disk AFAIK, and for as long as I can
>>>>> remember, has been a problem for the linearstore. With some work it
>>>>> might be possible to make it work using another write technique though.
>>>>> NFS has never been a "supported" medium for linearstore.
>>>>>
>>>>> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
>>>>>> But how does NFS will cause this issue, i am interested to see because
>>>>> we
>>>>>> are using NFS (V4 version) in some environments, so wanted to learn
>>>>> tunings
>>>>>> when we use NFS.
>>>>>>
>>>>>> Thanks,
>>>>>> Ram
>>>>>>
>>>>>> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>
>>>>>>> Sorry, i thought it's NFS but it's actually SAN storage volume.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ram
>>>>>>>
>>>>>>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
>>>>>>>
>>>>>>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>>>>>>>>> I was wrong about the NFS for qpid journal files, looks like they
>>>>> are on
>>>>>>>>> NFS, so does NFS cause this issue?
>>>>>>>> Yes, I believe it does. What version of NFS are you using?
>>>>>>>>
>>>>>>>>
>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>>>
>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

1. My message size is around 80 KB, so what would be the suggested values
for the below properties?


wcache-page-size
wcache-num-pages
tpl-wcache-num-pages
tpl-wcache-page-size

right now I have all defaults, so I am trying to see if I can tune these
values for my message size to avoid those AIO busy cases. I have tried to
define those properties/options in the qpidd.conf file, but when I run
qpid-config queues it's not showing those values on the queues created by
the client application. Do I have to define those options when I create a
queue instead of keeping them in qpidd.conf?

2. What is the difference between tpl-wcache-page-size and wcache-page-size?

Thanks,
Ram

On Fri, Nov 16, 2018 at 9:26 AM Kim van der Riet <kv...@redhat.com>
wrote:

> There is little documentation on linearstore. Certainly, the Apache docs
> don't contain much. I think this is an oversight, but it won't get fixed
> anytime soon.
>
> Kim
>
> On 11/16/18 12:11 PM, rammohan ganapavarapu wrote:
> > Any one point me to the doc where i can read internals about how
> > linearstore works and how qpid uses it?
> >
> > Thanks,
> > Ram
> >
> > On Mon, Nov 12, 2018 at 8:43 AM rammohan ganapavarapu <
> > rammohanganap@gmail.com> wrote:
> >
> >> Kim,
> >>
> >> Thanks for clearing that up for me, does it support SAN storage blocks.
> >> Where can i read more about linearstore if i want to know the low level
> >> internals?
> >>
> >> Ram
> >>
> >> On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
> >> wrote:
> >>
> >>> The linearstore relies on using libaio for its async disk writes. The
> >>> O_DIRECT flag is used, and this requires a block of aligned memory to
> >>> serve as a memory buffer for disk write operations. To my knowledge,
> >>> this technique only works with local disks and controllers. NFS does
> not
> >>> allow for DMA memory writes to disk AFAIK, and for as long as I can
> >>> remember, has been a problem for the linearstore. With some work it
> >>> might be possible to make it work using another write technique though.
> >>> NFS has never been a "supported" medium for linearstore.
> >>>
> >>> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
> >>>> But how does NFS will cause this issue, i am interested to see because
> >>> we
> >>>> are using NFS (V4 version) in some environments, so wanted to learn
> >>> tunings
> >>>> when we use NFS.
> >>>>
> >>>> Thanks,
> >>>> Ram
> >>>>
> >>>> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
> >>>> rammohanganap@gmail.com> wrote:
> >>>>
> >>>>> Sorry, i thought it's NFS but it's actually SAN storage volume.
> >>>>>
> >>>>> Thanks,
> >>>>> Ram
> >>>>>
> >>>>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
> >>>>>
> >>>>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
> >>>>>>> I was wrong about the NFS for qpid journal files, looks like they
> >>> are on
> >>>>>>> NFS, so does NFS cause this issue?
> >>>>>> Yes, I believe it does. What version of NFS are you using?
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>>>>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>>>>
> >>>>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>
> >>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
There is little documentation on linearstore. Certainly, the Apache docs 
don't contain much. I think this is an oversight, but it won't get fixed 
anytime soon.

Kim

On 11/16/18 12:11 PM, rammohan ganapavarapu wrote:
> Any one point me to the doc where i can read internals about how
> linearstore works and how qpid uses it?
>
> Thanks,
> Ram
>
> On Mon, Nov 12, 2018 at 8:43 AM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Kim,
>>
>> Thanks for clearing that up for me, does it support SAN storage blocks.
>> Where can i read more about linearstore if i want to know the low level
>> internals?
>>
>> Ram
>>
>> On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
>> wrote:
>>
>>> The linearstore relies on using libaio for its async disk writes. The
>>> O_DIRECT flag is used, and this requires a block of aligned memory to
>>> serve as a memory buffer for disk write operations. To my knowledge,
>>> this technique only works with local disks and controllers. NFS does not
>>> allow for DMA memory writes to disk AFAIK, and for as long as I can
>>> remember, has been a problem for the linearstore. With some work it
>>> might be possible to make it work using another write technique though.
>>> NFS has never been a "supported" medium for linearstore.
>>>
>>> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
>>>> But how does NFS will cause this issue, i am interested to see because
>>> we
>>>> are using NFS (V4 version) in some environments, so wanted to learn
>>> tunings
>>>> when we use NFS.
>>>>
>>>> Thanks,
>>>> Ram
>>>>
>>>> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
>>>> rammohanganap@gmail.com> wrote:
>>>>
>>>>> Sorry, i thought it's NFS but it's actually SAN storage volume.
>>>>>
>>>>> Thanks,
>>>>> Ram
>>>>>
>>>>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
>>>>>
>>>>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>>>>>>> I was wrong about the NFS for qpid journal files, looks like they
>>> are on
>>>>>>> NFS, so does NFS cause this issue?
>>>>>> Yes, I believe it does. What version of NFS are you using?
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>
>>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

Actually we are using qpid as part of our application, and a customer is
using that application. They are facing this issue intermittently, but we
still don't know in what scenario it happens. I tried the same application
with NFS but still couldn't reproduce it. We took a tcpdump, and I see that
the TCP trace doesn't contain the full message; it gets truncated before
the broker closes the TCP connection.

Thanks,
Ram

On Fri, Nov 16, 2018 at 8:33 AM Kim van der Riet <kv...@redhat.com>
wrote:

> Did you find a reproducer at all?
>
> Kim
>
> On 11/12/18 11:43 AM, rammohan ganapavarapu wrote:
> > Kim,
> >
> > Thanks for clearing that up for me, does it support SAN storage blocks.
> > Where can i read more about linearstore if i want to know the low level
> > internals?
> >
> > Ram
> >
> > On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
> > wrote:
> >
> >> The linearstore relies on using libaio for its async disk writes. The
> >> O_DIRECT flag is used, and this requires a block of aligned memory to
> >> serve as a memory buffer for disk write operations. To my knowledge,
> >> this technique only works with local disks and controllers. NFS does not
> >> allow for DMA memory writes to disk AFAIK, and for as long as I can
> >> remember, has been a problem for the linearstore. With some work it
> >> might be possible to make it work using another write technique though.
> >> NFS has never been a "supported" medium for linearstore.
> >>
> >> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
> >>> But how does NFS will cause this issue, i am interested to see because
> we
> >>> are using NFS (V4 version) in some environments, so wanted to learn
> >> tunings
> >>> when we use NFS.
> >>>
> >>> Thanks,
> >>> Ram
> >>>
> >>> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
> >>> rammohanganap@gmail.com> wrote:
> >>>
> >>>> Sorry, i thought it's NFS but it's actually SAN storage volume.
> >>>>
> >>>> Thanks,
> >>>> Ram
> >>>>
> >>>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
> >>>>
> >>>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
> >>>>>> I was wrong about the NFS for qpid journal files, looks like they
> are
> >> on
> >>>>>> NFS, so does NFS cause this issue?
> >>>>> Yes, I believe it does. What version of NFS are you using?
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>>>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>>>
> >>>>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >> For additional commands, e-mail: users-help@qpid.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
Did you find a reproducer at all?

Kim

On 11/12/18 11:43 AM, rammohan ganapavarapu wrote:
> Kim,
>
> Thanks for clearing that up for me, does it support SAN storage blocks.
> Where can i read more about linearstore if i want to know the low level
> internals?
>
> Ram
>
> On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> The linearstore relies on using libaio for its async disk writes. The
>> O_DIRECT flag is used, and this requires a block of aligned memory to
>> serve as a memory buffer for disk write operations. To my knowledge,
>> this technique only works with local disks and controllers. NFS does not
>> allow for DMA memory writes to disk AFAIK, and for as long as I can
>> remember, has been a problem for the linearstore. With some work it
>> might be possible to make it work using another write technique though.
>> NFS has never been a "supported" medium for linearstore.
>>
>> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
>>> But how does NFS will cause this issue, i am interested to see because we
>>> are using NFS (V4 version) in some environments, so wanted to learn
>> tunings
>>> when we use NFS.
>>>
>>> Thanks,
>>> Ram
>>>
>>> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> Sorry, i thought it's NFS but it's actually SAN storage volume.
>>>>
>>>> Thanks,
>>>> Ram
>>>>
>>>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
>>>>
>>>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>>>>>> I was wrong about the NFS for qpid journal files, looks like they are
>> on
>>>>>> NFS, so does NFS cause this issue?
>>>>> Yes, I believe it does. What version of NFS are you using?
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Can anyone point me to a doc where I can read about the internals of how
linearstore works and how qpid uses it?

Thanks,
Ram

On Mon, Nov 12, 2018 at 8:43 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Kim,
>
> Thanks for clearing that up for me, does it support SAN storage blocks.
> Where can i read more about linearstore if i want to know the low level
> internals?
>
> Ram
>
> On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> The linearstore relies on using libaio for its async disk writes. The
>> O_DIRECT flag is used, and this requires a block of aligned memory to
>> serve as a memory buffer for disk write operations. To my knowledge,
>> this technique only works with local disks and controllers. NFS does not
>> allow for DMA memory writes to disk AFAIK, and for as long as I can
>> remember, has been a problem for the linearstore. With some work it
>> might be possible to make it work using another write technique though.
>> NFS has never been a "supported" medium for linearstore.
>>
>> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
>> > But how does NFS will cause this issue, i am interested to see because
>> we
>> > are using NFS (V4 version) in some environments, so wanted to learn
>> tunings
>> > when we use NFS.
>> >
>> > Thanks,
>> > Ram
>> >
>> > On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
>> > rammohanganap@gmail.com> wrote:
>> >
>> >> Sorry, i thought it's NFS but it's actually SAN storage volume.
>> >>
>> >> Thanks,
>> >> Ram
>> >>
>> >> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
>> >>
>> >>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>> >>>> I was wrong about the NFS for qpid journal files, looks like they
>> are on
>> >>>> NFS, so does NFS cause this issue?
>> >>> Yes, I believe it does. What version of NFS are you using?
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> >>> For additional commands, e-mail: users-help@qpid.apache.org
>> >>>
>> >>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

Thanks for clearing that up for me. Does it support SAN storage blocks?
Where can I read more about linearstore if I want to know the low-level
internals?

Ram

On Mon, Nov 12, 2018 at 8:32 AM Kim van der Riet <kv...@redhat.com>
wrote:

> The linearstore relies on using libaio for its async disk writes. The
> O_DIRECT flag is used, and this requires a block of aligned memory to
> serve as a memory buffer for disk write operations. To my knowledge,
> this technique only works with local disks and controllers. NFS does not
> allow for DMA memory writes to disk AFAIK, and for as long as I can
> remember, has been a problem for the linearstore. With some work it
> might be possible to make it work using another write technique though.
> NFS has never been a "supported" medium for linearstore.
>
> On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
> > But how does NFS will cause this issue, i am interested to see because we
> > are using NFS (V4 version) in some environments, so wanted to learn
> tunings
> > when we use NFS.
> >
> > Thanks,
> > Ram
> >
> > On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
> > rammohanganap@gmail.com> wrote:
> >
> >> Sorry, i thought it's NFS but it's actually SAN storage volume.
> >>
> >> Thanks,
> >> Ram
> >>
> >> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
> >>
> >>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
> >>>> I was wrong about the NFS for qpid journal files, looks like they are
> on
> >>>> NFS, so does NFS cause this issue?
> >>> Yes, I believe it does. What version of NFS are you using?
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>
> >>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
The linearstore relies on using libaio for its async disk writes. The 
O_DIRECT flag is used, and this requires a block of aligned memory to 
serve as a memory buffer for disk write operations. To my knowledge, 
this technique only works with local disks and controllers. NFS does not 
allow for DMA memory writes to disk AFAIK and, for as long as I can 
remember, it has been a problem for the linearstore. With some work it 
might be possible to make it work using another write technique though. 
NFS has never been a "supported" medium for linearstore.
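
To make the constraint concrete, here is a minimal standalone sketch of 
the O_DIRECT-plus-libaio pattern the store relies on. This is not the 
linearstore code itself, just the underlying technique with error 
handling trimmed, and the file name is made up:

    // Build: g++ aio_sketch.cpp -laio  (g++ defines _GNU_SOURCE, which O_DIRECT needs)
    #include <libaio.h>   // io_setup, io_prep_pwrite, io_submit, io_getevents
    #include <fcntl.h>    // open, O_DIRECT
    #include <unistd.h>   // close
    #include <cstdlib>    // posix_memalign, free
    #include <cstring>    // memset

    int main() {
        // O_DIRECT bypasses the page cache, so the kernel requires the
        // buffer address, file offset and length to be block-aligned.
        const size_t kAlign = 4096;
        int fd = ::open("journal.jrnl", O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) return 1;  // O_DIRECT can fail outright on some filesystems

        void* page = 0;
        if (::posix_memalign(&page, kAlign, kAlign) != 0) return 1;
        std::memset(page, 0, kAlign);

        io_context_t ctx = 0;
        if (::io_setup(8, &ctx) < 0) return 1;       // kernel AIO context

        struct iocb cb;
        struct iocb* cbs[1] = { &cb };
        ::io_prep_pwrite(&cb, fd, page, kAlign, 0);  // queue one aligned page write
        if (::io_submit(ctx, 1, cbs) != 1) return 1;

        // The store reaps completions much like this; a buffer page stays
        // "busy" (the A in the page state map) until its event arrives.
        struct io_event ev;
        if (::io_getevents(ctx, 1, 1, &ev, 0) != 1) return 1;

        ::io_destroy(ctx);
        ::free(page);
        ::close(fd);
        return 0;
    }

On a filesystem that cannot support this combination - NFS being the 
classic case - the open() or the submitted writes may fail or quietly 
degrade, which would be consistent with the journal errors seen earlier 
in this thread.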

On 11/9/18 4:28 PM, rammohan ganapavarapu wrote:
> But how does NFS will cause this issue, i am interested to see because we
> are using NFS (V4 version) in some environments, so wanted to learn tunings
> when we use NFS.
>
> Thanks,
> Ram
>
> On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Sorry, i thought it's NFS but it's actually SAN storage volume.
>>
>> Thanks,
>> Ram
>>
>> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com wrote:
>>
>>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>>>> I was wrong about the NFS for qpid journal files, looks like they are on
>>>> NFS, so does NFS cause this issue?
>>> Yes, I believe it does. What version of NFS are you using?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Thank you. Yes, we have multiple producers and multiple consumers accessing
the same queue at the same time. My message size is around ~80 KB. Small
confusion regarding AIO: if it's AIO, it shouldn't have I/O wait, right?
Why is it waiting?

I was reading about AIO and found this; am I hitting this scenario?

"AIO read and write on files opened without O_DIRECT (i.e. normal buffered
filesystem AIO). On ext2, ext3, jfs, xfs and nfs, these do not return an
explicit error, but quietly default to sync" as in
http://lse.sourceforge.net/io/aio.html

We have enabled trace logging, and here is the full exception stack; see if
it helps you to identify the issue.

2018-11-06 18:04:33 [Broker] debug clean(): 10 messages remain; head is now
0
2018-11-06 18:04:38 [Store] critical Linear Store: Journal
"ax-q-axgroup-001-consumer-group-001": get_events() returned
JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=31 pc=2 po=0 aer=1 edac=TFFF
ps=[-------------------------------A]

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:38 [System] debug Exception constructed: Queue
<queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [Broker] warning Exchange <exchange-name> cannot
deliver to  Queue <queue-name>: Queue <queue-name>:
MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [System] debug Exception constructed: Queue
<queue-name>: async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY:
dequeue while part way through another op: _enq_busy=T _abort_busy=F
_commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Broker] error Connection exception: framing-error:
Queue <queue-name>: async_dequeue() failed: jexception 0x0000
(RHM_IORES_BUSY: dequeue while part way through another op: _enq_busy=T
_abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Broker] error Connection exception: framing-error:
Queue <queue-name>: async_dequeue() failed: jexception 0x0000
(RHM_IORES_BUSY: dequeue while part way through another op: _enq_busy=T
_abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<server>:33484 closed by error: Queue <queue-name>:
async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY: dequeue while
part way through another op: _enq_busy=T _abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)(501)

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<client>:40333 closed by error: Queue <queue-name>:
async_dequeue() failed: jexception 0x0000  (RHM_IORES_BUSY: dequeue while
part way through another op: _enq_busy=T _abort_busy=F _commit_busy=F)
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1272)(501)

2018-11-06 18:04:39 [System] debug Exception constructed: Value for
replyText is too large

2018-11-06 18:04:39 [Protocol] error Connection
qpid.<server>:5672-<client>:40333 closed by error: illegal-argument: Value
for replyText is too large(320)


Thanks a lot for your help,
Ram


On Thu, Nov 8, 2018 at 11:26 AM Kim van der Riet <kv...@redhat.com>
wrote:

> Resending, did not show up on the list the first time I sent it...
>
>
>
> -------- Forwarded Message --------
> Subject:        Re: qpid-cpp-0.35 errors
> Date:   Thu, 8 Nov 2018 09:30:24 -0500
> From:   Kim van der Riet <kv...@redhat.com>
> To:     users@qpid.apache.org
>
>
>
> On 11/7/18 3:18 PM, rammohan ganapavarapu wrote:
> > Kim,
> >
> > Ok, i am still trying to see what part of my java application is causing
> > that issue, yes that issue is happening intermittently. Regarding
> > "JERR_WMGR_ENQDISCONT" error, may be they are chained exceptions from the
> > previous error JERR_JCNTL_AIOCMPLWAIT?
> In my mind, it is more likely the other way around. But the logs should
> tell you that. It would be best to start with a clean store before each
> test so you don't inherit issues from a previous test or run.
> >
> > Does message size contribute to this issue?
>
> Yes, but only in the sense that the size alters the packing of the write
> buffers, and the timing of when they are written. Also, the number of
> simultaneous producers and consumers will affect this. In particular,
> two producers simultaneously sending messages to the same queue, or a
> consumer consuming from a queue while a producer is sending, are going
> to be the main factors in any race condition such as I suspect this is.
> Playing with those will give clues as to what is
> happening. You could try the following, each time starting with a clean
> store:
>
> 1. Only allowing a single producer, followed by a single consumer (ie
> not at the same time);
>
> 2. Allowing a single producer and a single consumer to operate
> simultaneously;
>
> 3. Allowing multiple producers only (I don't know if your use-case has
> this);
>
> 4. Allowing multiple consumers.
>
> Once you have isolated which scenarios cause the problem, then try
> varying the message size. The answers to these will help isolating where
> the issue is happening.
>
> >
> > Thanks,
> > Ram
> >
> > On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
> > wrote:
> >
> >> No, they are not.
> >>
> >> These two defines govern the number of sleeps and the sleep time while
> >> waiting before throwing an exception, during recovery only. They do
> >> not play a role during normal operation.
> >>
> >> If you are able to compile the broker code, you can try playing with
> >> these values. But I don't think they will make much difference to the
> >> overall problem. I think some of the other errors you have been seeing
> >> prior to this one are closer to where the real problem lies - such as
> >> the JERR_WMGR_ENQDISCONT error.
> >>
> >> Do you have a reproducer of any kind? Does this error occur predictably
> >> under some or other conditions?
> >>
> >> Thanks,
> >>
> >> Kim van der Riet
> >>
> >> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
> >>> Kim,
> >>>
> >>> I see these two settings from code, can these be configurable?
> >>>
> >>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>
> >>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
> >>>
> >>>
> >>> Ram
> >>>
> >>> On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
> >>> rammohanganap@gmail.com> wrote:
> >>>
> >>>> Thank you Kim, i will try your suggestions.
> >>>>
> >>>> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com
> >> wrote:
> >>>>> This error is a linearstore issue. It looks as though there is a
> >>>>> single
> >>>>> write operation to disk that has become stuck, and is holding up all
> >>>>> further write operations. This happens because there is a fixed
> >> circular
> >>>>> pool of memory pages used for the AIO operations to disk, and when
> one
> >>>>> of these is "busy" (indicated by the A letter in the page state map),
> >>>>> write operations cannot continue until it is cleared. If it does not
> >>>>> clear within a certain time, then an exception is thrown, which
> >>>>> usually
> >>>>> results in the broker closing the connection.
> >>>>>
> >>>>> The events leading up to a "stuck" write operation are complex and
> >>>>> sometimes difficult to reproduce. If you have a reproducer, then I
> >> would
> >>>>> be interested to see it! Even so, the ability to reproduce on another
> >>>>> machine is hard as it depends on such things as disk write speed, the
> >>>>> disk controller characteristics, the number of threads in the thread
> >>>>> pool (ie CPU type), memory and other hardware-related things.
> >>>>>
> >>>>> There are two linearstore parameters that you can try playing with to
> >>>>> see if you can change the behavior of the store:
> >>>>>
> >>>>> wcache-page-size: This sets the size of each page in the write
> buffer.
> >>>>> Larger page size is good for large messages, a smaller size will help
> >> if
> >>>>> you have small messages.
> >>>>>
> >>>>> wcache-num-pages: The total number of pages in the write buffer.
> >>>>>
> >>>>> Use the --help on the broker with the linearstore loaded to see more
> >>>>> details on this. I hope that helps a little.
> >>>>>
> >>>>> Kim van der Riet
> >>>>>
> >>>>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> >>>>>> Any help in understand why/when broker throws those errors and stop
> >>>>>> receiving message would be appreciated.
> >>>>>>
> >>>>>> Not sure if any kernel tuning or broker tuning needs to be done to
> >>>>>> solve this issue.
> >>>>>>
> >>>>>> Thanks in advance,
> >>>>>> Ram
> >>>>>>
> >>>>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> >>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>
> >>>>>>> Also from this log message (store level) it seems like waiting for
> >> AIO
> >>>>> to
> >>>>>>> complete.
> >>>>>>>
> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
> "<journal
> >>>>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> >>>>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> >>>>>>> ps=[-------------------------A------]
> >>>>>>>
> >>>>>>> page_state ps=[-------------------------A------] where A is
> >>>>> AIO_PENDING
> >>>>>>> aer=1 _aio_evt_rem; ///< Remaining AIO events
> >>>>>>>
> >>>>>>> When there is or there are pending AIO, does broker close the
> >>>>> connection?
> >>>>>>> is there any tuning that can be done to resolve this?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Ram
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> >>>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> I was check the code and i see these lines for that AIO timeout.
> >>>>>>>>
> >>>>>>>> case
> >>>>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>>>>>> if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>>>>>> THROW_STORE_EXCEPTION("Timeout waiting for
> >> AIO in
> >>>>>>>> MessageStoreImpl::recoverMessages()");
> >>>>>>>> ::usleep(AIO_SLEEP_TIME_US);
> >>>>>>>> break;
> >>>>>>>>
> >>>>>>>> And these are the defaults
> >>>>>>>>
> >>>>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>>>>>>
> >>>>>>>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page
> >> is
> >>>>>>>> waiting for AIO.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> So does page got blocked and its waiting for page availability?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Ram
> >>>>>>>>
> >>>>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> >>>>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after
> >>>>>>>>> that
> >>>>> we
> >>>>>>>>> see this message
> >>>>>>>>>
> >>>>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> >>>>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
> >>>>> offs=0x107680
> >>>>>>>>> (likely journal overwrite boundary); 19 filler record(s)
> required.
> >>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
> >>>>> fid=0x4605b
> >>>>>>>>> offs=0x107680
> >>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover
> >>>>>>>>> phase
> >>>>> logs
> >>>>>>>>> It worked fine for a day and started throwing this message:
> >>>>>>>>>
> >>>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
> >> "<name>":
> >>>>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
> >>>>> pi=25 pc=8
> >>>>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot
> >>>>>>>>> deliver
> >>>>> to
> >>>>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
> >>>>> failed:
> >>>>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
> >>>>> JERR_JCNTL_AIOCMPLWAIT:
> >>>>>>>>> Timeout waiting for AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
> >>>>> framing-error:
> >>>>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
> >>>>> 0x0202
> >>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>>>> waiting for
> >>>>>>>>> AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
> >>>>> <queue-name>:
> >>>>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
> >>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>>>> waiting for
> >>>>>>>>> AIOs to complete.
> >>>>>>>>>
> >>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
> >>>>> illegal-argument:
> >>>>>>>>> Value for replyText is too large(320)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Ram
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> >>>>>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> No, local disk.
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
> >> wrote:
> >>>>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>>>>>>> Gordon,
> >>>>>>>>>>>>
> >>>>>>>>>>>> We are using java client 0.28 version and qpidd-cpp 1.35
> >>>>>>>>>>>> version
> >>>>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64), i dont know at what
> >>>>> scenario
> >>>>>>>>>>> its
> >>>>>>>>>>>> happening but after i restart broker and if we wait for few
> >>>>>>>>>>>> days
> >>>>> its
> >>>>>>>>>>>> happening again. From the above logs do you have any
> >>>>>>>>>>>> pointers to
> >>>>>>>>>>> check?
> >>>>>>>>>>>
> >>>>>>>>>>> Are you using NFS?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>>>>>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>>>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>>>
> >>>>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >> For additional commands, e-mail: users-help@qpid.apache.org
> >>
> >>
>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
Resending, did not show up on the list the first time I sent it...



-------- Forwarded Message --------
Subject: 	Re: qpid-cpp-0.35 errors
Date: 	Thu, 8 Nov 2018 09:30:24 -0500
From: 	Kim van der Riet <kv...@redhat.com>
To: 	users@qpid.apache.org



On 11/7/18 3:18 PM, rammohan ganapavarapu wrote:
> Kim,
>
> Ok, i am still trying to see what part of my java application is causing
> that issue, yes that issue is happening intermittently. Regarding
> "JERR_WMGR_ENQDISCONT" error, may be they are chained exceptions from the
> previous error JERR_JCNTL_AIOCMPLWAIT?
In my mind, it is more likely the other way around. But the logs should 
tell you that. It would be best to start with a clean store before each 
test so you don't inherit issues from a previous test or run.
>
> Does message size contribute to this issue?

Yes, but only in the sense that the size alters the packing of the write 
buffers, and the timing of when they are written. Also, the number of 
simultaneous producers and consumers will affect this. In particular, 
two producers simultaneously sending messages to the same queue, or a 
consumer consuming from a queue while a producer is sending, are going 
to be the main factors in any race condition such as I suspect this is. 
Playing with those will give clues as to what is happening. You could 
try the following, each time starting with a clean store:

1. Only allowing a single producer, followed by a single consumer (ie 
not at the same time);

2. Allowing a single producer and a single consumer to operate 
simultaneously;

3. Allowing multiple producers only (I don't know if your use-case has 
this);

4. Allowing multiple consumers.

Once you have isolated which scenarios cause the problem, then try 
varying the message size. The answers to these will help isolate where 
the issue is happening.
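
For scenarios 1 and 2, a bare-bones producer using the C++ 
qpid::messaging client could look like the sketch below. The broker 
address, queue name and message count are made up for illustration, and 
the ~80 KB body matches the message size you mentioned:

    // Build: g++ producer_sketch.cpp -lqpidmessaging -lqpidtypes
    #include <qpid/messaging/Connection.h>
    #include <qpid/messaging/Session.h>
    #include <qpid/messaging/Sender.h>
    #include <qpid/messaging/Message.h>
    #include <string>

    using namespace qpid::messaging;

    int main() {
        Connection connection("localhost:5672");
        connection.open();
        Session session = connection.createSession();
        // Durable queue, so every message goes through the linearstore journal.
        Sender sender = session.createSender(
            "test.q; {create: always, node: {durable: true}}");
        std::string body(80 * 1024, 'x');   // ~80 KB payload
        for (int i = 0; i < 10000; ++i) {
            Message msg(body);
            msg.setDurable(true);           // persist each message
            sender.send(msg);
        }
        session.sync();                     // wait for the broker to settle
        connection.close();
        return 0;
    }

Run one instance alone for scenario 1, then start a consumer against the 
same queue while it runs for scenario 2.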

>
> Thanks,
> Ram
>
> On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> No, they are not.
>>
>> These two defines govern the number of sleeps and the sleep time while
>> waiting before throwing an exception, during recovery only. They do
>> not play a role during normal operation.
>>
>> If you are able to compile the broker code, you can try playing with
>> these values. But I don't think they will make much difference to the
>> overall problem. I think some of the other errors you have been seeing
>> prior to this one are closer to where the real problem lies - such as
>> the JERR_WMGR_ENQDISCONT error.
>>
>> Do you have a reproducer of any kind? Does this error occur predictably
>> under some or other conditions?
>>
>> Thanks,
>>
>> Kim van der Riet
>>
>> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
>>> Kim,
>>>
>>> I see these two settings from code, can these be configurable?
>>>
>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>
>>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
>>>
>>>
>>> Ram
>>>
>>> On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> Thank you Kim, i will try your suggestions.
>>>>
>>>> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com
>> wrote:
>>>>> This error is a linearstore issue. It looks as though there is a 
>>>>> single
>>>>> write operation to disk that has become stuck, and is holding up all
>>>>> further write operations. This happens because there is a fixed
>> circular
>>>>> pool of memory pages used for the AIO operations to disk, and when one
>>>>> of these is "busy" (indicated by the A letter in the page state map),
>>>>> write operations cannot continue until it is cleared. If it does not
>>>>> clear within a certain time, then an exception is thrown, which 
>>>>> usually
>>>>> results in the broker closing the connection.
>>>>>
>>>>> The events leading up to a "stuck" write operation are complex and
>>>>> sometimes difficult to reproduce. If you have a reproducer, then I
>> would
>>>>> be interested to see it! Even so, the ability to reproduce on another
>>>>> machine is hard as it depends on such things as disk write speed, the
>>>>> disk controller characteristics, the number of threads in the thread
>>>>> pool (ie CPU type), memory and other hardware-related things.
>>>>>
>>>>> There are two linearstore parameters that you can try playing with to
>>>>> see if you can change the behavior of the store:
>>>>>
>>>>> wcache-page-size: This sets the size of each page in the write buffer.
>>>>> Larger page size is good for large messages, a smaller size will help
>> if
>>>>> you have small messages.
>>>>>
>>>>> wcache-num-pages: The total number of pages in the write buffer.
>>>>>
>>>>> Use the --help on the broker with the linearstore loaded to see more
>>>>> details on this. I hope that helps a little.
>>>>>
>>>>> Kim van der Riet
>>>>>
>>>>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>>>>>> Any help in understand why/when broker throws those errors and stop
>>>>>> receiving message would be appreciated.
>>>>>>
>>>>>> Not sure if any kernel tuning or broker tuning needs to be done to
>>>>>> solve this issue.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Ram
>>>>>>
>>>>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>
>>>>>>> Also from this log message (store level) it seems like waiting for
>> AIO
>>>>> to
>>>>>>> complete.
>>>>>>>
>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>>>>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>>>>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>>>>>>> ps=[-------------------------A------]
>>>>>>>
>>>>>>> page_state ps=[-------------------------A------] where A is
>>>>> AIO_PENDING
>>>>>>> aer=1 _aio_evt_rem; ///< Remaining AIO events
>>>>>>>
>>>>>>> When there is or there are pending AIO, does broker close the
>>>>> connection?
>>>>>>> is there any tuning that can be done to resolve this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ram
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>
>>>>>>>> I was check the code and i see these lines for that AIO timeout.
>>>>>>>>
>>>>>>>> case
>>>>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>>>>>>> if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>>>>>>> THROW_STORE_EXCEPTION("Timeout waiting for
>> AIO in
>>>>>>>> MessageStoreImpl::recoverMessages()");
>>>>>>>> ::usleep(AIO_SLEEP_TIME_US);
>>>>>>>> break;
>>>>>>>>
>>>>>>>> And these are the defaults
>>>>>>>>
>>>>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>>>>>>
>>>>>>>> #define AIO_SLEEP_TIME_US 10 // 0.01 ms
>>>>>>>>
>>>>>>>>
>>>>>>>> RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page
>> is
>>>>>>>> waiting for AIO.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So does page got blocked and its waiting for page availability?
>>>>>>>>
>>>>>>>>
>>>>>>>> Ram
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after 
>>>>>>>>> that
>>>>> we
>>>>>>>>> see this message
>>>>>>>>>
>>>>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>>>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>>>>> offs=0x107680
>>>>>>>>> (likely journal overwrite boundary); 19 filler record(s) required.
>>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>>>>> fid=0x4605b
>>>>>>>>> offs=0x107680
>>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover 
>>>>>>>>> phase
>>>>> logs
>>>>>>>>> It worked fine for a day and started throwing this message:
>>>>>>>>>
>>>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>> "<name>":
>>>>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>>>>> pi=25 pc=8
>>>>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>>>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot 
>>>>>>>>> deliver
>>>>> to
>>>>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>>>>> failed:
>>>>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>>>>> JERR_JCNTL_AIOCMPLWAIT:
>>>>>>>>> Timeout waiting for AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>>>>> framing-error:
>>>>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>>>>> 0x0202
>>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>>>> waiting for
>>>>>>>>> AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>>>>> <queue-name>:
>>>>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>>>> waiting for
>>>>>>>>> AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>>>>> illegal-argument:
>>>>>>>>> Value for replyText is too large(320)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> No, local disk.
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
>> wrote:
>>>>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>>>>>>>>>> Gordon,
>>>>>>>>>>>>
>>>>>>>>>>>> We are using java client 0.28 version and qpidd-cpp 1.35 
>>>>>>>>>>>> version
>>>>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64), i dont know at what
>>>>> scenario
>>>>>>>>>>> its
>>>>>>>>>>>> happening but after i restart broker and if we wait for few 
>>>>>>>>>>>> days
>>>>> its
>>>>>>>>>>>> happening again. From the above logs do you have any 
>>>>>>>>>>>> pointers to
>>>>>>>>>>> check?
>>>>>>>>>>>
>>>>>>>>>>> Are you using NFS?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
On 11/7/18 3:18 PM, rammohan ganapavarapu wrote:
> Kim,
>
> Ok, i am still trying to see what part of my java application is causing
> that issue, yes that issue is happening intermittently. Regarding
> "JERR_WMGR_ENQDISCONT" error, may be they are chained exceptions from the
> previous error JERR_JCNTL_AIOCMPLWAIT?
In my mind, it is more likely the other way around. But the logs should 
tell you that. It would be best to start with a clean store before each 
test so you don't inherit issues from a previous test or run.
>
> Does message size contribute to this issue?

Yes, but only in the sense that the size alters the packing of the write 
buffers, and the timing of when they are written. Also, the number of 
simultaneous producers and consumers will affect this. In particular, 
two producers simultaneously sending messages to the same queue, or a 
consumer consuming from a queue while a producer is sending, are going 
to be the main factors in any race condition such as I suspect this is. 
Playing with those will give clues as to what is happening. You could 
try the following, each time starting with a clean store:

1. Only allowing a single producer, followed by a single consumer (ie 
not at the same time);

2. Allowing a single producer and a single consumer to operate 
simultaneously;

3. Allowing multiple producers only (I don't know if your use-case has 
this);

4. Allowing multiple consumers.

Once you have isolated which scenarios cause the problem, then try 
varying the message size. The answers to these will help isolate where 
the issue is happening.

>
> Thanks,
> Ram
>
> On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> No, they are not.
>>
>> These two defines govern the number of sleeps and the sleep time while
>> waiting before throwing an exception, during recovery only. They do
>> not play a role during normal operation.
>>
>> If you are able to compile the broker code, you can try playing with
>> these values. But I don't think they will make much difference to the
>> overall problem. I think some of the other errors you have been seeing
>> prior to this one are closer to where the real problem lies - such as
>> the JERR_WMGR_ENQDISCONT error.
>>
>> Do you have a reproducer of any kind? Does this error occur predictably
>> under some or other conditions?
>>
>> Thanks,
>>
>> Kim van der Riet
>>
>> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
>>> Kim,
>>>
>>> I see these two settings from code, can these be configurable?
>>>
>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>
>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>>
>>>
>>> Ram
>>>
>>> On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> Thank you Kim, i will try your suggestions.
>>>>
>>>> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com
>> wrote:
>>>>> This error is a linearstore issue. It looks as though there is a single
>>>>> write operation to disk that has become stuck, and is holding up all
>>>>> further write operations. This happens because there is a fixed
>> circular
>>>>> pool of memory pages used for the AIO operations to disk, and when one
>>>>> of these is "busy" (indicated by the A letter in the  page state map),
>>>>> write operations cannot continue until it is cleared. If it does not
>>>>> clear within a certain time, then an exception is thrown, which usually
>>>>> results in the broker closing the connection.
>>>>>
>>>>> The events leading up to a "stuck" write operation are complex and
>>>>> sometimes difficult to reproduce. If you have a reproducer, then I
>> would
>>>>> be interested to see it! Even so, the ability to reproduce on another
>>>>> machine is hard as it depends on such things as disk write speed, the
>>>>> disk controller characteristics, the number of threads in the thread
>>>>> pool (ie CPU type), memory and other hardware-related things.
>>>>>
>>>>> There are two linearstore parameters that you can try playing with to
>>>>> see if you can change the behavior of the store:
>>>>>
>>>>> wcache-page-size: This sets the size of each page in the write buffer.
>>>>> Larger page size is good for large messages, a smaller size will help
>> if
>>>>> you have small messages.
>>>>>
>>>>> wcache-num-pages: The total number of pages in the write buffer.
>>>>>
>>>>> Use the --help on the broker with the linearstore loaded to see more
>>>>> details on this. I hope that helps a little.
>>>>>
>>>>> Kim van der Riet
>>>>>
>>>>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>>>>>> Any help in understand why/when broker throws those errors and stop
>>>>>> receiving message would be appreciated.
>>>>>>
>>>>>> Not sure if any kernel tuning or broker tuning needs to be done to
>>>>>> solve this issue.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Ram
>>>>>>
>>>>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>
>>>>>>> Also from this log message (store level) it seems like waiting for
>> AIO
>>>>> to
>>>>>>> complete.
>>>>>>>
>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>>>>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>>>>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>>>>>>> ps=[-------------------------A------]
>>>>>>>
>>>>>>> page_state ps=[-------------------------A------]  where A is
>>>>> AIO_PENDING
>>>>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>>>>>>>
>>>>>>> When there is or there are pending AIO, does broker close the
>>>>> connection?
>>>>>>> is there any tuning that can be done to resolve this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ram
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>
>>>>>>>> I was check the code and i see these lines for that AIO timeout.
>>>>>>>>
>>>>>>>>                  case
>>>>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>>>>>>>                    if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>>>>>>>                        THROW_STORE_EXCEPTION("Timeout waiting for
>> AIO in
>>>>>>>> MessageStoreImpl::recoverMessages()");
>>>>>>>>                    ::usleep(AIO_SLEEP_TIME_US);
>>>>>>>>                    break;
>>>>>>>>
>>>>>>>> And these are the defaults
>>>>>>>>
>>>>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>>>>>>
>>>>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>>>>>>>
>>>>>>>>
>>>>>>>>      RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page
>> is
>>>>>>>> waiting for AIO.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So does page got blocked and its waiting for page availability?
>>>>>>>>
>>>>>>>>
>>>>>>>> Ram
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that
>>>>> we
>>>>>>>>> see this message
>>>>>>>>>
>>>>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>>>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>>>>> offs=0x107680
>>>>>>>>> (likely journal overwrite boundary); 19 filler record(s) required.
>>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>>>>> fid=0x4605b
>>>>>>>>> offs=0x107680
>>>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
>>>>> logs
>>>>>>>>> It worked fine for a day and started throwing this message:
>>>>>>>>>
>>>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>> "<name>":
>>>>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>>>>> pi=25 pc=8
>>>>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>>>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver
>>>>> to
>>>>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>>>>> failed:
>>>>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>>>>> JERR_JCNTL_AIOCMPLWAIT:
>>>>>>>>> Timeout waiting for AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>>>>> framing-error:
>>>>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>>>>> 0x0202
>>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>>>> waiting for
>>>>>>>>> AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>>>>> <queue-name>:
>>>>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>>>> waiting for
>>>>>>>>> AIOs to complete.
>>>>>>>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>>>>> illegal-argument:
>>>>>>>>> Value for replyText is too large(320)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ram
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> No, local disk.
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
>> wrote:
>>>>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>>>>>>>>>> Gordon,
>>>>>>>>>>>>
>>>>>>>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
>>>>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what scenario
>>>>>>>>>>>> it happens, but after I restart the broker and we wait a few days it
>>>>>>>>>>>> happens again. From the above logs do you have any pointers to check?
>>>>>>>>>>>
>>>>>>>>>>> Are you using NFS?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
But how would NFS cause this issue? I am interested because we are using NFS
(v4) in some environments, so I wanted to learn what tuning is needed when we
use NFS.
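
(If this turns out to be the stuck-AIO problem discussed below, my
understanding is that kernel AIO on NFS may not complete asynchronously at
all, in which case moving the journals to local or SAN block storage would
matter more than any mount tuning. That is an assumption to verify, not
established fact.)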

Thanks,
Ram

On Fri, Nov 9, 2018 at 6:48 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Sorry, I thought it's NFS but it's actually a SAN storage volume.
>
> Thanks,
> Ram
>
> On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com> wrote:
>
>> On 08/11/18 16:56, rammohan ganapavarapu wrote:
>> > I was wrong about NFS for the qpid journal files; it looks like they are
>> > on NFS after all. So does NFS cause this issue?
>>
>> Yes, I believe it does. What version of NFS are you using?
>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Sorry, I thought it's NFS but it's actually a SAN storage volume.

Thanks,
Ram

On Fri, Nov 9, 2018, 2:10 AM Gordon Sim <gsim@redhat.com> wrote:

> On 08/11/18 16:56, rammohan ganapavarapu wrote:
> > I was wrong about NFS for the qpid journal files; it looks like they are
> > on NFS after all. So does NFS cause this issue?
>
> Yes, I believe it does. What version of NFS are you using?
>

Re: qpid-cpp-0.35 errors

Posted by Gordon Sim <gs...@redhat.com>.
On 08/11/18 16:56, rammohan ganapavarapu wrote:
> I was wrong about NFS for the qpid journal files; it looks like they are on
> NFS after all. So does NFS cause this issue?

Yes, I believe it does. What version of NFS are you using?
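
(The likely mechanism, stated as an assumption rather than a confirmed
diagnosis: linearstore submits its journal writes using Linux kernel AIO, and
on NFS those submissions can block or complete very slowly, which would leave
a write-cache page stuck in the 'A' state exactly as the logs show.)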



Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Do you have any kernel (net/disk) tuning recommendations for qpid-cpp with
linearstore?
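
(One generic check for any libaio-based store, offered as a starting point
rather than a known fix: see whether the system-wide AIO context limit is
being hit by comparing /proc/sys/fs/aio-nr against /proc/sys/fs/aio-max-nr
while the broker is under load.)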

Ram

On Thu, Nov 8, 2018 at 8:56 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Kim/Gordon,
>
> I was wrong about NFS for the qpid journal files; it looks like they are on
> NFS after all. So does NFS cause this issue?
>
> Ram
>
> On Wed, Nov 7, 2018 at 12:18 PM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Kim,
>>
>> Ok, I am still trying to see what part of my java application is causing
>> that issue; yes, it is happening intermittently. Regarding the
>> "JERR_WMGR_ENQDISCONT" error, maybe they are chained exceptions from the
>> previous error JERR_JCNTL_AIOCMPLWAIT?
>>
>> Does message size contribute to this issue?
>>
>> Thanks,
>> Ram
>>
>> On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
>> wrote:
>>
>>> No, they are not.
>>>
>>> These two defines govern the number of sleeps and the sleep time while
>>> waiting before throwing an exception, and they apply during recovery only.
>>> They do not play a role during normal operation.
>>>
>>> If you are able to compile the broker code, you can try playing with
>>> these values. But I don't think they will make much difference to the
>>> overall problem. I think some of the other errors you have been seeing
>>> prior to this one are closer to where the real problem lies - such as
>>> the JERR_WMGR_ENQDISCONT error.
>>>
>>> Do you have a reproducer of any kind? Does this error occur predictably
>>> under some or other conditions?
>>>
>>> Thanks,
>>>
>>> Kim van der Riet
>>>
>>> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
>>> > Kim,
>>> >
>>> > I see these two settings from code, can these be configurable?
>>> >
>>> > #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>> >
>>> > #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>> >
>>> >
>>> > Ram
>>> >
>>> > On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
>>> > rammohanganap@gmail.com> wrote:
>>> >
>>> >> Thank you Kim, i will try your suggestions.
>>> >>
>>> >> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:
>>> >>
>>> >>> This error is a linearstore issue. It looks as though there is a
>>> single
>>> >>> write operation to disk that has become stuck, and is holding up all
>>> >>> further write operations. This happens because there is a fixed
>>> circular
>>> >>> pool of memory pages used for the AIO operations to disk, and when
>>> one
>>> >>> of these is "busy" (indicated by the A letter in the  page state
>>> map),
>>> >>> write operations cannot continue until it is cleared. If it does not
>>> >>> clear within a certain time, then an exception is thrown, which
>>> usually
>>> >>> results in the broker closing the connection.
>>> >>>
>>> >>> The events leading up to a "stuck" write operation are complex and
>>> >>> sometimes difficult to reproduce. If you have a reproducer, then I
>>> would
>>> >>> be interested to see it! Even so, the ability to reproduce on another
>>> >>> machine is hard as it depends on such things as disk write speed, the
>>> >>> disk controller characteristics, the number of threads in the thread
>>> >>> pool (ie CPU type), memory and other hardware-related things.
>>> >>>
>>> >>> There are two linearstore parameters that you can try playing with to
>>> >>> see if you can change the behavior of the store:
>>> >>>
>>> >>> wcache-page-size: This sets the size of each page in the write buffer.
>>> >>> Larger page size is good for large messages; a smaller size will help
>>> >>> if you have small messages.
>>> >>>
>>> >>> wcache-num-pages: The total number of pages in the write buffer.
>>> >>>
>>> >>> Use the --help on the broker with the linearstore loaded to see more
>>> >>> details on this. I hope that helps a little.
>>> >>>
>>> >>> Kim van der Riet
>>> >>>
>>> >>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>>> >>>> Any help in understanding why/when the broker throws those errors and
>>> >>>> stops receiving messages would be appreciated.
>>> >>>>
>>> >>>> Not sure if any kernel tuning or broker tuning needs to be done to
>>> >>>> solve this issue.
>>> >>>>
>>> >>>> Thanks in advance,
>>> >>>> Ram
>>> >>>>
>>> >>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>>> >>>> rammohanganap@gmail.com> wrote:
>>> >>>>
>>> >>>>> Also from this log message (store level) it seems like it is waiting
>>> >>>>> for AIO to complete.
>>> >>>>>
>>> >>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>>> "<journal
>>> >>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>>> >>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>>> >>>>> ps=[-------------------------A------]
>>> >>>>>
>>> >>>>> page_state ps=[-------------------------A------]  where A is
>>> >>> AIO_PENDING
>>> >>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>>> >>>>>
>>> >>>>> When there are pending AIOs, does the broker close the connection?
>>> >>>>> Is there any tuning that can be done to resolve this?
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Ram
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>>> >>>>> rammohanganap@gmail.com> wrote:
>>> >>>>>
>>> >>>>>> I was checking the code and I see these lines for that AIO timeout.
>>> >>>>>>
>>> >>>>>>                 case
>>> >>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>> >>>>>>                   if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>> >>>>>>                       THROW_STORE_EXCEPTION("Timeout waiting for
>>> AIO in
>>> >>>>>> MessageStoreImpl::recoverMessages()");
>>> >>>>>>                   ::usleep(AIO_SLEEP_TIME_US);
>>> >>>>>>                   break;
>>> >>>>>>
>>> >>>>>> And these are the defaults
>>> >>>>>>
>>> >>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>> >>>>>>
>>> >>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next
>>> page is
>>> >>>>>> waiting for AIO.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> So did the page get blocked, and is it waiting for page availability?
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> Ram
>>> >>>>>>
>>> >>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>> >>>>>> rammohanganap@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after
>>> that
>>> >>> we
>>> >>>>>>> see this message
>>> >>>>>>>
>>> >>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>> >>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>>> >>> offs=0x107680
>>> >>>>>>> (likely journal overwrite boundary); 19 filler record(s)
>>> required.
>>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> >>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>>> >>> fid=0x4605b
>>> >>>>>>> offs=0x107680
>>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> >>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover
>>> phase
>>> >>> logs
>>> >>>>>>> It worked fine for a day and started throwing this message:
>>> >>>>>>>
>>> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>>> "<name>":
>>> >>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>>> >>> pi=25 pc=8
>>> >>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>> >>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot
>>> deliver
>>> >>> to
>>> >>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>>> >>> failed:
>>> >>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>>> >>> JERR_JCNTL_AIOCMPLWAIT:
>>> >>>>>>> Timeout waiting for AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> >>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>>> >>> framing-error:
>>> >>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>>> >>> 0x0202
>>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> >>> waiting for
>>> >>>>>>> AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>>> >>> <queue-name>:
>>> >>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> >>> waiting for
>>> >>>>>>> AIOs to complete.
>>> >>>>>>>
>>> >>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>>> >>> illegal-argument:
>>> >>>>>>> Value for replyText is too large(320)
>>> >>>>>>>
>>> >>>>>>> Thanks,
>>> >>>>>>> Ram
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>> >>>>>>> rammohanganap@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> No, local disk.
>>> >>>>>>>>
>>> >>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
>>> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>> >>>>>>>>>> Gordon,
>>> >>>>>>>>>>
>>> >>>>>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
>>> >>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what
>>> >>>>>>>>>> scenario it happens, but after I restart the broker and we wait a
>>> >>>>>>>>>> few days it happens again. From the above logs do you have any
>>> >>>>>>>>>> pointers to check?
>>> >>>>>>>>>
>>> >>>>>>>>> Are you using NFS?
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim/Gordon,

I was wrong about NFS for the qpid journal files; it looks like they are on
NFS after all. So does NFS cause this issue?

Ram

On Wed, Nov 7, 2018 at 12:18 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Kim,
>
> Ok, I am still trying to see what part of my java application is causing
> that issue; yes, it is happening intermittently. Regarding the
> "JERR_WMGR_ENQDISCONT" error, maybe they are chained exceptions from the
> previous error JERR_JCNTL_AIOCMPLWAIT?
>
> Does message size contribute to this issue?
>
> Thanks,
> Ram
>
> On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
> wrote:
>
>> No, they are not.
>>
>> These two defines govern the number of sleeps and the sleep time while
>> waiting before throwing an exception, and they apply during recovery only.
>> They do not play a role during normal operation.
>>
>> If you are able to compile the broker code, you can try playing with
>> these values. But I don't think they will make much difference to the
>> overall problem. I think some of the other errors you have been seeing
>> prior to this one are closer to where the real problem lies - such as
>> the JERR_WMGR_ENQDISCONT error.
>>
>> Do you have a reproducer of any kind? Does this error occur predictably
>> under some or other conditions?
>>
>> Thanks,
>>
>> Kim van der Riet
>>
>> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
>> > Kim,
>> >
>> > I see these two settings from code, can these be configurable?
>> >
>> > #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>> >
>> > #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>> >
>> >
>> > Ram
>> >
>> > On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
>> > rammohanganap@gmail.com> wrote:
>> >
>> >> Thank you Kim, i will try your suggestions.
>> >>
>> >> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:
>> >>
>> >>> This error is a linearstore issue. It looks as though there is a
>> single
>> >>> write operation to disk that has become stuck, and is holding up all
>> >>> further write operations. This happens because there is a fixed
>> circular
>> >>> pool of memory pages used for the AIO operations to disk, and when one
>> >>> of these is "busy" (indicated by the A letter in the  page state map),
>> >>> write operations cannot continue until it is cleared. If it does not
>> >>> clear within a certain time, then an exception is thrown, which
>> usually
>> >>> results in the broker closing the connection.
>> >>>
>> >>> The events leading up to a "stuck" write operation are complex and
>> >>> sometimes difficult to reproduce. If you have a reproducer, then I
>> would
>> >>> be interested to see it! Even so, the ability to reproduce on another
>> >>> machine is hard as it depends on such things as disk write speed, the
>> >>> disk controller characteristics, the number of threads in the thread
>> >>> pool (ie CPU type), memory and other hardware-related things.
>> >>>
>> >>> There are two linearstore parameters that you can try playing with to
>> >>> see if you can change the behavior of the store:
>> >>>
>> >>> wcache-page-size: This sets the size of each page in the write buffer.
>> >>> Larger page size is good for large messages; a smaller size will help
>> >>> if you have small messages.
>> >>>
>> >>> wcache-num-pages: The total number of pages in the write buffer.
>> >>>
>> >>> Use the --help on the broker with the linearstore loaded to see more
>> >>> details on this. I hope that helps a little.
>> >>>
>> >>> Kim van der Riet
>> >>>
>> >>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>> >>>> Any help in understanding why/when the broker throws those errors and
>> >>>> stops receiving messages would be appreciated.
>> >>>>
>> >>>> Not sure if any kernel tuning or broker tuning needs to be done to
>> >>>> solve this issue.
>> >>>>
>> >>>> Thanks in advance,
>> >>>> Ram
>> >>>>
>> >>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>> >>>> rammohanganap@gmail.com> wrote:
>> >>>>
>> >>>>> Also from this log message (store level) it seems like it is waiting
>> >>>>> for AIO to complete.
>> >>>>>
>> >>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>> >>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>> >>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>> >>>>> ps=[-------------------------A------]
>> >>>>>
>> >>>>> page_state ps=[-------------------------A------]  where A is
>> >>> AIO_PENDING
>> >>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>> >>>>>
>> >>>>> When there are pending AIOs, does the broker close the connection?
>> >>>>> Is there any tuning that can be done to resolve this?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Ram
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>> >>>>> rammohanganap@gmail.com> wrote:
>> >>>>>
>> >>>>>> I was checking the code and I see these lines for that AIO timeout.
>> >>>>>>
>> >>>>>>                 case
>> >>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>> >>>>>>                   if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>> >>>>>>                       THROW_STORE_EXCEPTION("Timeout waiting for
>> AIO in
>> >>>>>> MessageStoreImpl::recoverMessages()");
>> >>>>>>                   ::usleep(AIO_SLEEP_TIME_US);
>> >>>>>>                   break;
>> >>>>>>
>> >>>>>> And these are the defaults
>> >>>>>>
>> >>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>> >>>>>>
>> >>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>> >>>>>>
>> >>>>>>
>> >>>>>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next
>> page is
>> >>>>>> waiting for AIO.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> So did the page get blocked, and is it waiting for page availability?
>> >>>>>>
>> >>>>>>
>> >>>>>> Ram
>> >>>>>>
>> >>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>> >>>>>> rammohanganap@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after
>> that
>> >>> we
>> >>>>>>> see this message
>> >>>>>>>
>> >>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>> >>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>> >>> offs=0x107680
>> >>>>>>> (likely journal overwrite boundary); 19 filler record(s) required.
>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> >>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>> >>> fid=0x4605b
>> >>>>>>> offs=0x107680
>> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> >>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover
>> phase
>> >>> logs
>> >>>>>>> It worked fine for a day and started throwing this message:
>> >>>>>>>
>> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
>> "<name>":
>> >>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>> >>> pi=25 pc=8
>> >>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>> >>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot
>> deliver
>> >>> to
>> >>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>> >>> failed:
>> >>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>> >>> JERR_JCNTL_AIOCMPLWAIT:
>> >>>>>>> Timeout waiting for AIOs to complete.
>> >>>>>>>
>> >>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> >>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>> >>> framing-error:
>> >>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>> >>> 0x0202
>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>> >>> waiting for
>> >>>>>>> AIOs to complete.
>> >>>>>>>
>> >>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>> >>> <queue-name>:
>> >>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>> >>> waiting for
>> >>>>>>> AIOs to complete.
>> >>>>>>>
>> >>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>> >>> illegal-argument:
>> >>>>>>> Value for replyText is too large(320)
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Ram
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>> >>>>>>> rammohanganap@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> No, local disk.
>> >>>>>>>>
>> >>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
>> wrote:
>> >>>>>>>>
>> >>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>> >>>>>>>>>> Gordon,
>> >>>>>>>>>>
>> >>>>>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
>> >>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what
>> >>>>>>>>>> scenario it happens, but after I restart the broker and we wait a
>> >>>>>>>>>> few days it happens again. From the above logs do you have any
>> >>>>>>>>>> pointers to check?
>> >>>>>>>>>
>> >>>>>>>>> Are you using NFS?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

Ok, I am still trying to see what part of my java application is causing
that issue; yes, it is happening intermittently. Regarding the
"JERR_WMGR_ENQDISCONT" error, maybe they are chained exceptions from the
previous error JERR_JCNTL_AIOCMPLWAIT?
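
(The names at least fit that theory: if the timed-out AIO left an enqueue only
partly completed, the next enqueue presenting a new data token would be
discontinuous with it, which is what ENQDISCONT appears to describe. Just an
inference from the error names, not a confirmed code path.)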

Does message size contribute to this issue?

Thanks,
Ram

On Wed, Nov 7, 2018 at 11:37 AM Kim van der Riet <kv...@redhat.com>
wrote:

> No, they are not.
>
> These two defines govern the number of sleeps and the sleep time while
> waiting before throwing an exception, and they apply during recovery only.
> They do not play a role during normal operation.
>
> If you are able to compile the broker code, you can try playing with
> these values. But I don't think they will make much difference to the
> overall problem. I think some of the other errors you have been seeing
> prior to this one are closer to where the real problem lies - such as
> the JERR_WMGR_ENQDISCONT error.
>
> Do you have a reproducer of any kind? Does this error occur predictably
> under some or other conditions?
>
> Thanks,
>
> Kim van der Riet
>
> On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
> > Kim,
> >
> > I see these two settings from code, can these be configurable?
> >
> > #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >
> > #define AIO_SLEEP_TIME_US  10 // 0.01 ms
> >
> >
> > Ram
> >
> > On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
> > rammohanganap@gmail.com> wrote:
> >
> >> Thank you Kim, i will try your suggestions.
> >>
> >> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:
> >>
> >>> This error is a linearstore issue. It looks as though there is a single
> >>> write operation to disk that has become stuck, and is holding up all
> >>> further write operations. This happens because there is a fixed
> circular
> >>> pool of memory pages used for the AIO operations to disk, and when one
> >>> of these is "busy" (indicated by the A letter in the  page state map),
> >>> write operations cannot continue until it is cleared. If it does not
> >>> clear within a certain time, then an exception is thrown, which usually
> >>> results in the broker closing the connection.
> >>>
> >>> The events leading up to a "stuck" write operation are complex and
> >>> sometimes difficult to reproduce. If you have a reproducer, then I
> would
> >>> be interested to see it! Even so, the ability to reproduce on another
> >>> machine is hard as it depends on such things as disk write speed, the
> >>> disk controller characteristics, the number of threads in the thread
> >>> pool (ie CPU type), memory and other hardware-related things.
> >>>
> >>> There are two linearstore parameters that you can try playing with to
> >>> see if you can change the behavior of the store:
> >>>
> >>> wcache-page-size: This sets the size of each page in the write buffer.
> >>> Larger page size is good for large messages; a smaller size will help
> >>> if you have small messages.
> >>>
> >>> wcache-num-pages: The total number of pages in the write buffer.
> >>>
> >>> Use the --help on the broker with the linearstore loaded to see more
> >>> details on this. I hope that helps a little.
> >>>
> >>> Kim van der Riet
> >>>
> >>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> >>>> Any help in understanding why/when the broker throws those errors and
> >>>> stops receiving messages would be appreciated.
> >>>>
> >>>> Not sure if any kernel tuning or broker tuning needs to be done to
> >>>> solve this issue.
> >>>>
> >>>> Thanks in advance,
> >>>> Ram
> >>>>
> >>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> >>>> rammohanganap@gmail.com> wrote:
> >>>>
> >>>>> Also from this log message (store level) it seems like it is waiting
> >>>>> for AIO to complete.
> >>>>>
> >>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
> >>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> >>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> >>>>> ps=[-------------------------A------]
> >>>>>
> >>>>> page_state ps=[-------------------------A------]  where A is
> >>> AIO_PENDING
> >>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
> >>>>>
> >>>>> When there are pending AIOs, does the broker close the connection?
> >>>>> Is there any tuning that can be done to resolve this?
> >>>>>
> >>>>> Thanks,
> >>>>> Ram
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> >>>>> rammohanganap@gmail.com> wrote:
> >>>>>
> >>>>>> I was checking the code and I see these lines for that AIO timeout.
> >>>>>>
> >>>>>>                 case
> >>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>>>>                   if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>>>>                       THROW_STORE_EXCEPTION("Timeout waiting for
> AIO in
> >>>>>> MessageStoreImpl::recoverMessages()");
> >>>>>>                   ::usleep(AIO_SLEEP_TIME_US);
> >>>>>>                   break;
> >>>>>>
> >>>>>> And these are the defaults
> >>>>>>
> >>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>>>>
> >>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
> >>>>>>
> >>>>>>
> >>>>>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page
> is
> >>>>>> waiting for AIO.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> So did the page get blocked, and is it waiting for page availability?
> >>>>>>
> >>>>>>
> >>>>>> Ram
> >>>>>>
> >>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> >>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>
> >>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that
> >>> we
> >>>>>>> see this message
> >>>>>>>
> >>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> >>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
> >>> offs=0x107680
> >>>>>>> (likely journal overwrite boundary); 19 filler record(s) required.
> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
> >>> fid=0x4605b
> >>>>>>> offs=0x107680
> >>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
> >>> logs
> >>>>>>> It worked fine for a day and started throwing this message:
> >>>>>>>
> >>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal
> "<name>":
> >>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
> >>> pi=25 pc=8
> >>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver
> >>> to
> >>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
> >>> failed:
> >>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
> >>> JERR_JCNTL_AIOCMPLWAIT:
> >>>>>>> Timeout waiting for AIOs to complete.
> >>>>>>>
> >>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
> >>> framing-error:
> >>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
> >>> 0x0202
> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>> waiting for
> >>>>>>> AIOs to complete.
> >>>>>>>
> >>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
> >>> <queue-name>:
> >>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
> >>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> >>> waiting for
> >>>>>>> AIOs to complete.
> >>>>>>>
> >>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
> >>> illegal-argument:
> >>>>>>> Value for replyText is too large(320)
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Ram
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> >>>>>>> rammohanganap@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> No, local disk.
> >>>>>>>>
> >>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com>
> wrote:
> >>>>>>>>
> >>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>>>>> Gordon,
> >>>>>>>>>>
> >>>>>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
> >>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what
> >>>>>>>>>> scenario it happens, but after I restart the broker and we wait a
> >>>>>>>>>> few days it happens again. From the above logs do you have any
> >>>>>>>>>> pointers to check?
> >>>>>>>>>
> >>>>>>>>> Are you using NFS?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
No, they are not.

These two defines govern the number of sleeps and the sleep time while
waiting before throwing an exception, and they apply during recovery only.
They do not play a role during normal operation.

If you are able to compile the broker code, you can try playing with 
these values. But I don't think they will make much difference to the 
overall problem. I think some of the other errors you have been seeing 
prior to this one are closer to where the real problem lies - such as 
the JERR_WMGR_ENQDISCONT error.
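
For anyone who does rebuild, a minimal sketch of one way to make those two
values overridable at run time instead of recompiling for each experiment.
This is not in the qpid-cpp source; the environment variable names here are
invented for illustration:

#include <cstdlib>

namespace {
    // Read a long from the environment, falling back to the compiled default.
    long envOrDefault(const char* name, long dflt) {
        const char* v = std::getenv(name);  // null if the variable is unset
        return v ? std::atol(v) : dflt;
    }
    // Replace the two #defines with values read once at startup:
    const long MAX_AIO_SLEEPS    = envOrDefault("QPID_MAX_AIO_SLEEPS", 100000);
    const long AIO_SLEEP_TIME_US = envOrDefault("QPID_AIO_SLEEP_TIME_US", 10);
}

With that in place, starting the broker with QPID_MAX_AIO_SLEEPS=500000 would
give the recovery loop roughly 5 seconds instead of ~1 second before it throws.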

Do you have a reproducer of any kind? Does this error occur predictably 
under some or other conditions?

Thanks,

Kim van der Riet

On 11/7/18 12:51 PM, rammohan ganapavarapu wrote:
> Kim,
>
> I see these two settings from code, can these be configurable?
>
> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>
> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>
>
> Ram
>
> On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Thank you Kim, i will try your suggestions.
>>
>> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:
>>
>>> This error is a linearstore issue. It looks as though there is a single
>>> write operation to disk that has become stuck, and is holding up all
>>> further write operations. This happens because there is a fixed circular
>>> pool of memory pages used for the AIO operations to disk, and when one
>>> of these is "busy" (indicated by the A letter in the  page state map),
>>> write operations cannot continue until it is cleared. If it does not
>>> clear within a certain time, then an exception is thrown, which usually
>>> results in the broker closing the connection.
>>>
>>> The events leading up to a "stuck" write operation are complex and
>>> sometimes difficult to reproduce. If you have a reproducer, then I would
>>> be interested to see it! Even so, the ability to reproduce on another
>>> machine is hard as it depends on such things as disk write speed, the
>>> disk controller characteristics, the number of threads in the thread
>>> pool (ie CPU type), memory and other hardware-related things.
>>>
>>> There are two linearstore parameters that you can try playing with to
>>> see if you can change the behavior of the store:
>>>
>>> wcache-page-size: This sets the size of each page in the write buffer.
>>> Larger page size is good for large messages; a smaller size will help if
>>> you have small messages.
>>>
>>> wcache-num-pages: The total number of pages in the write buffer.
>>>
>>> Use the --help on the broker with the linearstore loaded to see more
>>> details on this. I hope that helps a little.
>>>
>>> Kim van der Riet
>>>
>>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>>>> Any help in understanding why/when the broker throws those errors and
>>>> stops receiving messages would be appreciated.
>>>>
>>>> Not sure if any kernel tuning or broker tuning needs to be done to
>>>> solve this issue.
>>>>
>>>> Thanks in advance,
>>>> Ram
>>>>
>>>> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>>>> rammohanganap@gmail.com> wrote:
>>>>
>>>>> Also from this log message (store level) it seems like it is waiting for
>>>>> AIO to complete.
>>>>>
>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>>>>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>>>>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>>>>> ps=[-------------------------A------]
>>>>>
>>>>> page_state ps=[-------------------------A------]  where A is
>>> AIO_PENDING
>>>>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>>>>>
>>>>> When there are pending AIOs, does the broker close the connection?
>>>>> Is there any tuning that can be done to resolve this?
>>>>>
>>>>> Thanks,
>>>>> Ram
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>>>>> rammohanganap@gmail.com> wrote:
>>>>>
>>>>>> I was checking the code and I see these lines for that AIO timeout.
>>>>>>
>>>>>>                 case
>>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>>>>>                   if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>>>>>                       THROW_STORE_EXCEPTION("Timeout waiting for AIO in
>>>>>> MessageStoreImpl::recoverMessages()");
>>>>>>                   ::usleep(AIO_SLEEP_TIME_US);
>>>>>>                   break;
>>>>>>
>>>>>> And these are the defaults
>>>>>>
>>>>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>>>>
>>>>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>>>>>
>>>>>>
>>>>>>     RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
>>>>>> waiting for AIO.
>>>>>>
>>>>>>
>>>>>>
>>>>>> So did the page get blocked, and is it waiting for page availability?
>>>>>>
>>>>>>
>>>>>> Ram
>>>>>>
>>>>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>
>>>>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that
>>> we
>>>>>>> see this message
>>>>>>>
>>>>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>>>>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>>> offs=0x107680
>>>>>>> (likely journal overwrite boundary); 19 filler record(s) required.
>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>> "<journal-name>": Recover phase write: Wrote filler record:
>>> fid=0x4605b
>>>>>>> offs=0x107680
>>>>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>>>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
>>> logs
>>>>>>> It worked fine for a day and started throwing this message:
>>>>>>>
>>>>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
>>>>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>>> pi=25 pc=8
>>>>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>>>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver
>>> to
>>>>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>>> failed:
>>>>>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>>> JERR_JCNTL_AIOCMPLWAIT:
>>>>>>> Timeout waiting for AIOs to complete.
>>>>>>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>>> framing-error:
>>>>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>>> 0x0202
>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> waiting for
>>>>>>> AIOs to complete.
>>>>>>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>>> <queue-name>:
>>>>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>>>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>>> waiting for
>>>>>>> AIOs to complete.
>>>>>>>
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>>>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>>>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>>> illegal-argument:
>>>>>>> Value for replyText is too large(320)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ram
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>>>>>> rammohanganap@gmail.com> wrote:
>>>>>>>
>>>>>>>> No, local disk.
>>>>>>>>
>>>>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>>>>>>>> Gordon,
>>>>>>>>>>
>>>>>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
>>>>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what scenario
>>>>>>>>>> it happens, but after I restart the broker and we wait a few days it
>>>>>>>>>> happens again. From the above logs do you have any pointers to check?
>>>>>>>>>
>>>>>>>>> Are you using NFS?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim,

I see these two settings from code, can these be configurable?

#define MAX_AIO_SLEEPS 100000 // tot: ~1 sec

#define AIO_SLEEP_TIME_US  10 // 0.01 ms
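
As a sanity check on those comments: 100000 sleeps at 10 us each is
1,000,000 us, i.e. about 1 second of total waiting before the exception is
thrown, and 10 us is indeed 0.01 ms, so the two comments are consistent.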


Ram

On Wed, Nov 7, 2018 at 7:04 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Thank you Kim, i will try your suggestions.
>
> On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:
>
>> This error is a linearstore issue. It looks as though there is a single
>> write operation to disk that has become stuck, and is holding up all
>> further write operations. This happens because there is a fixed circular
>> pool of memory pages used for the AIO operations to disk, and when one
>> of these is "busy" (indicated by the A letter in the  page state map),
>> write operations cannot continue until it is cleared. If it does not
>> clear within a certain time, then an exception is thrown, which usually
>> results in the broker closing the connection.
>>
>> The events leading up to a "stuck" write operation are complex and
>> sometimes difficult to reproduce. If you have a reproducer, then I would
>> be interested to see it! Even so, the ability to reproduce on another
>> machine is hard as it depends on such things as disk write speed, the
>> disk controller characteristics, the number of threads in the thread
>> pool (ie CPU type), memory and other hardware-related things.
>>
>> There are two linearstore parameters that you can try playing with to
>> see if you can change the behavior of the store:
>>
>> wcache-page-size: This sets the size of each page in the write buffer.
>> Larger page size is good for large messages; a smaller size will help if
>> you have small messages.
>>
>> wcache-num-pages: The total number of pages in the write buffer.
>>
>> Use the --help on the broker with the linearstore loaded to see more
>> details on this. I hope that helps a little.
>>
>> Kim van der Riet
>>
>> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
>> > Any help in understanding why/when the broker throws those errors and
>> > stops receiving messages would be appreciated.
>> >
>> > Not sure if any kernel tuning or broker tuning needs to be done to
>> > solve this issue.
>> >
>> > Thanks in advance,
>> > Ram
>> >
>> > On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
>> > rammohanganap@gmail.com> wrote:
>> >
>> >> Also from this log message (store level) it seems like it is waiting for
>> >> AIO to complete.
>> >>
>> >> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>> >> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>> >> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>> >> ps=[-------------------------A------]
>> >>
>> >> page_state ps=[-------------------------A------]  where A is
>> AIO_PENDING
>> >> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>> >>
>> >> When there are pending AIOs, does the broker close the connection?
>> >> Is there any tuning that can be done to resolve this?
>> >>
>> >> Thanks,
>> >> Ram
>> >>
>> >>
>> >>
>> >>
>> >> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>> >> rammohanganap@gmail.com> wrote:
>> >>
>> >>> I was checking the code and I see these lines for that AIO timeout.
>> >>>
>> >>>                case
>> qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>> >>>                  if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>> >>>                      THROW_STORE_EXCEPTION("Timeout waiting for AIO in
>> >>> MessageStoreImpl::recoverMessages()");
>> >>>                  ::usleep(AIO_SLEEP_TIME_US);
>> >>>                  break;
>> >>>
>> >>> And these are the defaults
>> >>>
>> >>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>> >>>
>> >>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>> >>>
>> >>>
>> >>>    RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
>> >>> waiting for AIO.
>> >>>
>> >>>
>> >>>
>> >>> So did the page get blocked, and is it waiting for page availability?
>> >>>
>> >>>
>> >>> Ram
>> >>>
>> >>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>> >>> rammohanganap@gmail.com> wrote:
>> >>>
>> >>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that
>> we
>> >>>> see this message
>> >>>>
>> >>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>> >>>> "<journal-name>": Bad record alignment found at fid=0x4605b
>> offs=0x107680
>> >>>> (likely journal overwrite boundary); 19 filler record(s) required.
>> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> >>>> "<journal-name>": Recover phase write: Wrote filler record:
>> fid=0x4605b
>> >>>> offs=0x107680
>> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> >>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
>> logs
>> >>>>
>> >>>> It worked fine for a day and started throwing this message:
>> >>>>
>> >>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
>> >>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
>> pi=25 pc=8
>> >>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>> >>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver
>> to
>> >>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
>> failed:
>> >>>> jexception 0x0202 jcntl::handle_aio_wait() threw
>> JERR_JCNTL_AIOCMPLWAIT:
>> >>>> Timeout waiting for AIOs to complete.
>> >>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> >>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
>> framing-error:
>> >>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
>> 0x0202
>> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>> waiting for
>> >>>> AIOs to complete.
>> >>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
>> >>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
>> <queue-name>:
>> >>>> MessageStoreImpl::store() failed: jexception 0x0202
>> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
>> waiting for
>> >>>> AIOs to complete.
>> >>>>
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
>> >>>> qpid.server-ip:5672-client-ip:44457 closed by error:
>> illegal-argument:
>> >>>> Value for replyText is too large(320)
>> >>>>
>> >>>> Thanks,
>> >>>> Ram
>> >>>>
>> >>>>
>> >>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>> >>>> rammohanganap@gmail.com> wrote:
>> >>>>
>> >>>>> No, local disk.
>> >>>>>
>> >>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>> >>>>>
>> >>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>> >>>>>>> Gordon,
>> >>>>>>>
>> >>>>>>> We are using java client version 0.28 and qpid-cpp version 1.35
>> >>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64). I don't know in what scenario
>> >>>>>>> it happens, but after I restart the broker and we wait a few days it
>> >>>>>>> happens again. From the above logs do you have any pointers to check?
>> >>>>>>
>> >>>>>> Are you using NFS?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Thank you Kim, I will try your suggestions.
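
For the record, those store options can be given on the qpidd command line or
in the broker config file, e.g. --wcache-page-size 4 --wcache-num-pages 32
(illustrative values only; qpidd --help with the store module loaded shows the
accepted ranges and defaults).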

On Wed, Nov 7, 2018, 6:58 AM Kim van der Riet <kvanderr@redhat.com> wrote:

> This error is a linearstore issue. It looks as though there is a single
> write operation to disk that has become stuck, and is holding up all
> further write operations. This happens because there is a fixed circular
> pool of memory pages used for the AIO operations to disk, and when one
> of these is "busy" (indicated by the A letter in the  page state map),
> write operations cannot continue until it is cleared. If it does not
> clear within a certain time, then an exception is thrown, which usually
> results in the broker closing the connection.
>
> The events leading up to a "stuck" write operation are complex and
> sometimes difficult to reproduce. If you have a reproducer, then I would
> be interested to see it! Even so, the ability to reproduce on another
> machine is hard as it depends on such things as disk write speed, the
> disk controller characteristics, the number of threads in the thread
> pool (ie CPU type), memory and other hardware-related things.
>
> There are two linearstore parameters that you can try playing with to
> see if you can change the behavior of the store:
>
> wcache-page-size: This sets the size of each page in the write buffer.
> Larger page size is good for large messages; a smaller size will help if
> you have small messages.
>
> wcache-num-pages: The total number of pages in the write buffer.
>
> Use the --help on the broker with the linearstore loaded to see more
> details on this. I hope that helps a little.
>
> Kim van der Riet
>
> On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> > Any help in understanding why/when the broker throws those errors and
> > stops receiving messages would be appreciated.
> >
> > Not sure if any kernel tuning or broker tuning needs to be done to
> > solve this issue.
> >
> > Thanks in advance,
> > Ram
> >
> > On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> > rammohanganap@gmail.com> wrote:
> >
> >> Also from this log message (store level) it seems like it is waiting for
> >> AIO to complete.
> >>
> >> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
> >> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> >> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> >> ps=[-------------------------A------]
> >>
> >> page_state ps=[-------------------------A------]  where A is AIO_PENDING
> >> aer=1 _aio_evt_rem;          ///< Remaining AIO events
> >>
> >> When there are pending AIOs, does the broker close the connection?
> >> Is there any tuning that can be done to resolve this?
> >>
> >> Thanks,
> >> Ram
> >>
> >>
> >>
> >>
> >> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> >> rammohanganap@gmail.com> wrote:
> >>
> >>> I was checking the code and I see these lines for that AIO timeout.
> >>>
> >>>                case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
> >>>                  if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
> >>>                      THROW_STORE_EXCEPTION("Timeout waiting for AIO in
> >>> MessageStoreImpl::recoverMessages()");
> >>>                  ::usleep(AIO_SLEEP_TIME_US);
> >>>                  break;
> >>>
> >>> And these are the defaults
> >>>
> >>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
> >>>
> >>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
> >>>
> >>>
> >>>    RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
> >>> waiting for AIO.
> >>>
> >>>
> >>>
> >>> So did the page get blocked, and is it waiting for page availability?
> >>>
> >>>
> >>> Ram
> >>>
> >>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> >>> rammohanganap@gmail.com> wrote:
> >>>
> >>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we
> >>>> see this message
> >>>>
> >>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> >>>> "<journal-name>": Bad record alignment found at fid=0x4605b
> offs=0x107680
> >>>> (likely journal overwrite boundary); 19 filler record(s) required.
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>> "<journal-name>": Recover phase write: Wrote filler record:
> fid=0x4605b
> >>>> offs=0x107680
> >>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
> >>>> "<journal-name>": Recover phase write: Wr... few more Recover phase
> logs
> >>>>
> >>>> It worked fine for a day and started throwing this message:
> >>>>
> >>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
> >>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr:
> pi=25 pc=8
> >>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> >>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
> >>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store()
> failed:
> >>>> jexception 0x0202 jcntl::handle_aio_wait() threw
> JERR_JCNTL_AIOCMPLWAIT:
> >>>> Timeout waiting for AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Broker] error Connection exception:
> framing-error:
> >>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception
> 0x0202
> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> waiting for
> >>>> AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue
> <queue-name>:
> >>>> MessageStoreImpl::store() failed: jexception 0x0202
> >>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
> waiting for
> >>>> AIOs to complete.
> >>>>
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> >>>> 2018-10-28 12:27:01 [Protocol] error Connection
> >>>> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
> >>>> Value for replyText is too large(320)
> >>>>
> >>>> Thanks,
> >>>> Ram
> >>>>
> >>>>
> >>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> >>>> rammohanganap@gmail.com> wrote:
> >>>>
> >>>>> No, local disk.
> >>>>>
> >>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
> >>>>>
> >>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> >>>>>>> Gordon,
> >>>>>>>
> >>>>>>> We are using java client 0.28 version and qpidd-cpp 1.35 version
> >>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64), i dont know at what scenario
> >>>>>> its
> >>>>>>> happening but after i restart broker and if we wait for few days
> its
> >>>>>>> happening again. From the above logs do you have any pointers to
> >>>>>> check?
> >>>>>>
> >>>>>> Are you using NFS?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>>>>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>>>>
> >>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Kim van der Riet <kv...@redhat.com>.
This error is a linearstore issue. It looks as though there is a single 
write operation to disk that has become stuck, and is holding up all 
further write operations. This happens because there is a fixed circular 
pool of memory pages used for the AIO operations to disk, and when one 
of these is "busy" (indicated by the A letter in theĀ  page state map), 
write operations cannot continue until it is cleared. It it does not 
clear within a certain time, then an exception is thrown, which usually 
results in the broker closing the connection.

The events leading up to a "stuck" write operation are complex and 
sometimes difficult to reproduce. If you have a reproducer, then I would 
be interested to see it! Even so, the ability to reproduce on another 
machine is hard as it depends on such things as disk write speed, the 
disk controller characteristics, the number of threads in the thread 
pool (ie CPU type), memory and other hardware-related things.

There are two linearstore parameters that you can try playing with to 
see if you can change the behavior of the store:

wcache-page-size: This sets the size of each page in the write buffer. 
A larger page size is good for large messages; a smaller size will help if
you have small messages.

wcache-num-pages: The total number of pages in the write buffer.

Use the --help option on the broker with the linearstore loaded to see more
details on this. I hope that helps a little.
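
As a concrete sketch, both parameters can be set in qpidd.conf (or passed on
the command line) when the linearstore module is loaded. The option names are
as above; the values below are illustrative assumptions, not tuned
recommendations:

load-module=linearstore.so
wcache-page-size=128
wcache-num-pages=32

Check the qpidd --help output with the module loaded for the accepted values
and the defaults on your build.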

Kim van der Riet

On 11/6/18 2:12 PM, rammohan ganapavarapu wrote:
> Any help in understanding why/when the broker throws those errors and
> stops receiving messages would be appreciated.
>
> Not sure if any kernel tuning or broker tuning needs to be done to
> solve this issue.
>
> Thanks in advance,
> Ram
>
> On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Also from this log message (store level) it seems like it is waiting for
>> AIO to complete.
>>
>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
>> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
>> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
>> ps=[-------------------------A------]
>>
>> page_state ps=[-------------------------A------]  where A is AIO_PENDING
>> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>>
>> When there are pending AIOs, does the broker close the connection?
>> Is there any tuning that can be done to resolve this?
>>
>> Thanks,
>> Ram
>>
>>
>>
>>
>> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
>> rammohanganap@gmail.com> wrote:
>>
>>> I was checking the code and I see these lines for that AIO timeout.
>>>
>>>                case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>>                  if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>>                      THROW_STORE_EXCEPTION("Timeout waiting for AIO in
>>> MessageStoreImpl::recoverMessages()");
>>>                  ::usleep(AIO_SLEEP_TIME_US);
>>>                  break;
>>>
>>> And these are the defaults
>>>
>>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>>
>>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>>
>>>
>>>    RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
>>> waiting for AIO.
>>>
>>>
>>>
>>> So does the page get blocked while it waits for page availability?
>>>
>>>
>>> Ram
>>>
>>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we
>>>> see this message
>>>>
>>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>>> "<journal-name>": Bad record alignment found at fid=0x4605b offs=0x107680
>>>> (likely journal overwrite boundary); 19 filler record(s) required.
>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>> "<journal-name>": Recover phase write: Wrote filler record: fid=0x4605b
>>>> offs=0x107680
>>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>>> "<journal-name>": Recover phase write: Wr... few more Recover phase logs
>>>>
>>>> It worked fine for a day and started throwing this message:
>>>>
>>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
>>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8
>>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
>>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed:
>>>> jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT:
>>>> Timeout waiting for AIOs to complete.
>>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>> 2018-10-28 12:27:01 [Broker] error Connection exception: framing-error:
>>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>>>> AIOs to complete.
>>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>:
>>>> MessageStoreImpl::store() failed: jexception 0x0202
>>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>>>> AIOs to complete.
>>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>>> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
>>>> Value for replyText is too large(320)
>>>>
>>>> Thanks,
>>>> Ram
>>>>
>>>>
>>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>>> rammohanganap@gmail.com> wrote:
>>>>
>>>>> No, local disk.
>>>>>
>>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>>>>>
>>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>>>>> Gordon,
>>>>>>>
>>>>>>> We are using Java client version 0.28 and qpid-cpp version 1.35
>>>>>>> (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario
>>>>>>> it's happening, but after I restart the broker and we wait a few days
>>>>>>> it's happening again. From the above logs do you have any pointers to
>>>>>>> check?
>>>>>>
>>>>>> Are you using NFS?
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>>
>>>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Any help in understanding why/when the broker throws those errors and stops
receiving messages would be appreciated.

Not sure if any kernel tuning or broker tuning needs to be done to
solve this issue.

Thanks in advance,
Ram

On Tue, Nov 6, 2018 at 8:35 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Also from this log message (store level) it seems like it is waiting for
> AIO to complete.
>
> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
> name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
> wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
> ps=[-------------------------A------]
>
> page_state ps=[-------------------------A------]  where A is AIO_PENDING
> aer=1 _aio_evt_rem;          ///< Remaining AIO events
>
> When there are pending AIOs, does the broker close the connection?
> Is there any tuning that can be done to resolve this?
>
> Thanks,
> Ram
>
>
>
>
> On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> I was checking the code and I see these lines for that AIO timeout.
>>
>>               case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>>                 if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>>                     THROW_STORE_EXCEPTION("Timeout waiting for AIO in
>> MessageStoreImpl::recoverMessages()");
>>                 ::usleep(AIO_SLEEP_TIME_US);
>>                 break;
>>
>> And these are the defaults
>>
>> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>>
>> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>>
>>
>>   RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
>> waiting for AIO.
>>
>>
>>
>> So does the page get blocked while it waits for page availability?
>>
>>
>> Ram
>>
>> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
>> rammohanganap@gmail.com> wrote:
>>
>>>
>>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we
>>> see this message
>>>
>>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>>> "<journal-name>": Bad record alignment found at fid=0x4605b offs=0x107680
>>> (likely journal overwrite boundary); 19 filler record(s) required.
>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> "<journal-name>": Recover phase write: Wrote filler record: fid=0x4605b
>>> offs=0x107680
>>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>>> "<journal-name>": Recover phase write: Wr... few more Recover phase logs
>>>
>>> It worked fine for a day and started throwing this message:
>>>
>>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
>>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8
>>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
>>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed:
>>> jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT:
>>> Timeout waiting for AIOs to complete.
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> 2018-10-28 12:27:01 [Broker] error Connection exception: framing-error:
>>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>>> AIOs to complete.
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>:
>>> MessageStoreImpl::store() failed: jexception 0x0202
>>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>>> AIOs to complete.
>>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>> 2018-10-28 12:27:01 [Protocol] error Connection
>>> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
>>> Value for replyText is too large(320)
>>>
>>> Thanks,
>>> Ram
>>>
>>>
>>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>>> rammohanganap@gmail.com> wrote:
>>>
>>>> No, local disk.
>>>>
>>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>>>>
>>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>>> > Gordon,
>>>>> >
>>>>> > We are using Java client version 0.28 and qpid-cpp version 1.35
>>>>> > (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario
>>>>> > it's happening, but after I restart the broker and we wait a few days
>>>>> > it's happening again. From the above logs do you have any pointers to
>>>>> > check?
>>>>>
>>>>> Are you using NFS?
>>>>>
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>>
>>>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Also from this log message (store level) it seems like it is waiting for AIO
to complete.

2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<journal
name>": get_events() returned JERR_JCNTL_AIOCMPLWAIT;
wmgr_status: wmgr: pi=25 pc=8 po=0 aer=1 edac=TFFF
ps=[-------------------------A------]

page_state ps=[-------------------------A------]  where A is AIO_PENDING
aer=1 _aio_evt_rem;          ///< Remaining AIO events

When there are pending AIOs, does the broker close the connection?
Is there any tuning that can be done to resolve this?

Thanks,
Ram




On Mon, Nov 5, 2018 at 8:55 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> I was checking the code and I see these lines for that AIO timeout.
>
>               case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
>                 if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
>                     THROW_STORE_EXCEPTION("Timeout waiting for AIO in
> MessageStoreImpl::recoverMessages()");
>                 ::usleep(AIO_SLEEP_TIME_US);
>                 break;
>
> And these are the defaults
>
> #define MAX_AIO_SLEEPS 100000 // tot: ~1 sec
>
> #define AIO_SLEEP_TIME_US  10 // 0.01 ms
>
>
>   RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
> waiting for AIO.
>
>
>
> So does the page get blocked while it waits for page availability?
>
>
> Ram
>
> On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>>
>> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we
>> see this message
>>
>> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
>> "<journal-name>": Bad record alignment found at fid=0x4605b offs=0x107680
>> (likely journal overwrite boundary); 19 filler record(s) required.
>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> "<journal-name>": Recover phase write: Wrote filler record: fid=0x4605b
>> offs=0x107680
>> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal
>> "<journal-name>": Recover phase write: Wr... few more Recover phase logs
>>
>> It worked fine for a day and started throwing this message:
>>
>> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
>> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8
>> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
>> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
>> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed:
>> jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT:
>> Timeout waiting for AIOs to complete.
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> 2018-10-28 12:27:01 [Broker] error Connection exception: framing-error:
>> Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>> AIOs to complete.
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>> 2018-10-28 12:27:01 [Protocol] error Connection
>> qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>:
>> MessageStoreImpl::store() failed: jexception 0x0202
>> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
>> AIOs to complete.
>> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>> 2018-10-28 12:27:01 [Protocol] error Connection
>> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
>> Value for replyText is too large(320)
>>
>> Thanks,
>> Ram
>>
>>
>> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
>> rammohanganap@gmail.com> wrote:
>>
>>> No, local disk.
>>>
>>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>>>
>>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>>> > Gordon,
>>>> >
>>>> > We are using Java client version 0.28 and qpid-cpp version 1.35
>>>> > (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario
>>>> > it's happening, but after I restart the broker and we wait a few days
>>>> > it's happening again. From the above logs do you have any pointers to
>>>> > check?
>>>>
>>>> Are you using NFS?
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>>
>>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
I was checking the code and I see these lines for that AIO timeout.

              case qpid::linearstore::journal::RHM_IORES_PAGE_AIOWAIT:
                if (++aio_sleep_cnt > MAX_AIO_SLEEPS)
                    THROW_STORE_EXCEPTION("Timeout waiting for AIO in
MessageStoreImpl::recoverMessages()");
                ::usleep(AIO_SLEEP_TIME_US);
                break;

And these are the defaults

#define MAX_AIO_SLEEPS 100000 // tot: ~1 sec

#define AIO_SLEEP_TIME_US  10 // 0.01 ms
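
(Taken together, these defaults allow up to 100,000 sleeps of 10 microseconds
each, i.e. roughly one second of total waiting before the timeout exception
is raised.)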


  RHM_IORES_PAGE_AIOWAIT, ///< IO operation suspended - next page is
waiting for AIO.



So does the page get blocked while it waits for page availability?


Ram

On Mon, Nov 5, 2018 at 8:00 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

>
> Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we see
> this message
>
> 2018-10-27 18:58:25 [Store] warning Linear Store: Journal
> "<journal-name>": Bad record alignment found at fid=0x4605b offs=0x107680
> (likely journal overwrite boundary); 19 filler record(s) required.
> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>":
> Recover phase write: Wrote filler record: fid=0x4605b offs=0x107680
> 2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>":
> Recover phase write: Wr... few more Recover phase logs
>
> It worked fine for a day and started throwing this message:
>
> 2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
> get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8
> po=0 aer=1 edac=TFFF ps=[-------------------------A------]
> 2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
> queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed:
> jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT:
> Timeout waiting for AIOs to complete.
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> 2018-10-28 12:27:01 [Broker] error Connection exception: framing-error:
> Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
> AIOs to complete.
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
> 2018-10-28 12:27:01 [Protocol] error Connection
> qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>:
> MessageStoreImpl::store() failed: jexception 0x0202
> jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
> AIOs to complete.
> (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
> 2018-10-28 12:27:01 [Protocol] error Connection
> qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
> Value for replyText is too large(320)
>
> Thanks,
> Ram
>
>
> On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> No, local disk.
>>
>> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>>
>>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>>> > Gordon,
>>> >
>>> > We are using Java client version 0.28 and qpid-cpp version 1.35
>>> > (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario it's
>>> > happening, but after I restart the broker and we wait a few days it's
>>> > happening again. From the above logs do you have any pointers to check?
>>>
>>> Are you using NFS?
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>
>>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Actually we have upgraded from qpid-cpp 0.28 to 1.35 and after that we see
this message

2018-10-27 18:58:25 [Store] warning Linear Store: Journal "<journal-name>":
Bad record alignment found at fid=0x4605b offs=0x107680 (likely journal
overwrite boundary); 19 filler record(s) required.
2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>":
Recover phase write: Wrote filler record: fid=0x4605b offs=0x107680
2018-10-27 18:58:25 [Store] notice Linear Store: Journal "<journal-name>":
Recover phase write: Wr... few more Recover phase logs

It worked fine for a day and started throwing this message:

2018-10-28 12:27:01 [Store] critical Linear Store: Journal "<name>":
get_events() returned JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=25 pc=8
po=0 aer=1 edac=TFFF ps=[-------------------------A------]
2018-10-28 12:27:01 [Broker] warning Exchange <name> cannot deliver to
queue <queue-name>: Queue <queue-name>: MessageStoreImpl::store() failed:
jexception 0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT:
Timeout waiting for AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
2018-10-28 12:27:01 [Broker] error Connection exception: framing-error:
Queue <queue-name>: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
2018-10-28 12:27:01 [Protocol] error Connection
qpid.server-ip:5672-client-ip:44457 closed by error: Queue <queue-name>:
MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
2018-10-28 12:27:01 [Protocol] error Connection
qpid.server-ip:5672-client-ip:44457 closed by error: illegal-argument:
Value for replyText is too large(320)

Thanks,
Ram


On Mon, Nov 5, 2018 at 3:34 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> No, local disk.
>
> On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:
>
>> On 05/11/18 22:58, rammohan ganapavarapu wrote:
>> > Gordon,
>> >
>> > We are using Java client version 0.28 and qpid-cpp version 1.35
>> > (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario it's
>> > happening, but after I restart the broker and we wait a few days it's
>> > happening again. From the above logs do you have any pointers to check?
>>
>> Are you using NFS?
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
No, local disk.

On Mon, Nov 5, 2018 at 3:26 PM Gordon Sim <gs...@redhat.com> wrote:

> On 05/11/18 22:58, rammohan ganapavarapu wrote:
> > Gordon,
> >
> > We are using Java client version 0.28 and qpid-cpp version 1.35
> > (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario it's
> > happening, but after I restart the broker and we wait a few days it's
> > happening again. From the above logs do you have any pointers to check?
>
> Are you using NFS?
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Gordon Sim <gs...@redhat.com>.
On 05/11/18 22:58, rammohan ganapavarapu wrote:
> Gordon,
> 
> We are using Java client version 0.28 and qpid-cpp version 1.35
> (qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario it's
> happening, but after I restart the broker and we wait a few days it's
> happening again. From the above logs do you have any pointers to check?

Are you using NFS?



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Gordon,

We are using Java client version 0.28 and qpid-cpp version 1.35
(qpid-cpp-server-1.35.0-1.el7.x86_64); I don't know in what scenario it's
happening, but after I restart the broker and we wait a few days it's
happening again. From the above logs do you have any pointers to check?

We are using the linear store, not the legacy store.

When I do netstat -an | grep 5672, I see two ESTABLISHED connections for a
client host.
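
As a cross-check of the broker's own view of those connections (a sketch,
assuming the qpid-tools package is installed):

qpid-stat -c localhost:5672

Each AMQP connection shows up as one row, which helps distinguish real
duplicate client connections from stray sockets.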

This is the queue config:

qpid-config queues
Queue Name                                Attributes
=================================================================
1cdfd9cd-227d-4b84-9539-ab4d67ee5a1f:0.0  auto-del excl
q-001       --durable --file-size=2000 --file-count=24
--max-queue-size=1073741824 --max-queue-count=1000000
--limit-policy=flow-to-disk --argument no-local=False
q-001-dl    --durable --file-size=6000 --file-count=4
--max-queue-size=52428800 --max-queue-count=100000
--limit-policy=flow-to-disk --argument no-local=False

Someone posted a similar issue long back, but I don't see any solution:
http://qpid.2158936.n2.nabble.com/RE-qpid-Java-client-unable-to-send-messages-td7613136.html

Thanks
Ram

On Mon, Nov 5, 2018 at 2:32 PM Gordon Sim <gs...@redhat.com> wrote:

> On 05/11/18 20:57, rammohan ganapavarapu wrote:
> > Actually there are no messages in the queue; all the messages got
> > consumed by the consumer.
>
> But it still will not enqueue any further messages? Can you reproduce
> this easily?
>
> One other suggestion is to try with the linear store rather than the
> legacy store if possible.
>
> > I also observe two TCP connections to each client and for this
> > client only one TCP connection. Why does qpid create two connections?
>
> I don't think it does. Which client and version are you using? How are
> you observing the two connections?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Gordon Sim <gs...@redhat.com>.
On 05/11/18 20:57, rammohan ganapavarapu wrote:
> Actually there are no messages in the queue; all the messages got consumed
> by the consumer.

But it still will not enqueue any further messages? Can you reproduce 
this easily?

One other suggestion is to try with the linear store rather than the 
legacy store if possible.
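
For reference, a hypothetical qpidd.conf line selecting the linear store (the
module file name and path can vary by distribution) would be:

load-module=linearstore.so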

> I also observe two TCP connections to each client and for this
> client only one TCP connection. Why does qpid create two connections?

I don't think it does. Which client and version are you using? How are 
you observing the two connections?


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
I also see this in the qpidd logs for broker, store and protocol. Please help
me to understand what it means. Why am I getting "Timeout waiting for AIOs
to complete"? Does it mean something is wrong with the journal files?

2018-02-28 13:19:00 [Store] critical Journal "q-001": get_events() returned
JERR_JCNTL_AIOCMPLWAIT; wmgr_status: wmgr: pi=2 pc=45 po=0 aer=1 edac:TFFF
ps=[--A-----------------------------] wrfc: state: Active fcntl[1]: pfid=1
ws=11524 wc=11268 rs=0 rc=0 ec=6 ac=1
2018-02-28 13:19:00 [Broker] warning Exchange ex-001 cannot deliver to
queue q-001: Queue q-001: MessageStoreImpl::store() failed: jexception
0x0202 jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout
waiting for AIOs to complete.
(/home/fliu/rpmbuild/BUILD/qpid-0.28/cpp/src/qpid/legacystore/MessageStoreImpl.cpp:1357)
2018-02-28 13:19:00 [Broker] error Connection exception: framing-error:
Queue q-001: MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/fliu/rpmbuild/BUILD/qpid-0.28/cpp/src/qpid/legacystore/MessageStoreImpl.cpp:1357)
2018-02-28 13:19:00 [Protocol] error Connection
qpid.1.2.3.4:5672-1.2.3.5:28188 closed by error: Queue q-001:
MessageStoreImpl::store() failed: jexception 0x0202
jcntl::handle_aio_wait() threw JERR_JCNTL_AIOCMPLWAIT: Timeout waiting for
AIOs to complete.
(/home/fliu/rpmbuild/BUILD/qpid-0.28/cpp/src/qpid/legacystore/MessageStoreImpl.cpp:1357)(501)

On Mon, Nov 5, 2018 at 12:57 PM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Actually there are no messages in the queue; all the messages got consumed
> by the consumer. I also observe two TCP connections to each client and for
> this client only one TCP connection. Why does qpid create two connections?
>
> Ram
>
> On Mon, Nov 5, 2018, 11:09 AM Gordon Sim <gsim@redhat.com wrote:
>
>> Can you drain the queue?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: users-help@qpid.apache.org
>>
>>

Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Actually there are no messages in the queue; all the messages got consumed by
the consumer. I also observe two TCP connections to each client and for this
client only one TCP connection. Why does qpid create two connections?

Ram

On Mon, Nov 5, 2018, 11:09 AM Gordon Sim <gsim@redhat.com wrote:

> Can you drain the queue?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: qpid-cpp-0.35 errors

Posted by Gordon Sim <gs...@redhat.com>.
Can you drain the queue?
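
One way to try that from the command line (a sketch, assuming the
qpid-receive tool shipped with the C++ broker is available; exact option
names can vary by version, and the broker/queue names are placeholders):

qpid-receive -b localhost:5672 -a q-001 -m 0 --timeout 5 --print-content no

With -m 0 it keeps consuming until the queue stops yielding messages, which
shows whether consumption still works even though enqueues are failing.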

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: qpid-cpp-0.35 errors

Posted by rammohan ganapavarapu <ra...@gmail.com>.
Kim/Gordon,

After this message the broker is not accepting any more messages and keeps
throwing this error.

Thanks,
Ram

On Fri, Nov 2, 2018 at 8:59 AM rammohan ganapavarapu <
rammohanganap@gmail.com> wrote:

> Any help in understanding this error message would be appreciated.
>
> Ram
>
> On Wed, Oct 31, 2018 at 5:47 AM rammohan ganapavarapu <
> rammohanganap@gmail.com> wrote:
>
>> Kim,
>>
>> Any idea about this error?
>>
>> Thanks,
>> Ram
>>
>> On Tue, Oct 30, 2018, 2:13 PM Gordon Sim <gs...@redhat.com> wrote:
>>
>>> On 30/10/18 18:59, rammohan ganapavarapu wrote:
>>> > There are two more errors from my original post; can someone help me to
>>> > understand when qpid throws these errors?
>>> >
>>> >
>>> >     1. 1. 2018-10-22 08:05:30 [Broker] error Channel exception:
>>> >     not-attached: Channel 0 is not attached
>>> >
>>>  (/builddir/build/BUILD/qpid-cpp-1.35.0/src/qpid/amqp_0_10/SessionHandler.cpp:39)
>>>
>>> The one above is common when you are sending asynchronously, and a
>>> previous message caused the session to be ended with an exception frame.
>>> Any subsequent messages that were sent before the client received the
>>> exception frame result in the above error.
>>>
>>> >     2. 2018-10-30 14:30:36 [Broker] error Connection exception:
>>> >     framing-error: Queue ax-q-axgroup-001-consumer-group-001:
>>> >     MessageStoreImpl::store() failed: jexception 0x0803
>>> wmgr::enqueue() threw
>>> >     JERR_WMGR_ENQDISCONT: Enqueued new dtok when previous enqueue
>>> returned
>>> >     partly completed (state ENQ_PART). (This data_tok: id=1714315
>>> state=NONE)
>>> >
>>>  (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)
>>> >     3. 2018-10-30 14:30:36 [Protocol] error Connection
>>> >     qpid.10.68.94.134:5672-10.68.94.127:39458 closed by error: Queue
>>> >     ax-q-axgroup-001-consumer-group-001: MessageStoreImpl::store()
>>> failed:
>>> >     jexception 0x0803 wmgr::enqueue() threw JERR_WMGR_ENQDISCONT:
>>> Enqueued new
>>> >     dtok when previous enqueue returned partly completed (state
>>> ENQ_PART).
>>> >     (This data_tok: id=1714315 state=NONE)
>>> >
>>>  (/home/rganapavarapu/rpmbuild/BUILD/qpid-cpp-1.35.0/src/qpid/linearstore/MessageStoreImpl.cpp:1211)(501)
>>>
>>> Not sure what the 'partly completed state' means here. Kim, any thoughts?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>
>>>