Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/10/20 08:12:55 UTC

[GitHub] [pulsar] e-marchand-exensa opened a new issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

e-marchand-exensa opened a new issue #8307:
URL: https://github.com/apache/pulsar/issues/8307


   **Describe the bug**
   I saw this warning log:
   
   `WAR|20/093812.590 o.a.p.s.i.n.u.HashedWheelTimer@ulsar-timer-10-1 An exception was thrown by TimerTask.
   java.lang.NullPointerException: null
   	at org.apache.pulsar.client.impl.ConsumerBase.notifyPendingBatchReceivedCallBack(ConsumerBase.java:567)
   	at org.apache.pulsar.client.impl.ConsumerImpl.completeOpBatchReceive(ConsumerImpl.java:1422)
   	at org.apache.pulsar.client.impl.ConsumerBase.pendingBatchReceiveTask(ConsumerBase.java:600)
   	at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
   	at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
   	at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
   	at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.base/java.lang.Thread.run(Thread.java:834)
   `
   
   **To Reproduce**
   I'm using consumers with `BatchReceivePolicy` and polling `pulsarConsumer.batchReceive()`.
   
   **Expected behavior**
   I guess no NPE should occur.
   
   **Desktop (please complete the following information):**
   openjdk version "11.0.8" 2020-07-14
   OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
   OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716438279


   Hi @lhotari, do you think this could fix an issue where `pulsarConsumer.batchReceive()` blocks while using a `BatchReceivePolicy` with a timeout set? I set the timeout to 250ms, but the call may fail to return even after waiting for 60s.





[GitHub] [pulsar] lhotari edited a comment on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716607914


   > Wow, that's not what I had assumed, and it does not match what I observe either. But maybe I don't understand exactly what you mean. Most of my consumers do not receive any messages but still do not block indefinitely. It happens "sometimes" (while my application is shutting down), and I find the problem hard to reproduce.
   
   @e-marchand-exensa based on the source code, calling `batchReceive` will block forever unless there are messages in the topic or new messages arrive while the call is in progress. 





[GitHub] [pulsar] lhotari commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-717118365


   > So, do you think the fix for the NPE here will also prevent the missed rescheduling, or do I need to open another issue?
   
   @e-marchand-exensa yes I believe that the issue is fixed with PR #8326 . That should be part of the 2.6.2 release.
   





[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-839851849


   > It's possible that you are encountering a different issue since there are no log messages.
   > Do the messages get delivered when you restart the client side? If not, then it must be a problem on the broker side. For example, some fixes have been recently made to a condition where the message delivery stops, an issue reported as #6054 .
   
   I'm not 100% sure the messages get delivered, but the backlog goes empty after restart, as far as I could see. Still, the future should have completed.
   
   > Would you be able to create a thread dump of the Java process where this occurs? When using the Pulsar docker images, a thread dump can be created by running `jstack 1` in the shell.
   
   I did take a thread dump before stopping the application:
   [wdl_prod_22_thread_dump.log](https://github.com/apache/pulsar/files/6467048/wdl_prod_22_thread_dump.log)
   
   > Are you using an ordinary consumer subscribing to a single topic or is it a multi-topic consumer?
   
   It is a single topic consumer. Here is the extract for the `consumer` creation.
   ```
   final var builder = pulsarClient.newConsumer()
         .consumerName( consumerName )
         .topic( topic )
         .subscriptionName( subscription )
         .subscriptionType( SubscriptionType.Exclusive )
         .subscriptionInitialPosition( SubscriptionInitialPosition.Latest )
         .batchReceivePolicy( batchPolicy )
         .receiverQueueSize( batchPolicy.getMaxNumMessages() > 0 ? 2*batchPolicy.getMaxNumMessages() : 1024 )
         .maxTotalReceiverQueueSizeAcrossPartitions( 32 * 1024 )
         .acknowledgmentGroupTime( 250, TimeUnit.MILLISECONDS );
   ```
   with the following `batchPolicy`
   ```
   BatchReceivePolicy.builder()
         .maxNumMessages( 42 )
         .maxNumBytes( -1 ) // FIXME: because of issue #7696 in Pulsar v2.5.2
         .timeout( 1, TimeUnit.SECONDS );
   ```
   For this topic, `maxNumMessages` is 1024 and `timeout` is 1000ms.





[GitHub] [pulsar] lhotari commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-713614877


   FYI, I'm currently working on the same source code files where that NPE happens. I'm pretty confident that this will be fixed by PR #8326 . 
   
   The reason for the NPE seems to be that `op` is null on this line:
   https://github.com/apache/pulsar/blob/v2.5.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerBase.java#L600





[GitHub] [pulsar] lhotari commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716471985


   > Hi @lhotari, do you think this could fix an issue where `pulsarConsumer.batchReceive()` blocks while using a `BatchReceivePolicy` with a timeout set? I set the timeout to 250ms, but the call may fail to return even after waiting for 60s.
   
   The problem is that `Consumer.batchReceive` doesn't currently provide a way to set the timeout for a `batchReceive` call. The `BatchReceivePolicy` controls the batch creation, but when no messages are received, the timeout has no impact on a `batchReceive` call. With the #8326 changes, it's possible to set a timeout for a `batchReceiveAsync` call on JDK 9+, for example:
   ```
   CompletableFuture<Messages<?>> batchResult = consumer.batchReceiveAsync().orTimeout(250, TimeUnit.MILLISECONDS);
   ```
   This means that there is a workaround for your problem when using the async API (the improvement will be available in the 2.6.2 release) on JDK 9+ (which adds the `orTimeout` method to `CompletableFuture`).
   
   For the non-async API, I'd suggest creating a feature request for adding `batchReceive(int timeout, TimeUnit unit)` to `org.apache.pulsar.client.api.Consumer`. There is already a `receive(int timeout, TimeUnit unit)` method, so it would make sense to have a similar method for `batchReceive`. I noticed that you have already created #7696. However, I think that a timeout for the `batchReceive` method is a new feature and is not something that is, or should be, controlled by `BatchReceivePolicy`. Creating a new feature request would therefore be the way to proceed.
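
To make the `orTimeout` workaround above concrete, here is a minimal sketch that does not depend on the Pulsar client. The class `OrTimeoutSketch` and the method `neverCompletes()` are hypothetical; `neverCompletes()` is only a stand-in for a `batchReceiveAsync()` call on an idle topic, and a `String` payload replaces `Messages<T>`. It shows that a future nobody completes fails with a `TimeoutException` once the deadline passes (JDK 9+).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class OrTimeoutSketch {
    // Hypothetical stand-in for consumer.batchReceiveAsync() on an idle topic:
    // a future that nobody ever completes.
    static CompletableFuture<String> neverCompletes() {
        return new CompletableFuture<>();
    }

    // Returns true if the future failed with a TimeoutException after timeoutMs.
    static boolean timesOut(long timeoutMs) {
        CompletableFuture<String> result =
                neverCompletes().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
        try {
            result.join(); // blocks until the value arrives or the timeout fires
            return false;
        } catch (CompletionException e) {
            // join() wraps the TimeoutException raised by orTimeout
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOut(250));
    }
}
```

Note that the timeout surfaces as an exception here, so the caller has to treat "no messages yet" as a failure path.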





[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716617777


   @lhotari I'm pretty sure that is not the behavior I see in v2.5.2. I will dig into it again.





[GitHub] [pulsar] e-marchand-exensa edited a comment on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa edited a comment on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-856773958


   @lhotari @codelipenghui do you think #10768 could solve this issue? Regards.





[GitHub] [pulsar] lhotari commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-839744437


   > Since this issue, I'm still using `pulsarConsumer.batchReceiveAsync().get( 10, TimeUnit.SECONDS )` in my client code to be sure not to block even if the `BatchReceivePolicy` is configured with a timeout of 1s using Pulsar v2.6.2.
   > 
   > I just encountered the issue again: the future did not complete after 10s, and the consumer never received anything again while the backlog kept growing. **This time, I can't see any log from Pulsar (client side)**. All of the many other consumers are still working properly. It seems random, so I suspect a race condition somewhere.
   > 
   > I don't see anything in Pulsar v2.6.3 or v2.7.1 that could solve this issue. Do you have any insight to share?
   > 
   > Regards.
   
   @e-marchand-exensa  It's possible that you are encountering a different issue since there are no log messages. 
   Do the messages get delivered when you restart the client side? If not, then it must be a problem on the broker side. For example, some fixes have been recently made to a condition where the message delivery stops, an issue reported as #6054 .
   Would you be able to create a thread dump of the Java process where this occurs? When using the Pulsar docker images, a thread dump can be created by running `jstack 1` in the shell.
   
   Are you using an ordinary consumer subscribing to a single topic or is it a multi-topic consumer?
   
   





[GitHub] [pulsar] lhotari commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
lhotari commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716641987


   @e-marchand-exensa yes, you are right that it should behave that way. I hadn't fully understood how batch receive is implemented. I now see how it works.
   
   The reason why batch receive stops working is that when an unhandled exception is thrown in https://github.com/apache/pulsar/blob/v2.5.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerBase.java#L591-L608 ,
   the next `ConsumerBase.pendingBatchReceiveTask` call never gets rescheduled, because the rescheduling happens on the last line of that method.
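
The failure mode described above can be reproduced in isolation. The following is only a sketch under stated assumptions: it uses a JDK `ScheduledExecutorService` in place of Netty's `HashedWheelTimer`, and the names `ReschedulingSketch`, `failingBody`, and `countRuns` are hypothetical. It is not the Pulsar code and not necessarily how PR #8326 addresses the problem. A task that reschedules itself on its last line runs once and then silently stops if its body throws, while the same task with the reschedule in a `finally` block keeps running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ReschedulingSketch {
    // Simulates the NPE thrown inside the timer task body.
    static void failingBody() {
        throw new NullPointerException("simulated failure in the task body");
    }

    // Starts a self-rescheduling task whose body always throws and returns
    // how many times it ran within waitMs.
    static int countRuns(boolean guarded, long periodMs, long waitMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable[] task = new Runnable[1]; // holder so the task can refer to itself
        task[0] = () -> {
            runs.incrementAndGet();
            if (guarded) {
                try {
                    failingBody();
                } finally {
                    // Reschedule even though the body failed.
                    timer.schedule(task[0], periodMs, TimeUnit.MILLISECONDS);
                }
            } else {
                failingBody();
                // Never reached: the exception skips the reschedule on the last line.
                timer.schedule(task[0], periodMs, TimeUnit.MILLISECONDS);
            }
        };
        timer.schedule(task[0], 0, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(waitMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false, 10, 300));
        System.out.println("guarded runs:   " + countRuns(true, 10, 300));
    }
}
```

In the unguarded variant the task stops after its first run, which matches the symptom in this issue: a single warning in the log, after which pending batch receives are never timed out again.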








[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-716506358


   @lhotari , thanks for your inputs.
   
   > The problem is that `Consumer.batchReceive` doesn't currently provide a way to set the timeout for a batchReceive call. The `BatchReceivePolicy` controls the batch creation, but when no messages are received, the timeout will have no impact on a batchReceive call.
   
   Wow, that's not what I had assumed, and it does not match what I observe either. But maybe I don't understand **exactly** what you mean. Most of my consumers do not receive any messages but still do not block indefinitely. It happens "sometimes" (while my application is shutting down), and I find the problem hard to reproduce.
   
   As a workaround, I'm using `pulsarConsumer.batchReceiveAsync().get(long timeout, TimeUnit unit)`, but a timeout in this case should **not** be an exception, as it is totally normal not to get any message within a 250ms time frame. I suppose `consumer.batchReceiveAsync().orTimeout(long timeout, TimeUnit unit)` has the same semantic issue.
   
   You are right about the need for a `batchReceive(int timeout, TimeUnit unit)` if the policy does not ensure this, but this case must be made clear in the policy documentation, as I don't think it is the behavior most users expect, and it seems I still don't understand the current behavior.
   
   Regards,
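
On the semantic point raised above (an empty poll should not be an exception), JDK 9+ also offers `CompletableFuture.completeOnTimeout`, which supplies a fallback value instead of failing. The sketch below is a standalone illustration: `EmptyOnTimeoutSketch` and `receiveOrEmpty` are hypothetical names, and a `List<String>` stands in for Pulsar's `Messages<T>`. It only shows what the caller observes; whether the client library releases the underlying pending request is a separate question that this sketch does not address.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class EmptyOnTimeoutSketch {
    // Waits up to timeoutMs for the pending batch; returns an empty list
    // (not an exception) if nothing arrived in time.
    static List<String> receiveOrEmpty(CompletableFuture<List<String>> pending, long timeoutMs) {
        return pending
                .completeOnTimeout(List.of(), timeoutMs, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        // Idle topic: the future is never completed by anyone else.
        List<String> idle = receiveOrEmpty(new CompletableFuture<>(), 250);
        System.out.println("idle batch size: " + idle.size());

        // Busy topic: the batch is already available, so the timeout is irrelevant.
        List<String> busy = receiveOrEmpty(
                CompletableFuture.completedFuture(List.of("m1", "m2")), 250);
        System.out.println("busy batch size: " + busy.size());
    }
}
```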





[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-717044001


   @lhotari Thanks a lot for your answers.
   
   So, do you think the fix for the NPE here will also prevent the missed rescheduling, or do I need to open another issue?








[GitHub] [pulsar] e-marchand-exensa commented on issue #8307: [Java] [v2.5.2] An exception was thrown by TimerTask: NullPointerException

Posted by GitBox <gi...@apache.org>.
e-marchand-exensa commented on issue #8307:
URL: https://github.com/apache/pulsar/issues/8307#issuecomment-838625517


   Hi @lhotari, @sijie,
   
   Since this issue, I'm still using `pulsarConsumer.batchReceiveAsync().get( 10, TimeUnit.SECONDS )` in my client code to be sure not to block even if the `BatchReceivePolicy` is configured with a timeout of 1s using Pulsar v2.6.2.
   
   I just encountered the issue again: the future did not complete after 10s, and the consumer never received anything again while the backlog kept growing. **This time, I can't see any log from Pulsar (client side)**. All of the many other consumers are still working properly. It seems random, so I suspect a race condition somewhere.
   
   I don't see anything in Pulsar v2.6.3 or v2.7.1 that could solve this issue. Do you have any insight to share?
   
   Regards.










