You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/12/20 19:02:12 UTC

[GitHub] [lucene] stevenschlansker opened a new issue, #12026: CollectionTerminationException is used for flow control

stevenschlansker opened a new issue, #12026:
URL: https://github.com/apache/lucene/issues/12026

   ### Description
   
   I am doing some performance profiling with Java Flight Recorder (if you haven't tried it yet, you should, it's awesome!) and am getting diagnostic warnings about thrown exception counts.
   
   In a few minutes long performance test, I see 25,000 instances of CollectionTerminationException thrown from within our search engine. In Java, the general guidance is not to use exceptions as flow control constructs - they are expensive to create (filling in stack trace, although this cost can be partially mitigated) and not particularly efficient to throw.
   
   It looks like LeafCollector uses this exception to terminate early, if there are no documents of interest left.
   Would it be possible, perhaps for Lucene 10, to change this interface to no longer use exceptions for flow control?
   
   Examples of alternate patterns include a null return value, a compound return type like a Optional<LeafCollector>, or flipping the control around to `visitLeafCollector(Consumer<LeafCollector>)` which might or might not call the consumer.
   
   It might even be possible to adapt the old pattern to the new style transparently to users, through clever default method implementations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360063018

   Would a change that at least pre-allocates a single exception instance, rather than making a new one every time, be welcome / desired?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360140114

   Our current use case involves doing a number of existence-tests of boolean queries of term queries, like `count(docType = A AND docId = B) == 0`, which is why we end up doing so many small count operations. Since it is the conjunction of two terms, I don't think the term frequency by itself helps us.
   
   I'll look into restructuring our code at a higher level to avoid this, and make sure the debug non safepoints flag is on. Thanks for your help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360058121

   This is during an intensive indexing operation, and it is using a near-real-time reader created from an IndexWriter. I wonder if we are somehow in an unusual spot where we are seeing atypical performance characteristics. Or, perhaps the profiler is lying to me...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360183432

   Another note: we have `-XX:-OmitStackTraceInFastThrow` on our command line, which inhibits JVM optimization for exception stack traces. So that probably makes it yet more expensive for us.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360081092

   You are doing count queries that get implemented with Weight.count(). Separately, that's really strange, i dont know what you are doing, but that isn't very typical. e.g. if you are just counting occurrences of terms for some purpose, you maybe just want to call `docFreq`.
   
   The issue with preallocating exceptions is you lose the stack trace, not a good tradeoff. But it might be ok to preallocate one inside `TopHitCountCollector` where it early-terminates when `Weight.count` is supported for the segment... I'm not sure. I'd first want to know what the use-case is for doing thousands of count queries this way, as it isn't typical. If you are bottlenecked on cost of creating an exception then seems to me everything is impossibly fast for your app and there's nothing to worry about. I'm really not convinced the cost here is large.
   
   Please look at how the mechanism is used, it is not just something you change to an `Optional` or similar. For example a Scorer can early-terminate a segment from something like `nextDoc` which is the intended use-case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360373248

   that one isn't relevant, it doesn't apply to our own custom exception here. I would just rerun with DebugNonSafePoints before even worrying about it. Otherwise profiler can waste immense amounts of your time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360056272

   Here's an example of what I am seeing:
   
   ![image](https://user-images.githubusercontent.com/129097/208750636-e1f3316f-e6c7-4d65-a5ef-cd67e72a8190.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360035214

   It is harmless, it isn't a performance issue really, it only happens at most once per segment. Exception is the correct mechanism as it is an exceptional case to bail on processing here.
   
   I don't think we should use a more complex abstraction for early-termination. I know this may go against the java religious principles, but simpler trumps all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker closed issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker closed issue #12026: CollectionTerminationException is used for flow control
URL: https://github.com/apache/lucene/issues/12026


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360097390

   Also, when using this flightrecorder tool, to help reduce noise you need to pass a few flags (e.g. `DebugNonSafepoints`) or you WILL see a ton of ghosts, this could easily be one of them. They document this in the java documentation but you need to be sure to pass it to your app, example:
   
   https://github.com/apache/lucene/blob/main/gradle/testing/profiling.gradle#L34-L35


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360058923

   It's also worth noting that we are firing off many small count queries, rather than executing one large query. So maybe we are visiting all segments many more times than usual.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on issue #12026: CollectionTerminationException is used for flow control

Posted by GitBox <gi...@apache.org>.
rmuir commented on issue #12026:
URL: https://github.com/apache/lucene/issues/12026#issuecomment-1360146258

   ok, thanks for the explanation. `BooleanWeight.count` can only help in "rare cases" IMO here, but it isn't rare enough in your measurements (e.g. maybe for some segments, all documents are docType='A' so it can optimize). 
   
   I still am suspicious about exceptions looking unrealistically costly with this profiler (have seen it before) if you are allowing safepoint bias.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org