You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/08/03 21:16:59 UTC

[GitHub] [lucene] yugushihuang opened a new pull request, #1057: Add a codec class to track merge time of each index part

yugushihuang opened a new pull request, #1057:
URL: https://github.com/apache/lucene/pull/1057

   This PR add filter class for each index format classes and their writer/producer/reader/consumer and add a tracking codec class to track merge time of each index part.
   [Jira Issue Link](https://issues.apache.org/jira/projects/LUCENE/issues/LUCENE-10670)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1212962044

   dude, that's nothing compared to the cost of disabling bulk merge, which will also force all stored fields to be recompressed every merge as well.
   
   sorry, i'm completely against this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] yugushihuang commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
yugushihuang commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1258240704

   @rmuir Thanks for pointing this out. I will close this PR and investigate other way. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1223506286

   it uses `instanceof`: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L686
   
   there's nothing to "do about it", except to say, don't wrap entire codec if you want to add a stats interface. Add a stats interface if you want a stats interface. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1226029428

   > I like the idea of making `SegmentMerger` implement some sort of a Stats interface that can return a metrics object. We can populate the TimeMetric object in `SegmentMerger.merge()` as each codec's merge returns. It solves what is needed here, and is easy to extend.
   
   I would suggest IndexWriter instead. it is already the public interface, just expose some counters. SegmentMerger is an implementation detail. In fact there are already some counters there (e.g. flushCount), but only exposed for testing purposes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] yugushihuang closed pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
yugushihuang closed pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part
URL: https://github.com/apache/lucene/pull/1057


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
msokolov commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1212151454

   > I really don't think we should be doing this with a codec-wrapper. you can get this data alrady from InfoStream!
   
   InfoStream is not very structured though -- it's hard to see how you could extract merge times from it. You'd have to parse formatted Strings, right? It also has this comment in its javadoc: `NOTE: Enabling infostreams may cause performance degradation in some components.`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1211278416

   this will actually slow down the merge heavily, by preventing things like optimized bulk merges of stored fields.
   
   I really don't think we should be doing this with a codec-wrapper. you can get this data alrady from InfoStream!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] vigyasharma commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
vigyasharma commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1224694259

   > it uses `instanceof`: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java#L686
   
   Thanks for sharing that link @rmuir. Intrigued, I did a quick search for `instanceof` in the `codecs/`, and found such checks in more than one place. 
   
   It seems that when we add a functionality in a new codec version, we use such type equality checks to ensure that consuming code is compatible. Adding a generic codec wrapper that branches from the inheritance chain will break such checks. Replacing such checks is a whole other discussion and not really the problem we are solving here.
   
   I like the idea of making `SegmentMerger` implement some sort of a Stats interface that can return a metrics object. We can populate the TimeMetric object in `SegmentMerger.merge()` as each codec's merge returns. It solves what is needed here, and is easy to extend.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
msokolov commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1213141620

   Thanks for helping out here, @rmuir - can you explain how it is that bulk operations would be prevented by this change? I don't really understand, but maybe if you explained we could figure out what to do about it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] yugushihuang commented on pull request #1057: LUCENE-10670: Add a codec class to track merge time of each index part

Posted by GitBox <gi...@apache.org>.
yugushihuang commented on PR #1057:
URL: https://github.com/apache/lucene/pull/1057#issuecomment-1208231388

   > The idea seems good - we want to track merge times separately for each format, right? I wonder if the Filter classes belong in a monitoring package though. It's also a confusing name since we already have a `monitor` package which is something completely different. Perhaps a better place for the `Filter*` classes would be `org.apache.lucene.codecs.filter` (in `core` module), and then we could put the `MergeTimeTrackingCodec` in the `sandbox` module somewhere? Maybe in `o.a.l.sandbox.metrics`? (to avoid confusion with monitor)
   
   Yes, it use to track merge time for each format. Thanks for the suggestion, I will make revision accordingly. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org