Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/14 02:33:38 UTC

[GitHub] [lucene] LuXugang opened a new issue, #11770: Optimization for time series data

LuXugang opened a new issue, #11770:
URL: https://github.com/apache/lucene/issues/11770

   ### Description
   
   Hi, I recently read a [paper](https://www.vldb.org/pvldb/vol15/p3472-yu.pdf) from [VLDB](https://vldb.org/2022/?paper-session) that claims significant performance improvements over Lucene: a 20x speedup on standard queries and a 10x speedup on histogram queries in massive log query scenarios.
   
   After reading the whole paper, it seems that the core idea in this paper is similar to `IndexSortSortedNumericDocValuesRangeQuery`. Does anyone have time to read the paper and discuss it here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on issue #11770: Optimization for time series data

Posted by GitBox <gi...@apache.org>.
jpountz commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1247796727

   > it seems that the core idea in this paper is similar to IndexSortSortedNumericDocValuesRangeQuery
   
   This is my understanding as well, though the paper says it uses the BKD tree, rather than doc values, to figure out the range of doc IDs. That seems to be the idea proposed at https://github.com/apache/lucene/pull/687 (which I just realized I had completely forgotten about :grimacing:).
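
For illustration, the shared idea can be sketched in plain Java without any Lucene dependency. When an index is sorted on a timestamp field, the documents matching a range form one contiguous run of doc IDs, so two binary searches are enough to find it. The class and method names below (`SortedRangeSketch`, `lowerBound`, `matchingDocRange`) are hypothetical, and the sorted `long[]` merely stands in for the per-document values; this is a sketch under those assumptions, not how `IndexSortSortedNumericDocValuesRangeQuery`, PR #687, or the paper actually implement it.

```java
import java.util.Arrays;

// Hypothetical illustration only: a sorted long[] plays the role of the
// timestamp of each document in an index sorted by that timestamp.
final class SortedRangeSketch {

    // Returns the first index whose value is >= target,
    // or values.length if every value is smaller.
    static int lowerBound(long[] values, long target) {
        int lo = 0;
        int hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Returns {firstDoc, endDoc}: the half-open range of doc IDs whose
    // value v satisfies min <= v <= max. An empty range has firstDoc == endDoc.
    static int[] matchingDocRange(long[] sortedTimestamps, long min, long max) {
        if (min > max) {
            return new int[] {0, 0};
        }
        int first = lowerBound(sortedTimestamps, min);
        // Guard against overflow when computing max + 1.
        int end = (max == Long.MAX_VALUE)
            ? sortedTimestamps.length
            : lowerBound(sortedTimestamps, max + 1);
        return new int[] {first, end};
    }

    public static void main(String[] args) {
        long[] timestamps = {100, 105, 105, 110, 120, 130};
        int[] range = matchingDocRange(timestamps, 105, 120);
        System.out.println(Arrays.toString(range)); // prints [1, 5]
    }
}
```

Because the result is just a pair of doc ID bounds, the cost of the search grows logarithmically with the number of documents instead of linearly with the number of matches.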
   
   




[GitHub] [lucene] LuXugang commented on issue #11770: Optimization for time series data

Posted by GitBox <gi...@apache.org>.
LuXugang commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1248298615

   > Could you tell me which Lucene files I should read so that I can implement that algorithm?
   
   I think you could start by reading `IndexSortSortedNumericDocValuesRangeQuery`; it should make the paper easier to follow. I would also be more than happy for us to learn from each other about Lucene over [WeChat](https://www.amazingkoala.com.cn/Lucene/2018/1204/22.html).




[GitHub] [lucene] tang-hi commented on issue #11770: Optimization for time series data

Posted by GitBox <gi...@apache.org>.
tang-hi commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1247622672

   Hi, LuXugang.
   I have skimmed the paper, and I think it has a lot of interesting optimizations for Lucene.
   I'm really interested in **the reverse binary search algorithm for tail queries** mentioned in the paper, although I am not very familiar with Lucene's query implementation 😭.
   Could you tell me which Lucene files I should read so that I can implement that algorithm?




[GitHub] [lucene] tang-hi commented on issue #11770: Optimization for time series data

Posted by GitBox <gi...@apache.org>.
tang-hi commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1250180597

   > > Could you tell me which Lucene files I should read so that I can implement that algorithm?
   > 
   > Hi, @tang-hi. I think you could start by reading `IndexSortSortedNumericDocValuesRangeQuery`; it should make the paper easier to follow. I would also be more than happy for us to learn from each other about Lucene over [WeChat](https://www.amazingkoala.com.cn/Lucene/2018/1204/22.html).
   
   Thanks! I will read it!

