Posted to dev@lucene.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2014/05/16 12:32:28 UTC
[jira] [Commented] (LUCENE-5675) "ID postings format"
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998862#comment-13998862 ]
ASF subversion and git services commented on LUCENE-5675:
---------------------------------------------------------
Commit 1594960 from [~rcmuir] in branch 'dev/branches/lucene5675'
[ https://svn.apache.org/r1594960 ]
LUCENE-5675: create branch for playing around
> "ID postings format"
> --------------------
>
> Key: LUCENE-5675
> URL: https://issues.apache.org/jira/browse/LUCENE-5675
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Robert Muir
>
> Today the primary key lookup in Lucene is not that great for systems like Solr and Elasticsearch that have versioning in front of IndexWriter.
> To some extent, BlockTree can "sometimes" help avoid seeks by telling you that the term does not exist in a segment. But this technique (based on the FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory.
> I don't think we are using everything we know: particularly the version semantics.
> Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version < V in that segment very efficiently.
> Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc.; this stuff is all implicit.
> As far as the API goes, I think a start would be for users to provide "IDs with versions" to such a PF by setting a payload (or whatever) on the term field, to get it through IndexWriter to the codec. A "consumer" of the codec can then just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
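To put the bloom-filter memory remark in perspective, the textbook sizing formula m = -n ln p / (ln 2)^2 gives the number of bits a Bloom filter needs for n keys at false-positive rate p. The sketch below only evaluates that general formula. The key counts and the 1% rate are illustrative assumptions, not measurements of any Lucene bloom-filter postings format.

```java
/** Evaluates the standard optimal Bloom filter sizing formulas (illustrative only). */
public class BloomSizing {
    /** Optimal number of bits m for n keys at false-positive rate p: m = -n ln p / (ln 2)^2. */
    static double optimalBits(long n, double p) {
        return -n * Math.log(p) / (Math.log(2) * Math.log(2));
    }

    /** Matching optimal number of hash functions k = (m / n) ln 2, rounded. */
    static int optimalHashes(long n, double p) {
        return (int) Math.round(optimalBits(n, p) / n * Math.log(2));
    }

    public static void main(String[] args) {
        double p = 0.01; // assumed 1% false-positive rate
        for (long n : new long[] {1_000_000L, 100_000_000L}) {
            double mib = optimalBits(n, p) / 8 / (1024 * 1024);
            System.out.printf("n=%d: %.1f bits/key, k=%d, ~%.1f MiB%n",
                n, optimalBits(n, p) / n, optimalHashes(n, p), mib);
        }
    }
}
```

At roughly 9.6 bits per key, 100 million IDs need about 114 MiB for a filter at a 1% false-positive rate, and a lower rate costs more.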
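The "max version for any subtree" idea can be illustrated with a plain trie standing in for the terms-index FST. Every node records the highest version of any ID beneath it, so a lookup for ID T with a required minimum version V can stop at the first node whose maximum is already below V, without reaching the leaf. Each terminal node stores only a docID and a version, with no postings list, since docfreq/totaltermfreq are implicitly 1 for an ID field. This is a hypothetical sketch: `VersionedIdTrie`, `put`, and `seekWithMinVersion` are made-up names, and a real FST would share prefixes and suffixes far more compactly.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical trie over ID characters; each node tracks the max version in its subtree. */
public class VersionedIdTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long maxVersion = Long.MIN_VALUE; // highest version of any ID at or below this node
        int docId = -1;                   // set only on the node that terminates an ID
        long version = Long.MIN_VALUE;    // that ID's version; no postings list is stored
    }

    private final Node root = new Node();
    int nodesVisited; // nodes touched by the last lookup, to make pruning visible

    void put(String id, int docId, long version) {
        Node node = root;
        node.maxVersion = Math.max(node.maxVersion, version);
        for (char c : id.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
            node.maxVersion = Math.max(node.maxVersion, version);
        }
        node.docId = docId;
        node.version = version;
    }

    /** Returns the docID of {@code id} if it exists with version >= minVersion, else -1. */
    int seekWithMinVersion(String id, long minVersion) {
        nodesVisited = 0;
        Node node = root;
        for (int i = 0; ; i++) {
            nodesVisited++;
            if (node.maxVersion < minVersion) {
                return -1; // prune: no ID in this subtree reaches minVersion
            }
            if (i == id.length()) {
                break;
            }
            node = node.children.get(id.charAt(i));
            if (node == null) {
                return -1; // ID not present in this segment
            }
        }
        return (node.docId >= 0 && node.version >= minVersion) ? node.docId : -1;
    }

    public static void main(String[] args) {
        VersionedIdTrie trie = new VersionedIdTrie();
        trie.put("user-0001", 0, 3);
        trie.put("user-0002", 1, 7);
        trie.put("order-9", 2, 2);
        System.out.println(trie.seekWithMinVersion("user-0002", 5)); // 1
        System.out.println(trie.seekWithMinVersion("order-9", 5));   // -1
        System.out.println(trie.nodesVisited);                        // 2
    }
}
```

For `order-9` the walk stops at the second node because the `o` subtree's maximum (2) is already below 5, which is the kind of early exit the description hopes the version-annotated FST could provide.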
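For getting the version through IndexWriter as a payload, one simple option is to encode it as a fixed 8-byte big-endian long that the codec decodes when it writes the term. The helper below is a hypothetical sketch using only `java.nio.ByteBuffer`. It does not show the Lucene TokenStream/payload attribute plumbing, and the 8-byte layout is an assumption rather than anything the issue specifies.

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers for carrying an ID's version as a fixed 8-byte payload. */
public final class VersionPayload {
    private VersionPayload() {}

    /** Encodes a version as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] encode(long version) {
        return ByteBuffer.allocate(Long.BYTES).putLong(version).array();
    }

    /** Decodes the version stored in bytes[offset .. offset + 8). */
    static long decode(byte[] bytes, int offset) {
        if (offset < 0 || bytes.length - offset < Long.BYTES) {
            throw new IllegalArgumentException("payload too short for a version");
        }
        return ByteBuffer.wrap(bytes, offset, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] payload = encode(42L);
        System.out.println(payload.length);     // 8
        System.out.println(decode(payload, 0)); // 42
    }
}
```

A fixed width keeps decoding trivial. With big-endian order, unsigned byte-wise comparison of non-negative versions also agrees with numeric order, which could be convenient if encoded versions were ever compared directly.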
--
This message was sent by Atlassian JIRA
(v6.2#6252)