Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2017/01/25 19:40:27 UTC
[jira] [Comment Edited] (LUCENE-7659) IndexWriter should expose field names

    [ https://issues.apache.org/jira/browse/LUCENE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838422#comment-15838422 ] 

Ishan Chattopadhyaya edited comment on LUCENE-7659 at 1/25/17 7:40 PM:
-----------------------------------------------------------------------

Thanks [~jpountz] for looking into this.

bq. If I understand the Solr issue correctly, your use-case is to check whether an update can be applied using dv-updates only, or whether it requires a regular update. Do I get it right?
Yes, exactly.

bq. maybe a better way to address this use-case would be to either try the dv-only update and fall back to a regular update if it failed
There are a few issues with that approach:
1. When a user's command comes in, it has operations like ("set": 3) or ("inc": 5). At the UpdateProcessor, we resolve it to a merged document (either a partial document or a regular full document) by pulling the last document from the index (or transaction log) and merging the command with that document. We then send the "resolved" document (partial or full) to the DirectUpdateHandler, which performs the IW update. By this time, if the IW.updateDocValues() method were to throw an exception for a partial update, we would have already lost the information about the original operation ("set", "inc", etc.) and would have only the merged values.
2. If we handle the exception from IW.updateDocValues() and decide to fall back on a regular update, we could now be merging against a different previous document than the one we merged with in the failed attempt.
3. The performance cost of a regular update would increase, because we would merge twice against the previously indexed document.
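
To illustrate the first point, here is a rough sketch in plain Java. It is not actual Solr or Lucene code: the class ResolveSketch, the Op type, the resolve() helper, and the "popularity" field are all hypothetical, and treating a missing field as 0 for "inc" is an assumption made only for the example. It shows that two different commands can resolve to the same merged values, so the original operation cannot be recovered from the resolved document alone.

```java
import java.util.HashMap;
import java.util.Map;

public class ResolveSketch {
    // Hypothetical stand-in for an atomic update command such as "inc" 5 or "set" 3.
    static final class Op {
        final String kind;   // "set" or "inc"
        final long operand;
        Op(String kind, long operand) {
            this.kind = kind;
            this.operand = operand;
        }
    }

    // Merge the command into the last known document and return only the resolved values.
    // Assumption for this sketch: "inc" on a missing field starts from 0.
    static Map<String, Long> resolve(Map<String, Long> lastDoc, String field, Op op) {
        Map<String, Long> resolved = new HashMap<>(lastDoc);
        long base = lastDoc.getOrDefault(field, 0L);
        resolved.put(field, op.kind.equals("inc") ? base + op.operand : op.operand);
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Long> lastDoc = new HashMap<>();
        lastDoc.put("popularity", 10L);

        Map<String, Long> viaInc = resolve(lastDoc, "popularity", new Op("inc", 5));
        Map<String, Long> viaSet = resolve(lastDoc, "popularity", new Op("set", 15));

        // Both commands yield the same resolved document, so the operation is no longer visible.
        System.out.println(viaInc.equals(viaSet)); // prints "true"
    }
}
```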

bq. change the semantics of dv updates to create fields if they did not exist already
I agree that this is the cleanest way forward. From the IndexWriter's API standpoint, I think it would certainly be cleanest if the updateDocValues() method were to create non-existent DVs. Until we have such functionality in the updateDocValues() method, do you think we could expose the field names through a method marked as internal and/or experimental, with the intention of phasing it out once IW's updateDocValues() has that functionality?
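
For context, the check on the caller's side could look roughly like the following plain-Java sketch. It does not use any real Lucene or Solr API: the class UpdatePathSketch, the Path enum, and the choose() method are hypothetical, and the Set of field names merely stands in for whatever an exposed method would return. Treating an empty update as a full-document update is likewise an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UpdatePathSketch {
    enum Path { IN_PLACE, FULL_DOCUMENT }

    // Hypothetical decision: use an in-place (dv-only) update only when every field
    // being updated is already known to the writer as a doc-values field.
    static Path choose(Set<String> knownDvFields, Set<String> fieldsToUpdate) {
        boolean allExist = !fieldsToUpdate.isEmpty()
                && knownDvFields.containsAll(fieldsToUpdate);
        return allExist ? Path.IN_PLACE : Path.FULL_DOCUMENT;
    }

    public static void main(String[] args) {
        // Stand-in for the field names the writer would expose.
        Set<String> known = new HashSet<>(Arrays.asList("popularity", "price"));

        // The DV already exists, so the update can be applied in place.
        System.out.println(choose(known, new HashSet<>(Arrays.asList("popularity"))));

        // The DV does not exist yet, so a regular full-document update is needed.
        System.out.println(choose(known, new HashSet<>(Arrays.asList("rating"))));
    }
}
```

Deciding up front like this avoids the exception-and-retry path, so the document is merged only once.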


> IndexWriter should expose field names
> -------------------------------------
>
>                 Key: LUCENE-7659
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7659
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ishan Chattopadhyaya
>         Attachments: LUCENE-7659.patch
>
>
> While working on SOLR-5944, I needed a way to know whether applying an update to a DV is possible (i.e. whether the DV exists) when deciding whether to apply the update as an in-place update or as a regular full-document update. This information is present at the IndexWriter in a FieldInfos instance, and can be exposed.
