You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by maedhroz <gi...@git.apache.org> on 2016/06/29 19:36:02 UTC

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

GitHub user maedhroz opened a pull request:

    https://github.com/apache/lucene-solr/pull/47

    SOLR-8858 SolrIndexSearcher#doc() Completely Ignores Field Filters Unless Lazy Field Loading is Enabled

    https://issues.apache.org/jira/browse/SOLR-8858

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maedhroz/lucene-solr SOLR-8858-trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #47
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by maedhroz <gi...@git.apache.org>.

Github user maedhroz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69510618
  
    --- Diff: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java ---
    @@ -766,16 +766,29 @@ public Document doc(int i, Set<String> fields) throws IOException {
         }
     
         final DirectoryReader reader = getIndexReader();
    -    if (!enableLazyFieldLoading || fields == null) {
    -      d = reader.document(i);
    +    boolean containsAllFields = true;
    +    
    +    if (fields != null) {
    +      if (enableLazyFieldLoading) {
    +        final SetNonLazyFieldSelector visitor = new SetNonLazyFieldSelector(fields, reader, i);
    +        reader.document(i, visitor);
    +        d = visitor.doc;
    +      } else {
    +        d = reader.document(i, fields);
    +        containsAllFields = false;
    +      }
         } else {
    -      final SetNonLazyFieldSelector visitor = new SetNonLazyFieldSelector(fields, reader, i);
    -      reader.document(i, visitor);
    -      d = visitor.doc;
    +      d = reader.document(i);
         }
     
         if (documentCache != null) {
    -      documentCache.put(i, d);
    +      // Only cache the already retrieved document if it is complete... 
    +      if (containsAllFields) {
    +        documentCache.put(i, d);
    +      } else {
    +        // ...and retrieve a complete document for caching otherwise.
    +        documentCache.put(i, reader.document(i));
    +      }
    --- End diff --
    
    @shalinmangar I've made a pass at restoring the previous caching behavior (with some comments around the rationale). It feels like we might have a complete solution at this point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by shalinmangar <gi...@git.apache.org>.

Github user shalinmangar commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69263675
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    I don't understand this change. Why should QueryComponent know about whether lazy loading is enabled or not?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by maedhroz <gi...@git.apache.org>.

Github user maedhroz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69787760
  
    --- Diff: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java ---
    @@ -766,12 +766,16 @@ public Document doc(int i, Set<String> fields) throws IOException {
         }
     
         final DirectoryReader reader = getIndexReader();
    -    if (!enableLazyFieldLoading || fields == null) {
    -      d = reader.document(i);
    +    if (fields != null) {
    +      if (enableLazyFieldLoading) {
    +        final SetNonLazyFieldSelector visitor = new SetNonLazyFieldSelector(fields, reader, i);
    +        reader.document(i, visitor);
    +        d = visitor.doc;
    +      } else {
    +        d = reader.document(i, fields);
    --- End diff --
    
    Right. I *think* I addressed that in the last commit, since only full documents are cached now. The problem is that the overhead of doing this might be unacceptable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by shalinmangar <gi...@git.apache.org>.

Github user shalinmangar commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69419263
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    Thanks for investigating and the detailed note.
    
    I don't think DistribJoinFromCollectionTest or TestCloudDeleteByQuery are doing anything wrong. The difference, as you said, is that your patch changes `doc()` to actually respect the fields list and when lazy loading is disabled, proceeds to cache an _incomplete_ document returned by Lucene. So this patch changes Solr from always caching either complete documents (when lazy loading is disabled) or lazy documents (when lazy loading is enabled) to caching potentially incomplete documents which have no idea about the other (non-requested) fields. So if we want to go back to the old behavior then option 'a' is the way to go when lazy loading is disabled.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by dsmiley <gi...@git.apache.org>.

Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69763447
  
    --- Diff: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java ---
    @@ -766,12 +766,16 @@ public Document doc(int i, Set<String> fields) throws IOException {
         }
     
         final DirectoryReader reader = getIndexReader();
    -    if (!enableLazyFieldLoading || fields == null) {
    -      d = reader.document(i);
    +    if (fields != null) {
    +      if (enableLazyFieldLoading) {
    +        final SetNonLazyFieldSelector visitor = new SetNonLazyFieldSelector(fields, reader, i);
    +        reader.document(i, visitor);
    +        d = visitor.doc;
    +      } else {
    +        d = reader.document(i, fields);
    --- End diff --
    
    This could lead to a bug when there is a document cache, since we'd cache a partial document without lazy loading.  Then imagine a subsequent doc(i,otherFields), is called and then a document is returned without those fields even if the doc on disk might actually has those fields.
    
    On line 770 if (enableLazyFieldLoading) could become: if (enableLazyFieldLoading || documentCache != null).  In this sense, "enableLazyFieldLoading" would have no effect unless there is no doc cache... I'm not sure what to think of that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by maedhroz <gi...@git.apache.org>.

Github user maedhroz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69373171
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    In the current master (without my patch), the query stage shard request for join in `DistribJoinFromCollectionTest` will pull the document from `SolrIndexSearcher#doc()' with only `id` in the specified `fields`. It does not use lazy field loading, and so uses a `DocumentStoredFieldVisitor` with no `fields` specified to load the whole document, and then put it in the `documentCache`. If we used lazy field loading, the cached document would still have some representation of all the fields, albeit lazy ones.
    
    With only the `SolrIndexSearcher` piece of my patch in place, the `TestSubQueryTransformer` failures are easy to avoidl, and I was able to fix them by simply reading the JavaDoc. (See the [comment](https://github.com/apache/lucene-solr/pull/47/files/4f9e67c63ce5130795df647ef5e86ae970601cb6#r69015716) below.) `DistribJoinFromCollectionTest` (and `TestCloudDeleteByQuery`) fails, because when, as I've laid out above, `doc()` actually respects the `fields` list during the main query phase, it caches a document that *only contains those fields*. When the actual field retrieval stage of the query hits the shard, `doc()` spits out a document that doesn't have the all fields in `fl`. (I'm not sure `DistribJoinFromCollectionTest` or `TestCloudDeleteByQuery` are doing something wrong, and they actually *pass* if they enable lazy field loading.)
    
    The reason I raised this issue in the first place is that I have a custom `StoredFieldsVisitor` that relies on `DocumentStoredFieldVisitor` providing the fields requested by the query. The unfortunate thing is that I think the `QueryComponent` bit of this PR isn't actually compatible with that, and I think that will need to be reverted no matter what. The only other ways I can imagine fixing this are:
    
    a.) Always cache an entire document, regardless of what we return from `doc()`. (Seems like it adds overhead.)
    b.) Skip caching under certain conditions, like if the `fields` list only contains the unique key (or key and score). (Seems very reliant on `QueryComponent` still.)
    c.) Always use lazy loading. (Seems invasive, but most of the examples I see use it anyway.)
    
    I don't love any of these options, but I'd be interested to get more informed opinions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by shalinmangar <gi...@git.apache.org>.

Github user shalinmangar commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69316868
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    I can see why this change was made but my point is that there is probably an assumption in the join queries or in the test (I haven't looked) which make it fail without this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by maedhroz <gi...@git.apache.org>.

Github user maedhroz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69363380
  
    --- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    I agree that what I have here is unsatisfying. I've been debugging through the fields retrieval phase of the join in `DistribJoinFromCollectionTest`, so hopefully that turns up something...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/lucene-solr/pull/47


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org