Posted to dev@lucene.apache.org by moshebla <gi...@git.apache.org> on 2018/07/12 11:15:46 UTC

[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

GitHub user moshebla opened a pull request:

    https://github.com/apache/lucene-solr/pull/416

    WIP: SOLR-12519

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/moshebla/lucene-solr SOLR-12519-3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #416
    
----
commit 2412a96fdb2c856adcfcb0bbccdaa336ef0b224c
Author: user <us...@...>
Date:   2018-06-05T08:58:10Z

    first tests

commit e8f41e33ec99c2b7c6c565f5ea30c80d4739e84f
Author: user <us...@...>
Date:   2018-06-06T00:31:47Z

    only index fields from conf

commit 446f05a711add0632484449c96532e56b53793fc
Author: Moshe <mo...@...>
Date:   2018-06-06T13:36:31Z

    SOLR-12441: tests with query

commit 9128f5955c9034ceec7d8fb8614287b10adc1172
Author: Moshe <mo...@...>
Date:   2018-06-11T07:54:56Z

    SOLR-12441: use EnumSet for conf

commit 5ecd071c5c97b9120f3a9a633dc5542919575d6b
Author: Moshe <mo...@...>
Date:   2018-06-25T09:03:17Z

    SOLR-12441: add tests for children too

commit d0dc305a5a918bf02a88dd5887e91d13b10fe92f
Author: Moshe <mo...@...>
Date:   2018-06-26T06:29:45Z

    SOLR-12441: use iterator for field values

commit 1b115cdc7846647859ce2dbaf46ef316422e8bd8
Author: Moshe <mo...@...>
Date:   2018-06-26T07:02:58Z

    SOLR-12441: remove NestedFlags.ALL and LEVEL_FIELD_NAME

commit 4942520dee2312c4428d3e1aacdaa561049525b7
Author: Moshe <mo...@...>
Date:   2018-06-26T08:06:05Z

    SOLR-12441: raise exception if fieldName contains splitChar

commit 2cad6c95fac369ce99b3a4a113162ad1cd2e6e8e
Author: Moshe <mo...@...>
Date:   2018-06-26T08:25:56Z

    SOLR-12441: change addField to setField

commit 8f57c21070678804cf2aa487e79faf6bb1d3fbc4
Author: Moshe <mo...@...>
Date:   2018-06-26T08:37:58Z

    SOLR-12441: make nestedurp test exception less doc specific

commit 9e3cd08149c47f86c24caba0946033a98091be95
Author: Moshe <mo...@...>
Date:   2018-06-26T09:39:44Z

    SOLR-12441: rename to nestedUpdateProcessor

commit 14b676f0124c6be92fbb7d900c97b3616e045e88
Author: Moshe <mo...@...>
Date:   2018-06-26T10:11:20Z

    SOLR-12441: config param fix error message

commit 421fbccd5586b19a0d523e2d9f97bc67b4f1d756
Author: user <us...@...>
Date:   2018-06-04T05:22:29Z

    add skip and limit stream

commit 6c4c767d8999a0ed076674de99df84133a5e53c8
Author: Moshe <mo...@...>
Date:   2018-06-26T13:49:12Z

    SOLR-12441: remove "Deeply" from urp

commit b0119523a3cc4d31224152dd76c0d0065d0b49fc
Author: Moshe <mo...@...>
Date:   2018-06-26T14:42:32Z

    SOLR-12441: replace EnumSet config with two booleans

commit d7448f293573d9a2a45a958fb45236539915b5e6
Author: Moshe <mo...@...>
Date:   2018-06-26T14:46:29Z

    SOLR-12441: all test use PATH_SEP_CHAR constant

commit 8c2b1589a61efcc6902aa634316dacfddde122ef
Author: Moshe <mo...@...>
Date:   2018-06-28T05:36:37Z

    SOLR-12441: failing test for idless child docs

commit 1600fa14034cacc5814e1b3ba8c61868e02ddf01
Author: Moshe <mo...@...>
Date:   2018-07-02T06:26:26Z

    SOLR-12441: req.getSchema().getUniqueKeyField().getName() as field

commit f773bffa9d0b0e65c07351643ff6c8e630e7e61a
Author: Moshe <mo...@...>
Date:   2018-07-02T06:49:03Z

    SOLR-12441: combined moved NestedUpdateProcessor into NestedUpdateProcessorFactory

commit 9f46adee1962b491c61507e07222b5b183da1b6c
Author: Moshe <mo...@...>
Date:   2018-07-02T07:10:36Z

    SOLR-12441: use updateJ for NestedUpdateProcessor tests

commit ed4b0f264d6e88eef301f6c6c28eb47e4ddeb712
Author: Moshe <mo...@...>
Date:   2018-07-03T11:29:50Z

    SOLR-12441: use string concatenation

commit b77645abd628babb8ea26a29b96c117ddcc02a95
Author: Moshe <mo...@...>
Date:   2018-07-04T13:03:52Z

    SOLR-12519: add transformer and initial tests

commit 5388c8fa195705faf93fae9a76fea1e442601a68
Author: Moshe <mo...@...>
Date:   2018-07-04T13:04:35Z

    SOLR-12519: add deeply nested transformer to transformer factory

commit c2061d66776b54847af58a0cbffe22c264bc703e
Author: Moshe <mo...@...>
Date:   2018-07-04T13:05:05Z

    SOLR-12519: do not iterate over SolrDocument when adding a child doc

commit 7cb6730eec69ebffb7f5bf6cae04aae237828ba7
Author: Moshe <mo...@...>
Date:   2018-07-04T13:56:28Z

    SOLR-12441: fix nestedupdateprocessor id generator algorithm

commit 9823c92d9ce4a9e5aac296256f0dce5bacfa34c0
Author: Moshe <mo...@...>
Date:   2018-07-04T13:58:29Z

    SOLR-12441: fix id less child docs test for nestedUpdateProcessor

commit 1dc731d4c2360450eaece5948984d45e3b040b27
Author: Moshe <mo...@...>
Date:   2018-07-05T05:02:33Z

    SOLR-12441: use string concat for NestedUpdateProcessor.generateChildUniqueId

commit 98408761ebb593401f58d5c03be5fcfd543329e8
Author: Moshe <mo...@...>
Date:   2018-07-05T05:02:56Z

    SOLR-12441: improve id less child doc test

commit 1fed36e0e67a8ed548ad1dc897e116500087124d
Author: Moshe <mo...@...>
Date:   2018-07-05T05:05:21Z

    SOLR-12441: add "NEST" prefix to nested internal fields var name

commit 4276cf917e5b29a12c1ca93b314deda827ba6547
Author: Moshe <mo...@...>
Date:   2018-07-05T05:47:00Z

    SOLR-12441: add num sep char to nested URP

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    It has been a while since I last worked on this,
    and after the merge with your branch some comments were overwritten.
    Do you remember what changes are still needed?


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I hope this is in a good enough shape for review.
    There was an overlap between this ticket and the XMLLoader, so I hope I did not miss anything.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r203718591
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformerFactory.java ---
    @@ -0,0 +1,367 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.regex.Pattern;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.QueryBitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.common.SolrException;
    +import org.apache.solr.common.SolrException.ErrorCode;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.StrUtils;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.QParser;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.apache.solr.search.SyntaxError;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +import static org.apache.solr.schema.IndexSchema.ROOT_FIELD_NAME;
    +
    +/**
    + *
    + * @since solr 4.9
    + *
    + * This transformer returns all descendants of each parent document in a flat list nested inside the parent document.
    + *
    + *
    + * The "parentFilter" parameter is mandatory.
    + * Optionally you can provide a "childFilter" param to filter out which child documents should be returned and a
    + * "limit" param which provides an option to specify the number of child documents
    + * to be returned per parent document. By default it's set to 10.
    + *
    + * Examples -
    + * [child parentFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue" limit=20]
    + */
    +public class DeeplyNestedChildDocTransformerFactory extends TransformerFactory {
    --- End diff --
    
    FWIW I'm not convinced we need a distinct class from ChildDocTransformer.  I think CDT is fine... the part that changes is what follows obtaining the matching child doc iterator.  That could be split out into two different methods -- plain/flat (current) and nested with ancestors (new).
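    For what it's worth, the suggested split can be sketched in miniature. All names and types below are hypothetical stand-ins, not the actual Solr API; SolrDocument is modeled as a plain Map, and the "matching child docs" are reduced to ids and labels:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the suggested refactor: one transformer obtains the
// matched child docs, then delegates to either the plain/flat attachment
// (current behavior) or the nested-with-ancestors attachment (new behavior).
public class ChildAttachSketch {

  // flat (current): all descendants go into one anonymous list on the parent
  public static Map<String, Object> attachFlat(List<String> childIds) {
    Map<String, Object> parent = new LinkedHashMap<>();
    parent.put("_childDocuments_", new ArrayList<>(childIds));
    return parent;
  }

  // nested (new): each child is grouped under the label taken from its nest path
  @SuppressWarnings("unchecked")
  public static Map<String, Object> attachNested(Map<String, String> childIdToLabel) {
    Map<String, Object> parent = new LinkedHashMap<>();
    childIdToLabel.forEach((id, label) ->
        ((List<String>) parent.computeIfAbsent(label, k -> new ArrayList<String>())).add(id));
    return parent;
  }
}
```

    The point of the sketch is only that the two attachment strategies share everything up to the iterator and differ afterwards, so one class with two private methods would suffice.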


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208807039
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    This is what I would expect to happen. I am going to try it to check that it works as expected. I would not want to suddenly change this for everyone :worried:


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206901995
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -168,35 +172,57 @@ private static String id() {
         return "" + counter.incrementAndGet();
       }
     
    +  private static void cleanSolrDocumentFields(SolrDocument input) {
    +    for(Map.Entry<String, Object> field: input) {
    +      Object val = field.getValue();
    +      if(val instanceof Collection) {
    +        Object newVals = ((Collection) val).stream().map((item) -> (cleanIndexableField(item)))
    +            .collect(Collectors.toList());
    +        input.setField(field.getKey(), newVals);
    +        continue;
    +      } else {
    +        input.setField(field.getKey(), cleanIndexableField(field.getValue()));
    +      }
    +    }
    +  }
    +
    +  private static Object cleanIndexableField(Object field) {
    +    if(field instanceof IndexableField) {
    +      return ((IndexableField) field).stringValue();
    +    } else if(field instanceof SolrDocument) {
    +      cleanSolrDocumentFields((SolrDocument) field);
    +    }
    +    return field;
    +  }
    +
       private static String grandChildDocTemplate(int id) {
         int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    -    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "toppings=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 3) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Regular>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 4) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], " +
    -        "_nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 3) + ">, _root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 5) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Chocolate>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 6) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5)+ ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 7) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5) + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}]}";
    +    return "SolrDocument{id="+ id + ", type_s=[" + types[docNum % types.length] + "], name_s=[" + names[docNum % names.length] + "], " +
    --- End diff --
    
    Keeping one ID is fine; we certainly don't need additional ones.  Maybe consider using letters or names for IDs instead of incrementing counters -- anything to help make the doc/child structure more readily apparent.  Anything that reduces string interpolation here is also a win, IMO.
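    One way to get readable IDs without abandoning the counter entirely would be a small test helper that maps the counter to spreadsheet-style letters. This is a hypothetical helper, not part of the patch:

```java
// Hypothetical test helper: bijective base-26 letter IDs ("a".."z", "aa", "ab", ...)
// derived from the existing counter, so a doc/child structure reads more easily
// than raw integers do.
public class LetterIds {
  public static String letterId(int n) { // n >= 0
    StringBuilder sb = new StringBuilder();
    n = n + 1; // work in 1-based bijective base 26
    while (n > 0) {
      n--; // shift so 'a' maps to 0 within each digit
      sb.insert(0, (char) ('a' + (n % 26)));
      n /= 26;
    }
    return sb.toString();
  }
}
```

    The existing AtomicInteger in the test could then feed `letterId(counter.getAndIncrement())` instead of being interpolated directly.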


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209068211
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    I'm really not a fan of the {{anonChildDocs}} flag; I regret that I conjured up the idea.  If we have "nest" schema fields, the user wants nested documents (including field/label association); if the schema doesn't, it ought to work as it used to.  I think this is straightforward to reason about and document.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211256948
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -264,7 +309,7 @@ private static Object cleanIndexableField(Object field) {
       }
     
       private static String grandChildDocTemplate(int id) {
    -    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    int docNum = (id / sumOfDocsPerNestedDocument) % numberOfDocsPerNestedTest; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    --- End diff --
    
    This is looking kinda complicated now (same for fullNestedDocTemplate).  Does it really matter exactly how the type_s value is chosen?  I don't know; I confess to having glossed over this aspect of the test; I don't get the point.  I wonder whether whatever the test is truly trying to verify here might be simplified by going about it in some other way.
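    The arithmetic in question can at least be checked in isolation. Assuming, purely for illustration, 8 ids consumed per nested document and 2 documents indexed per test (both constants are hypothetical stand-ins for the test's actual values), the id-to-docNum mapping is:

```java
// Standalone check of the docNum mapping used by grandChildDocTemplate:
// which block of ids the doc falls in, wrapped to the number of documents
// indexed per test run. Constants are hypothetical stand-ins.
public class DocNumCheck {
  static final int SUM_OF_DOCS_PER_NESTED_DOCUMENT = 8;
  static final int NUMBER_OF_DOCS_PER_NESTED_TEST = 2;

  public static int docNum(int id) {
    return (id / SUM_OF_DOCS_PER_NESTED_DOCUMENT) % NUMBER_OF_DOCS_PER_NESTED_TEST;
  }
}
```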


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210309845
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -227,27 +225,28 @@ private static String getPathByDocId(int segDocId, SortedDocValues segPathDocVal
     
       /**
        *
    -   * @param segDocBaseId base docID of the segment
    -   * @param RootId docID if the current root document
    -   * @param lastDescendantId lowest docID of the root document's descendant
    -   * @return the docID to loop and to not surpass limit of descendants to match specified by query
    +   * @param RootDocId docID of the current root document
    +   * @param lowestChildDocId lowest docID of the root document's descendant
    +   * @return the docID to loop and not surpass limit of descendants to match specified by query
        */
    -  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    -    int i = segDocBaseId + RootId - 1; // the child document with the highest docID
    -    final int prevSegRootId = segDocBaseId + lastDescendantId;
    -    assert prevSegRootId < i; // previous rootId should be smaller then current RootId
    +  private int calcDocIdToIterateFrom(int lowestChildDocId, int RootDocId) {
    +    assert lowestChildDocId < RootDocId; // first childDocId should be smaller than current RootId
    --- End diff --
    
    Yes, it would.
    I'll add a test to ensure its behaviour.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204771149
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,163 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    --- End diff --
    
    As I've suggested in a previous issue, I think it's simpler and more complete to test by comparing the entire result, as a string, to an expected string.  Otherwise, as a reviewer, I'm asking myself what assertions would be useful for this query that you didn't think of.  I think this test philosophy is especially valuable here because fundamentally you're building an interesting structured response.  Essentially everything in the response is pertinent to the test -- it contains very little immaterial noise.
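    A toy illustration of that test philosophy, with a plain Map standing in for SolrDocument (all names hypothetical): render the whole structured result once and compare it to a single expected string, instead of asserting individual fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds a tiny parent/child structure and serializes it in one pass.
// A test then asserts on the full string, so any field that leaks into
// (or drops out of) the response fails the comparison.
public class WholeResponseAssert {
  public static String nestedDocString() {
    Map<String, Object> child = new LinkedHashMap<>();
    child.put("id", "2");
    child.put("name_s", "cocoa");
    Map<String, Object> parent = new LinkedHashMap<>();
    parent.put("id", "1");
    parent.put("ingredients", List.of(child));
    return parent.toString();
  }
}
```

    A single `assertEquals(expected, nestedDocString())` then covers every field and every level of nesting at once.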


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205477315
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +  // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (same string is returned by DeeplyNestedChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private String getLastPath(String path) {
    +    if(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) == -1) {
    --- End diff --
    
    You are double-calculating the lastIndexOf in this method.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204764287
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -91,15 +100,37 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
     
         Query childFilterQuery = null;
         if(childFilter != null) {
    -      try {
    -        childFilterQuery = QParser.getParser( childFilter, req).getQuery();
    -      } catch (SyntaxError syntaxError) {
    -        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +      if(buildHierarchy) {
    +        childFilter = buildHierarchyChildFilterString(childFilter);
    +        return new DeeplyNestedChildDocTransformer(field, parentsFilter, req,
    +            getChildQuery(childFilter, req), limit);
           }
    +      childFilterQuery = getChildQuery(childFilter, req);
    +    } else if(buildHierarchy) {
    +      return new DeeplyNestedChildDocTransformer(field, parentsFilter, req, null, limit);
         }
     
         return new ChildDocTransformer( field, parentsFilter, uniqueKeyField, req.getSchema(), childFilterQuery, limit);
       }
    +
    +  private static Query getChildQuery(String childFilter, SolrQueryRequest req) {
    +    try {
    +      return QParser.getParser( childFilter, req).getQuery();
    +    } catch (SyntaxError syntaxError) {
    +      throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +    }
    +  }
    +
    +  protected static String buildHierarchyChildFilterString(String queryString) {
    --- End diff --
    
    When writing parsing code like this, it helps tremendously to add a comment showing the example input.  Here you could also comment on what the resulting query would be.
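
    The kind of input/output-commented parsing helper being requested might look like the sketch below. The class name, method, and the path/query syntax it handles are illustrative assumptions, not the actual buildHierarchyChildFilterString implementation:

    ```java
    public class HierarchyFilterSketch {

      /**
       * Splits a hierarchical child filter into its path and query parts.
       *
       * Example input:  "toppings/ingredients/name_s:cocoa"
       * Example output: path = "toppings/ingredients", query = "name_s:cocoa"
       */
      static String[] splitPathAndQuery(String childFilter) {
        int lastSep = childFilter.lastIndexOf('/');
        if (lastSep == -1) {
          return new String[] { "", childFilter };   // no ancestor path
        }
        return new String[] {
            childFilter.substring(0, lastSep),       // ancestor path
            childFilter.substring(lastSep + 1)       // leaf field query
        };
      }

      public static void main(String[] args) {
        String[] parts = splitPathAndQuery("toppings/ingredients/name_s:cocoa");
        System.out.println(parts[0]); // toppings/ingredients
        System.out.println(parts[1]); // name_s:cocoa
      }
    }
    ```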


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204813187
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    It's probably easiest to remove it later, probably at the point we add children to the document.  Or try the Map&lt;String, Multimap&lt;String, SolrDocument&gt;&gt; idea I had, wherein the intermediate String is the label.
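
    A minimal sketch of the Map-of-Multimap bookkeeping the reviewer describes, using plain JDK collections in place of Guava's Multimap; the class name, paths, and doc labels here are made-up examples, not code from the patch:

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class PendingChildrenSketch {

      // Outer key: the path of the child's parent doc (null for children of
      // the root).  Inner key: the child's label under that parent.  Plain
      // JDK maps stand in for Guava's Multimap here.
      static void addPending(Map<String, Map<String, List<String>>> pending,
                             String parentPath, String label, String doc) {
        pending.computeIfAbsent(parentPath, k -> new HashMap<>())
               .computeIfAbsent(label, k -> new ArrayList<>())
               .add(doc);
      }

      public static void main(String[] args) {
        Map<String, Map<String, List<String>>> pending = new HashMap<>();
        // doc at path toppings#1/ingredients#1 -> outer key "toppings#1",
        // inner key "ingredients" (array index trimmed off the label)
        addPending(pending, "toppings#1", "ingredients", "doc:cocoa");
        addPending(pending, "toppings#1", "ingredients", "doc:sugar");
        System.out.println(pending.get("toppings#1").get("ingredients"));
      }
    }
    ```

    HashMap permits a null key, which matches the "null means child of the root" convention used later in the patch.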


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213292329
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -109,9 +109,14 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           // Loop each child ID up to the parent (exclusive).
           for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) {
     
    -        // get the path.  (note will default to ANON_CHILD_KEY if not in schema or oddly blank)
    +        // get the path.  (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank)
             String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
     
    +        if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) {
    +          // is not a descendant of the transformed doc, return fast.
    +          return;
    --- End diff --
    
    woah; shouldn't this be "continue"?  We should have a test that would fail on this bug.  All it would take would be an additional child doc that is not underneath the input root/transformed doc but follows it (as provided on input).  Some of the first docs we iterate over here might not descend from rootDocId
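
    For illustration, a minimal standalone sketch of the `continue` vs `return` distinction being pointed out. `startsWith` stands in here for the real `contains(transformedDocPath)` check, and the doc paths are made up:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class ContinueVsReturnSketch {

      // Collect only paths that descend from the given root path.
      // Using 'continue' skips non-descendants; an early 'return' here
      // would wrongly drop every doc that follows the first mismatch.
      static List<String> descendants(String rootPath, String[] docPaths) {
        List<String> out = new ArrayList<>();
        for (String path : docPaths) {
          if (!path.startsWith(rootPath)) {
            continue; // NOT return: later docs may still descend from the root
          }
          out.add(path);
        }
        return out;
      }

      public static void main(String[] args) {
        String[] paths = { "other#/child#", "root#/toppings#1", "root#/toppings#2" };
        System.out.println(descendants("root#", paths));
      }
    }
    ```

    With `return` in place of `continue`, the first non-descendant would abort the loop and both matching docs after it would be lost, which is the failure mode the suggested test would exercise.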


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210263030
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,253 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both if we have _nest_path_ and also if we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      // the key in the Map is the document's ancestors key(one above the parent), while the key in the intermediate
    +      // MultiMap is the direct child document's key(of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      IndexSchema schema = searcher.getSchema();
    +      SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +      Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +      final int lastChildId = segBaseId + segPrevRootId + 1;
    +      boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +
    +      // Loop each child ID up to the parent (exclusive).
    +      for (int docId = limit == -1 ? lastChildId : calcLimitIndex(segBaseId, segRootId, segPrevRootId + 1); docId < rootDocId; ++docId) {
    +
    +        // get the path.  (note will default to ANON_CHILD_KEY if not in schema or oddly blank)
    +        String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +        // Is this doc a direct ancestor of another doc we've seen?
    +        boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +        // Do we need to do anything with this doc (either ancestor or matched the child query)
    +        if (isAncestor || childDocSet == null || childDocSet.exists(docId)) {
    +          // load the doc
    +          SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId), schema, childReturnFields);
    +          if (shouldDecorateWithDVs) {
    +            docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +          }
    +
    +          if (isAncestor) {
    +            // if this path has pending child docs, add them.
    +            addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +          }
    +
    +          // get parent path
    +          String parentDocPath = getParentPath(fullDocPath);
    +          String lastPath = getLastPath(fullDocPath);
    +          // put into pending:
    +          // trim path if the doc was inside array, see trimPathIfArrayDoc()
    +          // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +          // -> inner MultiMap key ingredients
    +          // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +          // -> inner MultiMap key lonelyGrandChild#
    +          pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +              .put(trimLastPoundIfArray(lastPath), doc); // multimap add (won't replace)
    +        }
    +      }
    +
    +      // only children of parent remain
    +      assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +      addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +
    +    } catch (IOException e) {
    +      //TODO DWS: reconsider this unusual error handling approach; shouldn't we rethrow?
    +      log.warn("Could not fetch child documents", e);
    +      rootDoc.put(getName(), "Could not fetch child documents");
    +    }
    +  }
    +
    +  private static void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  private static void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // if there is no path, we do not need to record the child document's relation to its parent document.
    +    if (cDocsPath.equals(ANON_CHILD_KEY)) {
    +      parent.addChildDocuments(children);
    +      return;
    +    }
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (same string is returned by ChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private static String getLastPath(String path) {
    +    int lastIndexOfPathSepChar = path.lastIndexOf(PATH_SEP_CHAR);
    +    if(lastIndexOfPathSepChar == -1) {
    +      return path;
    +    }
    +    return path.substring(lastIndexOfPathSepChar + 1);
    +  }
    +
    +  private static String trimLastPoundIfArray(String path) {
    +    // remove the array index after the last pound sign if there is one, e.g. toppings#1 -> toppings
    +    // or return the original string if the child doc is not in an array, e.g. ingredients# -> ingredients#
    +    final int indexOfSepChar = path.lastIndexOf(NUM_SEP_CHAR);
    +    if (indexOfSepChar == -1) {
    +      return path;
    +    }
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = indexOfSepChar == lastIndex;
    +    return singleDocVal ? path : path.substring(0, indexOfSepChar);
    +  }
    +
    +  private static String trimLastPound(String path) {
    +    // remove the last pound sign and the index after it, e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path : path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  private static String getParentPath(String currDocPath) {
    +    // chop off leaf (after last '/')
    +    // if child of leaf then return null (special value)
    +    int lastPathIndex = currDocPath.lastIndexOf(PATH_SEP_CHAR);
    +    return lastPathIndex == -1 ? null : currDocPath.substring(0, lastPathIndex);
    +  }
    +
    +  /** Looks up the nest path.  If there is none, returns {@link #ANON_CHILD_KEY}. */
    +  private static String getPathByDocId(int segDocId, SortedDocValues segPathDocValues) throws IOException {
    +    int numToAdvance = segPathDocValues.docID() == -1 ? segDocId : segDocId - (segPathDocValues.docID());
    +    assert numToAdvance >= 0;
    +    boolean advanced = segPathDocValues.advanceExact(segDocId);
    +    if (!advanced) {
    +      return ANON_CHILD_KEY;
    +    }
    +    return segPathDocValues.binaryValue().utf8ToString();
    +  }
    +
    +  /**
    +   *
    +   * @param segDocBaseId base docID of the segment
    +   * @param RootId docID of the current root document
    +   * @param lastDescendantId lowest docID among the root document's descendants
    +   * @return the docID to start looping from, so the limit of descendants specified by the query is not surpassed
    +   */
    +  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    --- End diff --
    
    The reason the sort is docID descending is that this method initialises the loop index to the highest docID that would exhaust the limit, if possible, or else to the lowest descendant docID.
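
    A sketch of the start-index arithmetic this comment describes: start the loop at whichever is later, the lowest descendant or the doc that leaves exactly `limit` docs before the root. The names and exact semantics are inferred from the comment, not copied from the patch:

    ```java
    public class LimitIndexSketch {

      // Start the child loop at whichever is later: the first (lowest)
      // descendant, or the doc that leaves exactly 'limit' docs before the
      // root.  Illustrative only; the real calcLimitIndex also folds in the
      // segment's docBase.
      static int calcStartDocId(int rootDocId, int firstDescendantId, int limit) {
        return Math.max(firstDescendantId, rootDocId - limit);
      }

      public static void main(String[] args) {
        // root at 10, descendants 3..9, limit 4 -> start at 6 (docs 6,7,8,9)
        System.out.println(calcStartDocId(10, 3, 4));
        // limit larger than the descendant count -> start at first descendant
        System.out.println(calcStartDocId(10, 3, 20));
      }
    }
    ```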


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206012956
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -168,35 +172,57 @@ private static String id() {
         return "" + counter.incrementAndGet();
       }
     
    +  private static void cleanSolrDocumentFields(SolrDocument input) {
    +    for(Map.Entry<String, Object> field: input) {
    +      Object val = field.getValue();
    +      if(val instanceof Collection) {
    +        Object newVals = ((Collection) val).stream().map((item) -> (cleanIndexableField(item)))
    +            .collect(Collectors.toList());
    +        input.setField(field.getKey(), newVals);
    +        continue;
    +      } else {
    +        input.setField(field.getKey(), cleanIndexableField(field.getValue()));
    +      }
    +    }
    +  }
    +
    +  private static Object cleanIndexableField(Object field) {
    +    if(field instanceof IndexableField) {
    +      return ((IndexableField) field).stringValue();
    +    } else if(field instanceof SolrDocument) {
    +      cleanSolrDocumentFields((SolrDocument) field);
    +    }
    +    return field;
    +  }
    +
       private static String grandChildDocTemplate(int id) {
         int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    -    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "toppings=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 3) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Regular>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 4) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], " +
    -        "_nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 3) + ">, _root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 5) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Chocolate>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 6) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5)+ ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 7) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5) + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}]}";
    +    return "SolrDocument{id="+ id + ", type_s=[" + types[docNum % types.length] + "], name_s=[" + names[docNum % names.length] + "], " +
    --- End diff --
    
    It's a shame to have all this embedded ID calculation business. During cleaning can they be removed (both "id" and nest parent and root) and we still have enough distinguishing characteristics of the docs to know which is which?  Seems that way.  It adds a lot of noise.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    OK, we'll scratch that for now and discuss this in a separate issue.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205126064
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -132,54 +126,49 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
                 // load the doc
                 SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
                     schema, new SolrReturnFields());
    -            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
                 if (shouldDecorateWithDVs) {
                   docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
                 }
                 // get parent path
                 // put into pending
                 String parentDocPath = lookupParentPath(fullDocPath);
    -            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
     
    -            // if this path has pending child docs, add them.
    -            if (isAncestor) {
    -              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    -              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
                 }
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimIfSingleDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
               }
             }
     
             // only children of parent remain
             assert pendingParentPathsToChildren.keySet().size() == 1;
     
    -        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
           }
         } catch (IOException e) {
           rootDoc.put(getName(), "Could not fetch child Documents");
         }
       }
     
    -  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    -    // lookup leaf key for these children using path
    -    // depending on the label, add to the parent at the right key/label
    -    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    -    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    -    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    -      List<SolrDocument> list = new ArrayList<>();
    -      parent.setField(trimmedPath, list);
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    --- End diff --
    
    Ah, I see (I didn't look at Multimap's iteration options when I wrote that).  Your code here is good.
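The keySet()-based iteration being approved here can be sketched with plain java.util collections; the Map of lists below is a hypothetical stand-in for Guava's ArrayListMultimap, with String doc IDs standing in for SolrDocument:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultimapIterationSketch {
    // Plain-java stand-in for Guava's ArrayListMultimap: each put appends,
    // so repeated keys accumulate values instead of replacing them.
    static void put(Map<String, List<String>> multimap, String key, String value) {
        multimap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> childrenByLabel = new LinkedHashMap<>();
        put(childrenByLabel, "toppings", "doc-2");
        put(childrenByLabel, "toppings", "doc-3"); // multimap add (won't replace)
        put(childrenByLabel, "lonely", "doc-4");

        // Iterate one label at a time, like the transformer's loop over keySet():
        for (String childLabel : childrenByLabel.keySet()) {
            List<String> docs = childrenByLabel.get(childLabel);
            System.out.println(childLabel + " -> " + docs);
        }
    }
}
```

This is the shape of the loop under review: one pass per distinct label, each label yielding the full list of children to attach.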


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I added extra logic to return as soon as we are able to determine that the root doc has no child documents.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204765898
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    +    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    +      List<SolrDocument> list = new ArrayList<>();
    +      parent.setField(trimmedPath, list);
    +    }
    +    parent.addField(trimmedPath, child);
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child) {
    +    String docPath = getSolrFieldString(child.getFirstValue(NEST_PATH_FIELD_NAME), schema.getFieldType(NEST_PATH_FIELD_NAME));
    --- End diff --
    
    I don't think we should assume that a materialized SolrDocument contains any particular field.  The nest fields are internal details, and furthermore one day this transformer will likely have an "fl".  So this means you should use DocValues to fetch this field.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208506508
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -66,34 +66,34 @@ public void testAllParams() throws Exception {
       private void testChildDoctransformerXML() {
         String test1[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='2'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[3]/str[@name='id']='4'" ,
    -        "/response/result/doc[1]/doc[4]/str[@name='id']='5'" ,
    -        "/response/result/doc[1]/doc[5]/str[@name='id']='6'" ,
    -        "/response/result/doc[1]/doc[6]/str[@name='id']='7'"};
    +        "/response/result/doc[1]/arr[@name='_childDocuments_']/doc[1]/str[@name='id']='2'" ,
    --- End diff --
    
    One major difference is the way nested XML is returned: each child document now has a key it belongs to.
    I thought it would make the review process easier for you if I pointed these differences out.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    In my implementation I used a Multimap<String,SolrDocument> for the pending data, so we can decide whether to include the array index in the parent's key. That way, there can be an option to add either all parents (when the child document's parent is inside a list) or only the child doc's direct parent. This is controlled by how we store the pending child doc's key: with or without the array index.
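The with-or-without-the-index idea can be sketched with plain string handling; the helper names below are illustrative (loosely mirroring the PR's getLastPath and separator constants), not the PR's actual code:

```java
public class NestPathLabels {
    static final char PATH_SEP_CHAR = '/';
    static final char NUM_SEP_CHAR = '#';

    // Last segment of a nest path, e.g. "toppings#1/ingredients#0" -> "ingredients#0"
    static String getLastPath(String path) {
        int idx = path.lastIndexOf(PATH_SEP_CHAR);
        return idx < 0 ? path : path.substring(idx + 1);
    }

    // Drop the trailing "#N" index so siblings share one key:
    // "ingredients#0" -> "ingredients"
    static String trimIndex(String label) {
        int idx = label.lastIndexOf(NUM_SEP_CHAR);
        return idx < 0 ? label : label.substring(0, idx);
    }

    public static void main(String[] args) {
        String fullDocPath = "toppings#1/ingredients#0";
        // Keep the index to key on one specific parent slot:
        System.out.println(getLastPath(fullDocPath));            // ingredients#0
        // Trim the index so all siblings land under one list key:
        System.out.println(trimIndex(getLastPath(fullDocPath))); // ingredients
    }
}
```

Keeping or trimming the `#N` suffix when the pending key is stored is what selects between the two behaviors described above.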


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213541579
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -124,10 +124,11 @@ public void testParentFilterLimitJSON() throws Exception {
     
         assertJQ(req("q", "type_s:donut",
             "sort", "id asc",
    -        "fl", "id, type_s, toppings, _nest_path_, [child limit=1]",
    +        "fl", "id, type_s, lonely, lonelyGrandChild, test_s, test2_s, _nest_path_, [child limit=1]",
    --- End diff --
    
    To my point I wrote in JIRA: it's sad that when I see this I have no idea if it's right/wrong without having to go look at indexSampleData and then think about it. No? (This isn't a critique of you in particular; lots of tests, including some I've written, look like the current tests here.) Imagine one doc with some nested docs, all of which only have their ID. Since they only have their ID, it's not a lot of literal text in JSON. The BeforeClass unmatched docs could use negative IDs to easily know who's who. Anyway, if you would rather leave this as a "TODO" for another day then I understand.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205403130
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,163 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    --- End diff --
    
    I have just updated this test, hopefully it is a lot better now.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    > what if the user also requests all the comments which are in the same thread, thus being in the same array just a path above?
    
    Sounds like a distinct issue.  It'd add some complexity... for example the very first child doc ID might be the 2nd child of its parent.  We'd need to grab the first doc of the parent, which would have a docID earlier than we have even seen.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209071109
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/schema15.xml ---
    @@ -567,7 +567,17 @@
       <field name="_root_" type="string" indexed="true" stored="true"/>
       <!-- required for NestedUpdateProcessor -->
       <field name="_nest_parent_" type="string" indexed="true" stored="true"/>
    -  <field name="_nest_path_" type="string" indexed="true" stored="true"/>
    +  <field name="_nest_path_" type="descendants_path" indexed="true" multiValued="false" docValues="true" stored="false" useDocValuesAsStored="false"/>
    +  <fieldType name="descendants_path" class="solr.SortableTextField">
    +    <analyzer type="index">
    +      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^.*.*$)" replacement="$0/"/>
    --- End diff --
    
    Can you please provide an example input String (here in GH) that has multiple levels, and comment on what it looks like when it's done?  I know how to read regexps, but I need to stare at them a bit to figure them out, so let's make it easier to read/review.
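As a rough illustration of what that charFilter pattern does (a sketch applying the same pattern/replacement via java.util.regex directly, with a hypothetical nest path as input, rather than exercising the charfilter machinery itself):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharFilterPatternDemo {
    public static void main(String[] args) {
        // Same pattern/replacement as the PatternReplaceCharFilterFactory above.
        Pattern p = Pattern.compile("(^.*.*$)");
        Matcher m = p.matcher("toppings#1/ingredients#0");
        // Without MULTILINE, "^.*.*$" matches the whole value once, and the
        // "$0/" replacement re-emits it with a trailing path separator.
        System.out.println(m.replaceAll("$0/"));
        // -> toppings#1/ingredients#0/
    }
}
```

So the net effect at index time is appending a trailing `/` to the whole path before tokenization.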


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    Before I look further I want to mention 2 things:
    * There needn't be separation between the non-nested and nested algorithm.  We could simply use the nested algorithm but special-case when the SortedDocValues we get is null due to non-existent path.  In that event, all docs get added to the root, anonymously (unlabelled).  I suspect doing it this way is less code and will be sufficiently clear but we'll see?
    * My pseudocode had a TODO about double-resolving the path in order to get the label at the time the docs are added to the parent.  If we make the pending data structure a Multimap<Integer,Multimap<String,SolrDocument>> then at the time we read the document and get the path we could store the relationship at this time.  Or perhaps you have a better idea.  I thought of putting the label into a temporary field of the SolrDocument; though I'm not sure I like that any better; maybe less.
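The double-resolution concern in the second bullet can be sketched with a two-level pending structure in plain java.util collections (hypothetical labels and String doc IDs standing in for SolrDocument); the label is computed once, when the doc is read, so it never needs to be re-derived at attach time:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingChildrenSketch {
    // File a child doc under its parent's path, keyed by the label it will
    // get on the parent (computeIfAbsent plus a multimap-style add).
    static void file(Map<String, Map<String, List<String>>> pending,
                     String parentPath, String label, String doc) {
        pending.computeIfAbsent(parentPath, x -> new HashMap<>())
               .computeIfAbsent(label, x -> new ArrayList<>())
               .add(doc);
    }

    public static void main(String[] args) {
        // parent path -> (child label -> child docs)
        Map<String, Map<String, List<String>>> pending = new HashMap<>();
        file(pending, "toppings#1", "ingredients", "doc-5");
        file(pending, "toppings#1", "ingredients", "doc-6");

        // When the parent at "toppings#1" is loaded, its pending children
        // are removed and attached label by label:
        Map<String, List<String>> children = pending.remove("toppings#1");
        System.out.println(children); // {ingredients=[doc-5, doc-6]}
    }
}
```

This is only a sketch of the data-structure shape under discussion, not the transformer's actual code.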


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204762445
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -61,6 +65,10 @@
      */
     public class ChildDocTransformerFactory extends TransformerFactory {
     
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +  private static final String sRootFilter = "*:* NOT " + NEST_PATH_FIELD_NAME + ":*";
    --- End diff --
    
    We're discussing indexing NEST_PATH_FIELD_NAME tokenized, which would make this sRootFilter query more expensive.  Additionally, there might not be a nest path for users not tracking the deep nested stuff.  Let's instead use this:  `*:* NOT _ROOT_:*` since that field is also only populated for child docs.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206184346
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    Would the transformer need to support the old method of adding childDocuments to the _childDocuments_ field?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204772998
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    --- End diff --
    
    I think this TODO is not an issue based on your placement of the path in the doc.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211108109
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -87,7 +87,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final int segBaseId = leafReaderContext.docBase;
           final int segRootId = rootDocId - segBaseId;
           final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    -      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +      final int segPrevRootId = rootDocId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (rootDocId - 1)) {
    --- End diff --
    
    Yes,
    I have added this conditional to the test.
    Let's hope I did not misread your intentions :).
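    For readers following along, the guard under discussion can be sketched in isolation. This is a hypothetical stand-in using java.util.BitSet in place of Lucene's BitSet (whose prevSetBit does not accept a negative index, hence the segRootId == 0 special case in the patch):

    ```java
    import java.util.BitSet;

    // Hypothetical illustration of the guarded prevSetBit pattern: find the
    // previous root docid before segRootId, returning -1 when there is none.
    public class PrevRootSketch {

      static int prevRootId(BitSet segParentsBitSet, int segRootId) {
        // segRootId == 0 means the root is the segment's first doc, so there
        // can be no previous root; skip the (illegal) prevSetBit(-1) call
        return segRootId == 0 ? -1 : segParentsBitSet.previousSetBit(segRootId - 1);
      }

      public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // roots at docids 3 and 7
        parents.set(7);
        if (prevRootId(parents, 0) != -1) throw new AssertionError();
        if (prevRootId(parents, 3) != -1) throw new AssertionError();
        if (prevRootId(parents, 7) != 3) throw new AssertionError();
      }
    }
    ```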


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    Fixed the SolrJ tests that failed the limit assertion for ChildDocTransformer.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    This was committed in commit 5a0e7a615a9b1e7ac97c6b0f9e5604dcc1aeb03f.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204802880
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    Should we figure out a way to store it somewhere else?
    Or is this acceptable to remove it later?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211257783
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -40,22 +43,52 @@
       private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
       private static final String[] names = {"Yaz", "Jazz", "Costa"};
       private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static boolean useSegments;
    +  private static int randomDocTopId = 0;
    +  private static String filterOtherSegments;
    --- End diff --
    
    minor point: rename `filterOtherSegments` to `fqToExcludeNontestedDocs` and add a comment explaining *why* we even have non-tested docs -- it's to perturb the Lucene segments a bit to ensure the transformer works with and without docs in other segments


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211315461
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -40,22 +43,52 @@
       private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
       private static final String[] names = {"Yaz", "Jazz", "Costa"};
       private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static boolean useSegments;
    +  private static int randomDocTopId = 0;
    +  private static String filterOtherSegments;
     
       @BeforeClass
       public static void beforeClass() throws Exception {
         initCore("solrconfig-update-processor-chains.xml", "schema-nest.xml"); // use "nest" schema
    +    useSegments = random().nextBoolean();
    +    if(useSegments) {
    +      final int numOfDocs = 10;
    +      for(int i = 0; i < numOfDocs; ++i) {
    +        updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +        if(random().nextBoolean()) {
    +          assertU(commit());
    +        }
    +      }
    +      assertU(commit());
    +      randomDocTopId = counter.get();
    +      filterOtherSegments = "{!frange l=" + randomDocTopId + " incl=false}idInt";
    +    } else {
    +      filterOtherSegments = "*:*";
    +    }
       }
     
       @After
       public void after() throws Exception {
    -    clearIndex();
    +    if (!useSegments) {
    --- End diff --
    
    You could keep this if you want (use BeforeClass but have delete query in after()).  But if you do, it could be simplified: Don't conditionally call "clearIndex"; simply always do this delete.  And there's a simpler overloaded version -- `delQ(queryhere)`  


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210273819
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,253 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both when we have _nest_path_ and when we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      // the key in the Map is the document's ancestor key (one above the parent), while the key in the
    +      // intermediate MultiMap is the direct child document's key (of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      IndexSchema schema = searcher.getSchema();
    +      SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +      Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +      final int lastChildId = segBaseId + segPrevRootId + 1;
    +      boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +
    +      // Loop each child ID up to the parent (exclusive).
    +      for (int docId = limit == - 1? lastChildId: calcLimitIndex(segBaseId, segRootId, segPrevRootId + 1); docId < rootDocId; ++docId) {
    +
    +        // get the path.  (note will default to ANON_CHILD_KEY if not in schema or oddly blank)
    +        String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +        // Is this doc a direct ancestor of another doc we've seen?
    +        boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +        // Do we need to do anything with this doc (either an ancestor or matched the child query)
    +        if (isAncestor || childDocSet == null || childDocSet.exists(docId)) {
    +          // load the doc
    +          SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId), schema, childReturnFields);
    +          if (shouldDecorateWithDVs) {
    +            docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +          }
    +
    +          if (isAncestor) {
    +            // if this path has pending child docs, add them.
    +            addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +          }
    +
    +          // get parent path
    +          String parentDocPath = getParentPath(fullDocPath);
    +          String lastPath = getLastPath(fullDocPath);
    +          // put into pending:
    +          // trim path if the doc was inside array, see trimPathIfArrayDoc()
    +          // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +          // -> inner MultiMap key ingredients
    +          // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +          // -> inner MultiMap key lonelyGrandChild#
    +          pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +              .put(trimLastPoundIfArray(lastPath), doc); // multimap add (won't replace)
    +        }
    +      }
    +
    +      // only children of parent remain
    +      assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +      addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +
    +    } catch (IOException e) {
    +      //TODO DWS: reconsider this unusual error handling approach; shouldn't we rethrow?
    +      log.warn("Could not fetch child documents", e);
    +      rootDoc.put(getName(), "Could not fetch child documents");
    +    }
    +  }
    +
    +  private static void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  private static void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // if there is no path, we do not need to record the child document's relation to its parent document.
    +    if (cDocsPath.equals(ANON_CHILD_KEY)) {
    +      parent.addChildDocuments(children);
    +      return;
    +    }
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (the same string is returned by ChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private static String getLastPath(String path) {
    +    int lastIndexOfPathSepChar = path.lastIndexOf(PATH_SEP_CHAR);
    +    if(lastIndexOfPathSepChar == -1) {
    +      return path;
    +    }
    +    return path.substring(lastIndexOfPathSepChar + 1);
    +  }
    +
    +  private static String trimLastPoundIfArray(String path) {
    +    // remove the index after the last pound sign when there is an array index, e.g. toppings#1 -> toppings;
    +    // return the original string when the child doc is not in an array, e.g. ingredients# -> ingredients#
    +    final int indexOfSepChar = path.lastIndexOf(NUM_SEP_CHAR);
    +    if (indexOfSepChar == -1) {
    +      return path;
    +    }
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = indexOfSepChar == lastIndex;
    +    return singleDocVal ? path: path.substring(0, indexOfSepChar);
    +  }
    +
    +  private static String trimLastPound(String path) {
    +    // remove the last pound sign and the index after it, e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path : path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  private static String getParentPath(String currDocPath) {
    +    // chop off leaf (after last '/')
    +    // if child of leaf then return null (special value)
    +    int lastPathIndex = currDocPath.lastIndexOf(PATH_SEP_CHAR);
    +    return lastPathIndex == -1 ? null : currDocPath.substring(0, lastPathIndex);
    +  }
    +
    +  /** Looks up the nest path.  If there is none, returns {@link #ANON_CHILD_KEY}. */
    +  private static String getPathByDocId(int segDocId, SortedDocValues segPathDocValues) throws IOException {
    +    int numToAdvance = segPathDocValues.docID() == -1 ? segDocId : segDocId - (segPathDocValues.docID());
    +    assert numToAdvance >= 0;
    +    boolean advanced = segPathDocValues.advanceExact(segDocId);
    +    if (!advanced) {
    +      return ANON_CHILD_KEY;
    +    }
    +    return segPathDocValues.binaryValue().utf8ToString();
    +  }
    +
    +  /**
    +   *
    +   * @param segDocBaseId base docID of the segment
    +   * @param RootId docID of the current root document
    +   * @param lastDescendantId lowest docID of the root document's descendants
    +   * @return the docID to start iterating from, so that the number of descendants iterated does not surpass the limit specified by the query
    +   */
    +  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    --- End diff --
    
    (I commented elsewhere that I don't understand your claim about the ordering)
    
    I'm now commenting here on this line to suggest simplifying (at least conceptually) this method.  You're having it take a mixture of arguments with varying segment vs global bases, and I think it can be simplified.  I think this method signature would be simpler as `calcDocIdToIterateFrom(firstChildDocId, rootDocId, limit)`, and the caller can more easily set docId to this in the loop init part and not conditionally look at limit.
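    To make the suggestion concrete, here is a hypothetical sketch of what such a signature might look like. The method name comes from the comment above, but the exact limit arithmetic is an assumption for illustration, not the PR's code; all three arguments are global (index-wide) docids/counts:

    ```java
    // Hypothetical sketch (not the PR's implementation): the caller seeds its
    // loop directly from the return value instead of branching on limit == -1.
    public class IterateFromSketch {

      static int calcDocIdToIterateFrom(int firstChildDocId, int rootDocId, int limit) {
        if (limit == -1) {
          return firstChildDocId; // no limit: iterate over every descendant
        }
        // start late enough that at most `limit` descendants precede the root
        return Math.max(firstChildDocId, rootDocId - limit);
      }

      public static void main(String[] args) {
        // assume a root at docid 110 whose descendants occupy docids 100..109
        if (calcDocIdToIterateFrom(100, 110, -1) != 100) throw new AssertionError();
        if (calcDocIdToIterateFrom(100, 110, 5) != 105) throw new AssertionError();
        if (calcDocIdToIterateFrom(100, 110, 50) != 100) throw new AssertionError();
      }
    }
    ```

    With that shape, the loop init becomes a single call regardless of whether a limit was requested.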


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r203722021
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformerFactory.java ---
    @@ -0,0 +1,367 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.regex.Pattern;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.QueryBitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.common.SolrException;
    +import org.apache.solr.common.SolrException.ErrorCode;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.StrUtils;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.QParser;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.apache.solr.search.SyntaxError;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +import static org.apache.solr.schema.IndexSchema.ROOT_FIELD_NAME;
    +
    +/**
    + *
    + * @since solr 4.9
    + *
    + * This transformer returns all descendants of each parent document in a flat list nested inside the parent document.
    + *
    + *
    + * The "parentFilter" parameter is mandatory.
    + * Optionally you can provide a "childFilter" param to select which child documents should be returned,
    + * and a "limit" param to cap the number of child documents returned per parent document.
    + * By default the limit is set to 10.
    + *
    + * Examples -
    + * [child parentFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue" limit=20]
    + */
    +public class DeeplyNestedChildDocTransformerFactory extends TransformerFactory {
    +
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +
    +  @Override
    +  public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) {
    +    SchemaField uniqueKeyField = req.getSchema().getUniqueKeyField();
    +    if(uniqueKeyField == null) {
    +      throw new SolrException( ErrorCode.BAD_REQUEST,
    +          " ChildDocTransformer requires the schema to have a uniqueKeyField." );
    +    }
    +
    +    String childFilter = params.get( "childFilter" );
    +    String nestPath = null;
    +    int limit = params.getInt( "limit", 10 );
    +
    +    Query childFilterQuery = null;
    +    List<String> split = null;
    +    List<String> splitPath = null;
    +    if(childFilter != null) {
    +      split = StrUtils.splitSmart(childFilter, ':');
    +      splitPath = StrUtils.splitSmart(split.get(0), PATH_SEP_CHAR.charAt(0));
    +      try {
    +        if (childFilter.contains(PATH_SEP_CHAR)) {
    +          nestPath = String.join(PATH_SEP_CHAR, splitPath.subList(0, splitPath.size() - 1));
    +          // TODO: filter out parents who's childDocs don't match the original childFilter
    +          childFilter = "(" + splitPath.get(splitPath.size() - 1) + ":\"" + split.get(split.size() - 1) + "\" AND " + NEST_PATH_FIELD_NAME + ":\"" + nestPath + "/\")";
    +        }
    +        childFilterQuery = QParser.getParser(childFilter, req).getQuery();
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +      }
    +    }
    +
    +    String parentFilter = params.get( "parentFilter" );
    +
    +    BitSetProducer parentsFilter = null;
    +
    +    if(parentFilter != null) {
    +      try {
    +        Query parentFilterQuery = QParser.getParser( parentFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    +      }
    +    } else {
    +      String sRootFilter = "{!frange l=1 u=1}strdist(" + req.getSchema().getUniqueKeyField().getName() + "," + ROOT_FIELD_NAME + ",edit)";
    +      try {
    +        Query parentFilterQuery = QParser.getParser(sRootFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    +      }
    +    }
    +
    +    if(childFilterQuery == null) {
    +      return new DeeplyNestedChildDocTransformer(field, parentsFilter, req, limit);
    +    }
    +    return new DeeplyNestedFilterChildDocTransformer(field, parentsFilter, req, childFilterQuery, nestPath!=null? generatePattern(splitPath): null, limit);
    +  }
    +
    +  private Pattern generatePattern(List<String> pathList) {
    +    if(pathList.size() <= 2) {
    +      return Pattern.compile(pathList.get(0) + NUM_SEP_CHAR + "\\d");
    +    }
    +    return Pattern.compile(String.join(NUM_SEP_CHAR + "\\d" + PATH_SEP_CHAR, pathList.subList(0, pathList.size() - 1)) + NUM_SEP_CHAR + "\\d");
    +  }
    +}
    +
    +class DeeplyNestedFilterChildDocTransformer extends DeeplyNestedChildDocTransformerBase {
    +
    +  private Query childFilterQuery;
    +  private Pattern nestPathMatcher;
    +
    +  public DeeplyNestedFilterChildDocTransformer( String name, final BitSetProducer parentsFilter,
    +                              final SolrQueryRequest req, final Query childFilterQuery, Pattern pathPattern, int limit) {
    +    super(name, parentsFilter, req, limit);
    +    this.childFilterQuery = childFilterQuery;
    +    this.nestPathMatcher = pathPattern;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int docid) {
    --- End diff --
    
    This is going to be a real challenge to come up with something that is understandable (doesn't appear *too* complicated).  Right now it appears too complicated to me; it's hard to review.  Tonight I think I'll propose some pseudocode on how to structure it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    Rebased on master to include DocFetcher improvements,
    which helped remove some boilerplate code.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205851891
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,227 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Iterator;
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.document.StoredField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +    counter.set(0); // reset id counter
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +        "/response/docs/[0]/toppings/[1]/type_s==[Chocolate]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]",
    +        "/response/docs/[0]/toppings/[1]/ingredients/[1]/name_s==[cocoa]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]",
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc", "fl", "*, _nest_path_, [child hierarchy=true]")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, fullNestedDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*, _nest_path_, [child hierarchy=true]"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testExactPath() throws Exception {
    +    indexSampleData(2);
    +    String[] tests = {
    +        "/response/numFound==4",
    +        "/response/docs/[0]/_nest_path_=='toppings#0'",
    +        "/response/docs/[1]/_nest_path_=='toppings#0'",
    +        "/response/docs/[2]/_nest_path_=='toppings#1'",
    +        "/response/docs/[3]/_nest_path_=='toppings#1'",
    +    };
    +
    +    assertJQ(req("q", "_nest_path_:*toppings/",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +
    +    assertJQ(req("q", "+_nest_path_:\"toppings/\"",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +    };
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings/type_s:Regular']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testGrandChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]"
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, grandChildDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testSingularChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[cake]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]"
    +    };
    +
    +    assertJQ(req("q", "type_s:cake",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='lonely" + PATH_SEP_CHAR + "lonelyGrandChild" + PATH_SEP_CHAR + "test2_s:secondTest']"),
    +        tests);
    +  }
    +
    +  private void indexSampleData(int numDocs) throws Exception {
    +    for(int i = 0; i < numDocs; ++i) {
    +      updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +    }
    +    assertU(commit());
    +  }
    +
    +  private static String id() {
    +    return "" + counter.incrementAndGet();
    +  }
    +
    +  private static String grandChildDocTemplate(int id) {
    +    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    --- End diff --
    
    Another option, perhaps easiest, is to pre-process the Solr document before calling toString().  This could remove "noise" fields like root.  If an IndexableField value is there, then replace it with its stringValue().
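    For illustration, a rough sketch of that pre-processing step using plain-JDK stand-ins (a `Map` in place of `SolrDocument`, and a minimal fake field holder in place of Lucene's `IndexableField`); all names here are hypothetical, not Solr API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocToStringCleaner {
    // Minimal stand-in for Lucene's IndexableField; only stringValue() matters here.
    static final class FakeIndexableField {
        private final String value;
        FakeIndexableField(String value) { this.value = value; }
        String stringValue() { return value; }
    }

    // Copy the document, dropping "noise" fields and unwrapping field objects to plain strings.
    static Map<String, Object> clean(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if ("_root_".equals(e.getKey())) {
                continue; // noise field: skip it entirely
            }
            Object v = e.getValue();
            out.put(e.getKey(),
                v instanceof FakeIndexableField ? ((FakeIndexableField) v).stringValue() : v);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", new FakeIndexableField("1"));
        doc.put("_root_", new FakeIndexableField("1"));
        doc.put("type_s", "donut");
        System.out.println(clean(doc)); // prints {id=1, type_s=donut}
    }
}
```

    Calling toString() on the cleaned map then yields stable, readable expected strings for the test assertions.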


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213270075
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,263 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +  private final boolean isNestedSchema;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, boolean isNestedSchema, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +    this.isNestedSchema = isNestedSchema;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both if we have _nest_path_ and if we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +
    +      final int segPrevRootId = segRootId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (segRootId - 1)) {
    +        // doc has no children, return fast
    +        return;
    +      }
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +      // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs,
    +      // and the iterator can not be reversed.
    +      final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME));
    +
    +      // the key in the Map is the document's ancestors key(one above the parent), while the key in the intermediate
    +      // MultiMap is the direct child document's key(of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +      final int lastChildId = segBaseId + segPrevRootId + 1;
    +      // Loop each child ID up to the parent (exclusive).
    +      for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) {
    +
    +        // get the path.  (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank)
    +        String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +        if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) {
    --- End diff --
    
    Perhaps a better way to do this would be building a new filter for every transformed doc, e.g. `_nest_path_:transformedDocPath`?
    I am not quite sure of the performance overhead such a technique would impose.
    WDYT?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205124868
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -132,54 +126,49 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
                 // load the doc
                 SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
                     schema, new SolrReturnFields());
    -            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
                 if (shouldDecorateWithDVs) {
                   docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
                 }
                 // get parent path
                 // put into pending
                 String parentDocPath = lookupParentPath(fullDocPath);
    -            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
     
    -            // if this path has pending child docs, add them.
    -            if (isAncestor) {
    -              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    -              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
                 }
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimIfSingleDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
               }
             }
     
             // only children of parent remain
             assert pendingParentPathsToChildren.keySet().size() == 1;
     
    -        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
           }
         } catch (IOException e) {
           rootDoc.put(getName(), "Could not fetch child Documents");
         }
       }
     
    -  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    -    // lookup leaf key for these children using path
    -    // depending on the label, add to the parent at the right key/label
    -    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    -    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    -    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    -      List<SolrDocument> list = new ArrayList<>();
    -      parent.setField(trimmedPath, list);
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    --- End diff --
    
    It seems Multimap entries are not returned as one collection per key, but rather as individual key/value entries. It would be more efficient to add the whole collection for a key at once, reducing the number of lookups made for each child. We could do this in one simple lookup per key.
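    To illustrate the bulk add, a plain-JDK sketch (a `Map<String, List<String>>` standing in for Guava's `ArrayListMultimap`, and a plain `Map` standing in for the parent `SolrDocument`; names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkChildAdd {
    public static void main(String[] args) {
        // Stand-in for the multimap: one list of child docs per label.
        Map<String, List<String>> childrenByLabel = new LinkedHashMap<>();
        childrenByLabel.computeIfAbsent("toppings", k -> new ArrayList<>()).add("topping#0");
        childrenByLabel.computeIfAbsent("toppings", k -> new ArrayList<>()).add("topping#1");
        childrenByLabel.computeIfAbsent("lonely", k -> new ArrayList<>()).add("lonelyChild");

        // Parent document stand-in: field name -> value(s).
        Map<String, Object> parent = new LinkedHashMap<>();

        // One lookup per label: attach the whole child list at once instead of
        // re-fetching the parent's field for every individual child entry.
        for (Map.Entry<String, List<String>> e : childrenByLabel.entrySet()) {
            parent.put(e.getKey(), e.getValue());
        }
        System.out.println(parent); // prints {toppings=[topping#0, topping#1], lonely=[lonelyChild]}
    }
}
```

    With Guava's Multimap the equivalent would be iterating its `asMap()` view rather than `entries()`, so each label's children are attached in a single put.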


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205482972
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,227 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Iterator;
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.document.StoredField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +    counter.set(0); // reset id counter
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +        "/response/docs/[0]/toppings/[1]/type_s==[Chocolate]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]",
    +        "/response/docs/[0]/toppings/[1]/ingredients/[1]/name_s==[cocoa]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]",
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc", "fl", "*, _nest_path_, [child hierarchy=true]")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, fullNestedDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*, _nest_path_, [child hierarchy=true]"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testExactPath() throws Exception {
    +    indexSampleData(2);
    +    String[] tests = {
    +        "/response/numFound==4",
    +        "/response/docs/[0]/_nest_path_=='toppings#0'",
    +        "/response/docs/[1]/_nest_path_=='toppings#0'",
    +        "/response/docs/[2]/_nest_path_=='toppings#1'",
    +        "/response/docs/[3]/_nest_path_=='toppings#1'",
    +    };
    +
    +    assertJQ(req("q", "_nest_path_:*toppings/",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +
    +    assertJQ(req("q", "+_nest_path_:\"toppings/\"",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +    };
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings/type_s:Regular']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testGrandChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]"
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, grandChildDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testSingularChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[cake]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]"
    +    };
    +
    +    assertJQ(req("q", "type_s:cake",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='lonely" + PATH_SEP_CHAR + "lonelyGrandChild" + PATH_SEP_CHAR + "test2_s:secondTest']"),
    --- End diff --
    
    I'd prefer you not parameterize PATH_SEP_CHAR or any other syntactical/format constructs.  It adds noise that makes it harder to read simple strings.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204227753
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformerFactory.java ---
    @@ -0,0 +1,367 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.regex.Pattern;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.QueryBitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.common.SolrException;
    +import org.apache.solr.common.SolrException.ErrorCode;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.StrUtils;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.QParser;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.apache.solr.search.SyntaxError;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +import static org.apache.solr.schema.IndexSchema.ROOT_FIELD_NAME;
    +
    +/**
    + *
    + * @since solr 4.9
    + *
    + * This transformer returns all descendants of each parent document in a flat list nested inside the parent document.
    + *
    + *
    + * The "parentFilter" parameter is mandatory.
    + * Optionally you can provide a "childFilter" param to filter which child documents
    + * should be returned, and a "limit" param to specify the number of child documents
    + * returned per parent document (10 by default).
    + *
    + * Examples -
    + * [child parentFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue" limit=20]
    + */
    +public class DeeplyNestedChildDocTransformerFactory extends TransformerFactory {
    --- End diff --
    
    Sure thing, I will merge the two transformers.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204814239
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    A Pair would be fine too; it's only slightly more bulky than an intermediate Multimap.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204763476
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -70,7 +78,8 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
         }
     
         String parentFilter = params.get( "parentFilter" );
    -    if( parentFilter == null ) {
    +    boolean buildHierarchy = params.getBool("hierarchy", false);
    +    if( parentFilter == null && !buildHierarchy) {
    --- End diff --
    
    Perhaps we can relax this constraint; wouldn't sRootFilter be appropriate?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205119233
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -67,7 +73,9 @@
     
       public static final String PATH_SEP_CHAR = "/";
       public static final String NUM_SEP_CHAR = "#";
    -  private static final String sRootFilter = "*:* NOT " + NEST_PATH_FIELD_NAME + ":*";
    +  private static final BooleanQuery rootFilter = new BooleanQuery.Builder()
    +      .add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.MUST))
    +      .add(new BooleanClause(new WildcardQuery(new Term(NEST_PATH_FIELD_NAME, new BytesRef("*"))), BooleanClause.Occur.MUST_NOT)).build();
    --- End diff --
    
    I think I suggested using DocValuesFieldExistsQuery, which only depends on the existence of docValues and should be efficient.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204785843
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -61,6 +65,10 @@
      */
     public class ChildDocTransformerFactory extends TransformerFactory {
     
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +  private static final String sRootFilter = "*:* NOT " + NEST_PATH_FIELD_NAME + ":*";
    --- End diff --
    
    Oh I forgot.  Perhaps instead create the Lucene Query directly (rather than creating a String that needs to be parsed): use a BooleanQuery with a MUST_NOT clause around a DocValuesFieldExistsQuery for the nest-path field.  And comment that this only applies to schemas that are enabled for this enhanced nesting.
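To make the suggestion concrete, here is a hedged sketch of building that root filter as a Lucene Query directly. The class names are Lucene 7.x-era (`DocValuesFieldExistsQuery` was later renamed), and the nest-path field name is an assumption taken from this thread, not the final committed code:

```java
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DocValuesFieldExistsQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;

// Sketch: match all docs (MUST), then exclude (MUST_NOT) any doc that has a
// docvalue for the nest-path field -- i.e. keep only root documents.
// Unlike the sRootFilter String constant, nothing here needs query parsing.
public class RootFilterSketch {
  static Query rootFilter(String nestPathFieldName) {
    return new BooleanQuery.Builder()
        .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
        .add(new DocValuesFieldExistsQuery(nestPathFieldName), BooleanClause.Occur.MUST_NOT)
        .build();
  }
}
```

As noted above, this only makes sense for schemas enabled for the enhanced nesting, since only those index the nest-path docvalues.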


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205475640
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    --- End diff --
    
    I think "cDocsPath" would be more clearly renamed to "labelWithArrayDesignator" or something like that.  It's not a path anymore (there are no slashes).  "trimmedPath" below is definitely "label".  And reorder the params so that the label comes in-between parent and children, which is more natural.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205062754
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    It seems like a MultiMap<String, Pair<String,Pair>> is even more efficient, since each path is unique, while the inner MultiMap will store each item in each child label as another array.
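To see what the structure under discussion trades off, here is a self-contained sketch using only java.util (a list-valued map built with computeIfAbsent is what Guava's ArrayListMultimap provides; Strings stand in for SolrDocument to keep the sketch dependency-free):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-java.util sketch of the "pending children" structure debated above:
// an outer map keyed by the parent's full path, whose value groups child docs
// by their label. put() appends, so repeated children under the same label
// accumulate into an array rather than replacing each other.
public class PendingChildren {
  final Map<String, Map<String, List<String>>> byParentPath = new HashMap<>();

  void put(String parentPath, String label, String doc) {
    byParentPath
        .computeIfAbsent(parentPath, k -> new HashMap<>())
        .computeIfAbsent(label, k -> new ArrayList<>())
        .add(doc); // appends; never replaces an earlier child under the same label
  }

  /** Remove and return the children pending under a parent path, or null if none. */
  Map<String, List<String>> remove(String parentPath) {
    return byParentPath.remove(parentPath);
  }
}
```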


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213543737
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -124,10 +124,11 @@ public void testParentFilterLimitJSON() throws Exception {
     
         assertJQ(req("q", "type_s:donut",
             "sort", "id asc",
    -        "fl", "id, type_s, toppings, _nest_path_, [child limit=1]",
    +        "fl", "id, type_s, lonely, lonelyGrandChild, test_s, test2_s, _nest_path_, [child limit=1]",
    --- End diff --
    
    Leaving this as a TODO for another day sounds like a decent option.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I have simplified the calcDocIdToIterateFrom method, and tests seem to pass :)


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210276463
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -83,6 +84,57 @@ public void testParentFilterJSON() throws Exception {
             tests);
       }
     
    +  @Test
    +  public void testParentFilterLimitJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    --- End diff --
    
    why define these tests up front?  The vast majority of "assertJQ" or similar calls I've seen in Solr's tests put them inline at the method call, which I think makes the most sense since it's together with the query.  And can the length be checked here?  I think that's a key element of this test's purpose :-)  BTW if you use the XML based assertions, you have a richer language to work with -- XPath.  The json-like assertJQ is some home-grown thing in this project that supposedly is easier for people to understand due to the json-like nature (industry shifts from XML to JSON) but it's limited in capability.  I'm not sure if assertJQ can assert an array length but you could investigate.
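    On the array-length point: plain XPath can express a child count directly, which is the kind of richness referred to here. A self-contained illustration using only the JDK's XPath support (the XML below is a made-up response shape, not an actual Solr response):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathCountDemo {
  // Evaluate an XPath count(...) expression against an XML string --
  // the style of assertion assertQ supports via its XPath test strings.
  static double count(String xml, String expr) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return (double) XPathFactory.newInstance().newXPath()
          .evaluate(expr, doc, XPathConstants.NUMBER);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

    An assertion like `count(/response/result/doc[1]/doc)=2` would then pin the number of attached child docs, which is harder to express in the assertJQ mini-language.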


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208507249
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -32,7 +32,7 @@
     import org.junit.BeforeClass;
     import org.junit.Test;
     
    -public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +public class TestChildDocTransformerHierarchy extends SolrTestCaseJ4 {
    --- End diff --
    
    These "hierarchy" tests are in a separate class since they require a different schema and configuration to be loaded upon start-up.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205467807
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    --- End diff --
    
    Oh?  I didn't know we cared at all what the ID is.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211594079
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -0,0 +1,346 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Collection;
    +import java.util.Iterator;
    +import java.util.Map;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestChildDocTransformerHierarchy extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +  private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static int randomDocTopId = 0;
    --- End diff --
    
    I suggest rename to "firstTestedDocId".  And rename "counter" to "idCounter". 


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210287119
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    Since we initialize the doc index from the highest one that will exhaust the limit if possible, some documents may be skipped. If we have 3 child docs: 1, 2, 3, and the limit is set to 2, only docs 2 and 3 will be added when we initialize the index to the one that exhausts the limit. Another way is simply to count the number of matching docs so far, and continue on to the next root doc once the limit is reached (return from the transform method).
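    The two limiting strategies described here can be sketched over plain doc-id lists (illustrative only, not the patch's code): starting the scan at the index that exhausts the limit keeps the *last* N children, while counting matches and stopping early keeps the *first* N:

```java
import java.util.List;

public class ChildLimitSketch {
  // Strategy A: skip ahead so at most 'limit' docs remain (keeps the tail).
  // With children [1, 2, 3] and limit 2, this yields [2, 3].
  static List<Integer> lastN(List<Integer> childIds, int limit) {
    int from = Math.max(0, childIds.size() - limit);
    return childIds.subList(from, childIds.size());
  }

  // Strategy B: count matches and stop once the limit is reached (keeps the head).
  // With children [1, 2, 3] and limit 2, this yields [1, 2].
  static List<Integer> firstN(List<Integer> childIds, int limit) {
    return childIds.subList(0, Math.min(limit, childIds.size()));
  }
}
```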


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I tried optimizing the lookup and insertion of child documents.
    Hopefully I'll get more time tomorrow to get the tests up to scratch.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210280357
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    I suspect you are simply using the "sort" word inappropriately.  "sort" is related to ordering, but we didn't change the order.  We did change the "window" or "offset", if you will, into the docs to examine.  Ordering/sort hasn't changed.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205087092
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor, or one that matched the child query)?
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    Yes, I also considered that, but I'm still thinking of a clever way to note whether it was a single child or an array, which would only be known at map insert time, since we trim the index (or lack thereof) when we insert the childDoc into the map.
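    One way to sketch the single-vs-array distinction (illustrative only; it assumes '#' as the label/index separator in the style of the patch's NUM_SEP_CHAR): whether a path segment carried a numeric index can be checked at insert time, before the index is trimmed away:

```java
public class ChildLabelSketch {
  // Assumed separator between a child label and its ordinal, e.g. "toppings#1".
  static final char NUM_SEP_CHAR = '#';

  // "toppings#1" -> "toppings"; "lonely" -> "lonely" (no index present).
  static String label(String segment) {
    int sep = segment.indexOf(NUM_SEP_CHAR);
    return sep < 0 ? segment : segment.substring(0, sep);
  }

  // True if the segment carried an index, i.e. the label holds an array
  // of child docs rather than a single nested doc.
  static boolean isArray(String segment) {
    return segment.indexOf(NUM_SEP_CHAR) >= 0;
  }
}
```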


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209070656
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -61,6 +57,13 @@
      */
     public class ChildDocTransformerFactory extends TransformerFactory {
     
    +  public static final String PATH_SEP_CHAR = "/";
    --- End diff --
    
    I don't see why these are declared as Strings and not `char`.  It's clumsy later to do `charAt(0)`


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210288829
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    I guess I was not specific enough beforehand.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204768647
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor, or one that matched the child query)?
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    --- End diff --
    
    I'm a bit confused here.  Firstly, it's curious to see getLastPath(label) -- since a label should not have a path if it is a label.  A label is a plain string key joining a parent to a child, like "comment".  Secondly, why the trimSuffixFromPaths... I don't get the point.  As I've said, comments with sample input/output values can help a ton in knowing what's going on.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210271399
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    How could choosing the index based on the "limit" change the document ordering?  My understanding is that child documents placed onto the parent via this transformer are in docID order, which is the same order they were in as provided to Solr.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213202931
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -238,6 +238,17 @@ public void testSingularChildFilterJSON() throws Exception {
             tests);
       }
     
    +  @Test
    +  public void testNonRootChildren() throws Exception {
    +    indexSampleData(numberOfDocsPerNestedTest);
    +    assertJQ(req("q", "test_s:testing",
    +        "sort", "id asc",
    +        "fl", "*,[child childFilter='lonely/lonelyGrandChild/test2_s:secondTest' parentFilter='_nest_path_:\"lonely/\"']",
    --- End diff --
    
    Perhaps I could try and remove the parentFilter requirement all together for hierarchy based queries?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205119770
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -108,12 +103,11 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
           final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
     
    -      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    --- End diff --
    
    Ah good catch; yes a Map and not MultiMap at outer level once we add the label intermediate


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210291464
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    The next sentence is a proposal for a different way to limit the number of child documents.
    If the current logic is sufficient, there is no need to worry about it.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205093780
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    Oh, ok.  Perhaps keep a separate, standalone Set of complete paths of "single child" elements -- one that tracks, per parent path, the child labels that contain a unique entry.  WDYT?
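A minimal sketch of that suggestion, assuming the convention visible in the diff that a label ending in '#' with no trailing index marks a single-valued (non-array) child. The tracker class and its method names are hypothetical, not part of the patch:

```java
import java.util.HashSet;
import java.util.Set;

// Standalone Set of full paths whose trailing label holds a single child,
// so the transformer can later decide whether to attach a lone SolrDocument
// or a List of them. Illustration of the idea only, not patch code.
public class SingleChildPathTracker {
  private final Set<String> singleChildPaths = new HashSet<>();

  // record a path while scanning docs; '#' with no index means single-valued
  void observe(String fullDocPath) {
    if (fullDocPath.endsWith("#")) {
      singleChildPaths.add(fullDocPath);
    }
  }

  boolean isSingleChild(String fullDocPath) {
    return singleChildPaths.contains(fullDocPath);
  }

  public static void main(String[] args) {
    SingleChildPathTracker t = new SingleChildPathTracker();
    t.observe("lonely#/lonelyGrandChild#");  // single-valued chain
    t.observe("toppings#1/ingredients#2");   // array elements
    System.out.println(t.isSingleChild("lonely#/lonelyGrandChild#")); // true
    System.out.println(t.isSingleChild("toppings#1/ingredients#2"));  // false
  }
}
```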


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I'm glad you added the comments to the analysis regexps.  I don't know why there are two `.*` (dot-star) in the first regexp.
    
    In JIRA I mentioned the "limit" needs to be re-implemented.  Other than that... I don't recall anything.  It'd be good to scan over for any nocommits or TODOs that look pressing.  I think it's probably about ready otherwise.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204227785
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildTransformerBase.java ---
    @@ -0,0 +1,139 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.ArrayList;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Objects;
    +import java.util.stream.Stream;
    +
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +/**
    + *
    + * This base class helps create a child doc transformer which caches the parent filter using a QueryBitSetProducer.
    + *
    + *
    + * The "limit" param provides an option to specify the number of child documents
    + * to be returned per parent document. By default it is set to 10.
    + *
    + * @see org.apache.solr.response.transform.DeeplyNestedChildDocTransformer
    + * @see org.apache.solr.response.transform.DeeplyNestedFilterChildDocTransformer
    + */
    +
    +abstract class DeeplyNestedChildDocTransformerBase extends DocTransformer {
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  protected BitSetProducer parentsFilter;
    +  protected BitSet parents;
    +  protected int limit;
    +  protected final Sort pathKeySort;
    +
    +  public DeeplyNestedChildDocTransformerBase( String name, final BitSetProducer parentsFilter,
    +                                          final SolrQueryRequest req, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.pathKeySort = new Sort(new SortField(NEST_PATH_FIELD_NAME, SortField.Type.STRING, false),
    +        new SortField(idField.getName(), SortField.Type.STRING, false));
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static SolrDocument getChildByPath(String[] pathAndNum, SolrDocument lastDoc) {
    +    List<Object> fieldsValues = (List<Object>) lastDoc.getFieldValues(pathAndNum[0]);
    +    int childIndex = Integer.parseInt(pathAndNum[1]);
    +    return fieldsValues.size() > childIndex ? (SolrDocument) fieldsValues.get(childIndex): null;
    +  }
    +
    +  protected static void addChild(SolrDocument parentDoc, String[] pathAndNum, SolrDocument cDoc) {
    +    if(!pathAndNum[1].equals("") && (parentDoc.get(pathAndNum[0]) == null)) {
    +      parentDoc.setField(pathAndNum[0], new NullFilteringArrayList<SolrDocument>());
    +    }
    +    NullFilteringArrayList fieldValues = (NullFilteringArrayList) parentDoc.getFieldValues(pathAndNum[0]);
    +    int pathNum = Integer.parseInt(pathAndNum[1]);
    +
    +    fieldValues.addWithPlaceHolder(pathNum, cDoc);
    +  }
    +
    +  protected static String[] getPathAndNum(String lastPath) {
    +    return lastPath.split(NUM_SEP_CHAR);
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  protected static class NullFilteringArrayList<T> extends ArrayList<T> {
    --- End diff --
    
    I used this since the paths in the documents are stored as array indexes. If some documents from the array were filtered out, I insert null values, so the child docs that matched the filter stay at the same array index they originally had. Inside the iterator method, the null values are filtered out, so they don't get written by the response writer.
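The null-placeholder approach described above can be sketched as follows. The class name and the addWithPlaceHolder signature follow the diff, but this body is an illustrative assumption rather than the patch's exact implementation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Objects;

// Children keep their original array index by padding with nulls; iteration
// skips the nulls, so a response writer walking the iterator never sees them.
public class NullFilteringArrayList<T> extends ArrayList<T> {

  // place the element at its original index, padding any gap with nulls
  public void addWithPlaceHolder(int index, T element) {
    while (size() <= index) {
      add(null);
    }
    set(index, element);
  }

  @Override
  public Iterator<T> iterator() {
    // expose only the non-null (i.e. unfiltered) elements
    return super.stream().filter(Objects::nonNull).iterator();
  }

  public static void main(String[] args) {
    NullFilteringArrayList<String> list = new NullFilteringArrayList<>();
    list.addWithPlaceHolder(2, "childAtIndex2");
    list.addWithPlaceHolder(0, "childAtIndex0");
    for (String s : list) {          // uses the filtering iterator
      System.out.println(s);         // childAtIndex0, then childAtIndex2
    }
  }
}
```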


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209067568
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -0,0 +1,249 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Collection;
    +import java.util.Iterator;
    +import java.util.Map;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestChildDocTransformerHierarchy extends SolrTestCaseJ4 {
    --- End diff --
    
    I'm glad you have split out this test.  I want to further create a new "schema-nest.xml" for our tests that explicitly deal with this new nested document stuff.  This test will use it, as well as TestNestedUpdateProcessor.  schema15.xml can be reverted to as it was.  TestChildDocTransformer can continue to use the legacy schema and as-such we can observe that our additions to the underlying code don't disturb anyone who is using it that is unaware of the new nested stuff.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205480076
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out it's children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (the same string is returned by DeeplyNestedChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private String getLastPath(String path) {
    +    if(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) == -1) {
    +      return path;
    +    }
    +    return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
    +  }
    +
    +  private String trimPathIfArrayDoc(String path) {
    +    // remove index after last pound sign and if there is an array index e.g. toppings#1 -> toppings
    +    // or return original string if child doc is not in an array ingredients# -> ingredients#
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = path.charAt(lastIndex) == NUM_SEP_CHAR.charAt(0);
    +    return singleDocVal ? path: path.substring(0, path.lastIndexOf(NUM_SEP_CHAR.charAt(0)));
    +  }
    +
    +  private String trimLastPound(String path) {
    +    // remove index after last pound sign and index from e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path: path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  String lookupParentPath(String currDocPath) {
    +    // chop off leaf (after last '/')
    +    // if child of leaf then return null (special value)
    +    int lastPathIndex = currDocPath.lastIndexOf(PATH_SEP_CHAR);
    +    return lastPathIndex == -1 ? null: currDocPath.substring(0, lastPathIndex);
    +  }
    +
    +  private String getPathByDocId(int segDocId, SortedDocValues segPathDocValues) throws IOException {
    +    int numToAdvance = segPathDocValues.docID()==-1?segDocId: segDocId - (segPathDocValues.docID());
    --- End diff --
    
    This was confusing me for a minute until I ultimately figured out it doesn't matter since it's only used in your assert.  Do you think this is helpful at all or just drop it?
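Isolating the arithmetic in question may help: a forward-only doc-values iterator starts unpositioned at docID -1, so the step count depends on whether it has been advanced yet. This is a plain-Java illustration of the expression only, not Lucene code.

```java
// The numToAdvance expression from the diff, pulled out on its own.
public class AdvanceCountSketch {

  // currentDocId is the iterator's position (-1 if never advanced);
  // targetSegDocId is the segment-local doc we want to reach.
  static int numToAdvance(int currentDocId, int targetSegDocId) {
    return currentDocId == -1 ? targetSegDocId : targetSegDocId - currentDocId;
  }

  public static void main(String[] args) {
    System.out.println(numToAdvance(-1, 5)); // fresh iterator: 5 steps
    System.out.println(numToAdvance(3, 5));  // already at doc 3: 2 steps
  }
}
```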


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204777468
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -61,6 +65,10 @@
      */
     public class ChildDocTransformerFactory extends TransformerFactory {
     
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +  private static final String sRootFilter = "*:* NOT " + NEST_PATH_FIELD_NAME + ":*";
    --- End diff --
    
    _ROOT_ is added to root docs too now, so I guess we can't use this. 


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I added a new test [TestDeeplyNestedChildDocTransformer#testSingularChildFilterJSON](https://github.com/apache/lucene-solr/pull/416/commits/df77f5b9cd32fdd2d607d8e705481e8f17544ae7#diff-f10d284d1a6b88916e43de7c01f46bc1R113) to test this.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    This is pretty close; it has come far.  I've applied the patch and started to make some manipulations.  I just have some questions, which I'll ask inline.  Please don't push new changes as it'd be hard to integrate.  I could either post a patch or... maybe I should push a feature branch to my fork on GH which would display diffs from yours.  Hmm; does GitHub allow you to show a diff between your feature branch and another feature branch on another fork?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210274031
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -224,9 +225,29 @@ private static String getPathByDocId(int segDocId, SortedDocValues segPathDocVal
         return segPathDocValues.binaryValue().utf8ToString();
       }
     
    -  private static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    -    return fieldVal instanceof IndexableField
    -        ? fieldType.toExternal((IndexableField)fieldVal)
    -        : fieldVal.toString();
    +  /**
    +   *
    +   * @param segDocBaseId base docID of the segment
    +   * @param RootId docID of the current root document
    +   * @param lastDescendantId lowest docID of the root document's descendants
    +   * @return the docID to loop down to, so the limit of descendants to match specified by the query is not surpassed
    +   */
    +  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    +    int i = segDocBaseId + RootId - 1; // the child document with the highest docID
    --- End diff --
    
    I strongly suggest avoiding 'i' as a variable name unless it's a loop index


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    RE isAtomicUpdate:  Good catch!  Please add a test to call this; perhaps it's just one additional line to an existing test.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I was referring to the '#2'.
    Imagine a query for a social network, where all the post's comments made by a specific user are required, including the ones he commented on, which would not match the ChildFilter comment.author:Joe. To display these comments we would need to bring back the whole hierarchy above them.
    In another scenario one might want statistics on how many comments, including replies, Joe made on each post. This can be done using just the path, excluding the array index, so isAncestor returns true.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    > Imagine a query for a social network, where all the posts comments made by a specific user are required, including the ones he commented on, which would not match the ChildFilter comment.author:Joe.
    
    Assuming a comment on another comment is a parent/child relationship (i.e. the recursive comments are ancestors), wouldn't our doc transformer return those ancestors?  We're expressly writing it to return those ancestors so it would.  The ancestor path ord IDs will refer to paths distinguishing the comment occurrences, e.g. might resolve to a path looking something like `comment#3/comment#9/comment#1/` and thus we don't mix up which comment is on which.
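To make the ancestor resolution concrete: given a nest path like the `comment#3/comment#9/comment#1` example above, each enclosing comment's path can be recovered by repeatedly chopping the leaf segment. This is only an illustrative sketch; the real transformer resolves these paths via the `_nest_path_` doc values:

```java
import java.util.ArrayList;
import java.util.List;

public class AncestorPaths {
    /** Ancestor paths from nearest to farthest; an empty list for a child of the root. */
    static List<String> ancestorsOf(String nestPath) {
        List<String> ancestors = new ArrayList<>();
        for (int idx = nestPath.lastIndexOf('/'); idx != -1; idx = nestPath.lastIndexOf('/')) {
            nestPath = nestPath.substring(0, idx); // chop the leaf segment
            ancestors.add(nestPath);
        }
        return ancestors;
    }
}
```

Because each segment carries its `#N` occurrence index, the recovered ancestors never mix up which comment is on which.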


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210285891
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    Yes,
    I meant the order in which the documents are counted.
    My bad.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210287034
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    Again, the order hasn't changed ;-)


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204776145
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -70,7 +78,8 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
         }
     
         String parentFilter = params.get( "parentFilter" );
    -    if( parentFilter == null ) {
    +    boolean buildHierarchy = params.getBool("hierarchy", false);
    +    if( parentFilter == null && !buildHierarchy) {
    --- End diff --
    
    Sure, if we want this change to apply to the regular ChildDocumentTransformer.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211599253
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,257 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both if we have _nest_path_ and also if we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +      final int segPrevRootId = segRootId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (segRootId - 1)) {
    +        // doc has no children, return fast
    +        return;
    +      }
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      // the key in the Map is the document's ancestor's key (one above the parent), while the key in the
    +      // intermediate MultiMap is the direct child document's key (of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      IndexSchema schema = searcher.getSchema();
    --- End diff --
    
    Will do
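The early "doc has no children, return fast" exit in the quoted transform leans on the parents bit set: in a block-join segment a parent immediately follows its children, so if the previous set bit is the adjacent docID there is nothing in between. The same arithmetic can be illustrated with `java.util.BitSet#previousSetBit` standing in for Lucene's `BitSet.prevSetBit` (a sketch, not the patch code):

```java
import java.util.BitSet;

public class HasChildrenCheck {
    /** True if segRootId has block-join children (the previous parent is not the adjacent doc). */
    static boolean hasChildren(BitSet segParents, int segRootId) {
        // mirrors: segRootId==0 ? -1 : segParentsBitSet.prevSetBit(segRootId - 1)
        int segPrevRootId = segRootId == 0 ? -1 : segParents.previousSetBit(segRootId - 1);
        return segPrevRootId != segRootId - 1;
    }

    /** Helper: a bit set with parent docIDs marked. */
    static BitSet parentsAt(int... docIds) {
        BitSet bits = new BitSet();
        for (int docId : docIds) bits.set(docId);
        return bits;
    }
}
```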


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211595043
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -0,0 +1,346 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Collection;
    +import java.util.Iterator;
    +import java.util.Map;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestChildDocTransformerHierarchy extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +  private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static int randomDocTopId = 0;
    +  private static String fqToExcludeNoneTestedDocs; // filter documents that were created for random segments to ensure the transformer works with multiple segments.
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema-nest.xml"); // use "nest" schema
    +    final boolean useSegments = random().nextBoolean();
    +    if(useSegments) {
    +      // create random segments
    +      final int numOfDocs = 10;
    +      for(int i = 0; i < numOfDocs; ++i) {
    +        updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +        if(random().nextBoolean()) {
    +          assertU(commit());
    +        }
    +      }
    +      assertU(commit());
    +      randomDocTopId = counter.get();
    --- End diff --
    
    I think this line should be at the end of this method, executed always.  I can see it works where it is now, executed conditionally, but I think it'd be clearer if not done in a condition.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210262386
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    This had to be changed since we initialise the for loop in ChildDocTransformer using the limit param, so the sort is docID descending now.
    I thought it was important to point out, and perhaps this should also be documented.
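A toy illustration of that change (hypothetical names; the real loop runs over segment-local docIDs): starting the loop just below the root's docID and counting down means the first `limit` children collected are the last ones in the block, hence the descending order in the test expectations.

```java
import java.util.ArrayList;
import java.util.List;

public class DescendingChildren {
    /** Collects up to 'limit' child docIDs, starting just below the root's docID and counting down. */
    static List<Integer> childIdsDescending(int firstChildId, int rootDocId, int limit) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = rootDocId - 1; docId >= firstChildId && ids.size() < limit; docId--) {
            ids.add(docId);
        }
        return ids;
    }
}
```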


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211255381
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -40,22 +43,52 @@
       private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
       private static final String[] names = {"Yaz", "Jazz", "Costa"};
       private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static boolean useSegments;
    +  private static int randomDocTopId = 0;
    +  private static String filterOtherSegments;
     
       @BeforeClass
       public static void beforeClass() throws Exception {
         initCore("solrconfig-update-processor-chains.xml", "schema-nest.xml"); // use "nest" schema
    +    useSegments = random().nextBoolean();
    +    if(useSegments) {
    +      final int numOfDocs = 10;
    +      for(int i = 0; i < numOfDocs; ++i) {
    +        updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +        if(random().nextBoolean()) {
    +          assertU(commit());
    +        }
    +      }
    +      assertU(commit());
    +      randomDocTopId = counter.get();
    +      filterOtherSegments = "{!frange l=" + randomDocTopId + " incl=false}idInt";
    +    } else {
    +      filterOtherSegments = "*:*";
    +    }
       }
     
       @After
       public void after() throws Exception {
    -    clearIndex();
    +    if (!useSegments) {
    --- End diff --
    
    oh my suggestion of beforeClass didn't consider that you might have to do this partial deletion.  To make this simpler, move the new document addition stuff in beforeClass into a before() (and make pertinent fields non-static), then you can revert this after() to as it was.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205468966
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    --- End diff --
    
    Since if there are no children we can return, lets just do that.  This will avoid needing another indentation level for all the remaining code.  You can place that logic immediately after when children is fetched.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205123592
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -190,8 +179,18 @@ private String getLastPath(String path) {
         return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
       }
     
    -  private String trimSuffixFromPaths(String path) {
    -    return path.replaceAll("#\\d|#", "");
    +  private String trimIfSingleDoc(String path) {
    --- End diff --
    
    Do we really need a set though?
    If we trim only the array docs we can then easily figure out at insert time whether the item is a single doc or an array, optimizing the insert time, and removing the need for a set


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205962085
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,227 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Iterator;
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.document.StoredField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +    counter.set(0); // reset id counter
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +        "/response/docs/[0]/toppings/[1]/type_s==[Chocolate]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]",
    +        "/response/docs/[0]/toppings/[1]/ingredients/[1]/name_s==[cocoa]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]",
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc", "fl", "*, _nest_path_, [child hierarchy=true]")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, fullNestedDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*, _nest_path_, [child hierarchy=true]"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testExactPath() throws Exception {
    +    indexSampleData(2);
    +    String[] tests = {
    +        "/response/numFound==4",
    +        "/response/docs/[0]/_nest_path_=='toppings#0'",
    +        "/response/docs/[1]/_nest_path_=='toppings#0'",
    +        "/response/docs/[2]/_nest_path_=='toppings#1'",
    +        "/response/docs/[3]/_nest_path_=='toppings#1'",
    +    };
    +
    +    assertJQ(req("q", "_nest_path_:*toppings/",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +
    +    assertJQ(req("q", "+_nest_path_:\"toppings/\"",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +    };
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings/type_s:Regular']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testGrandChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]"
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, grandChildDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testSingularChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[cake]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]"
    +    };
    +
    +    assertJQ(req("q", "type_s:cake",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='lonely" + PATH_SEP_CHAR + "lonelyGrandChild" + PATH_SEP_CHAR + "test2_s:secondTest']"),
    +        tests);
    +  }
    +
    +  private void indexSampleData(int numDocs) throws Exception {
    +    for(int i = 0; i < numDocs; ++i) {
    +      updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +    }
    +    assertU(commit());
    +  }
    +
    +  private static String id() {
    +    return "" + counter.incrementAndGet();
    +  }
    +
    +  private static String grandChildDocTemplate(int id) {
    +    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    --- End diff --
    
    Would this method go in the TestUtils or the specific test class?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210478700
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -87,7 +87,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final int segBaseId = leafReaderContext.docBase;
           final int segRootId = rootDocId - segBaseId;
           final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    -      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +      final int segPrevRootId = rootDocId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (rootDocId - 1)) {
    --- End diff --
    
    You altered line 90 in response to my comment.  I'm referring to line 92 -- `if(segPrevRootId == (rootDocId - 1))`, where my comment is.
    
    Interesting though.... line 90 is different from the line of code I committed to the feature branch.  That line on the feature branch is:
    `final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay`
    Notice there is no conditional, but there is in your version in this PR.  BitSet.prevSetBit will return -1 if given a 0 input (so says its documentation).  That's what I was trying to say in my comment on the line of code.  Why did you change it?
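
    A minimal sketch of the guard under discussion, using `java.util.BitSet` as a stand-in for Lucene's `org.apache.lucene.util.BitSet` (whose `prevSetBit` asserts a non-negative index rather than accepting -1); the class and method names are illustrative only:

```java
import java.util.BitSet;

public class PrevRootIdSketch {
  // Mirrors the conditional on line 90: when the segment-local root id is 0
  // there is no previous root, so return -1 instead of calling
  // prevSetBit(-1), which would trip the index >= 0 assertion in Lucene.
  static int segPrevRootId(BitSet segParentsBitSet, int segRootId) {
    return segRootId == 0 ? -1 : segParentsBitSet.previousSetBit(segRootId - 1);
  }

  public static void main(String[] args) {
    BitSet parents = new BitSet();
    parents.set(3); // a root doc at segment-local id 3
    parents.set(7); // the next root at id 7
    System.out.println(segPrevRootId(parents, 0)); // -1: first root in segment
    System.out.println(segPrevRootId(parents, 7)); // 3: previous root
  }
}
```

    Note the PR's guard tests `rootDocId` (the global id) while the assertion described in the thread fires when the segment-local `segRootId` is 0; the sketch guards on the segment-local id, which appears to be the thread's intent.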


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210482225
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -87,7 +87,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final int segBaseId = leafReaderContext.docBase;
           final int segRootId = rootDocId - segBaseId;
           final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    -      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +      final int segPrevRootId = rootDocId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (rootDocId - 1)) {
    --- End diff --
    
    If segRootId is 0, segParentsBitSet.prevSetBit(-1) throws an assertion error, since the index has to be >= 0.
    I will fix line 92.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204772658
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    This approach kinda requires we remove such internal fields later.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204786129
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -70,7 +78,8 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
         }
     
         String parentFilter = params.get( "parentFilter" );
    -    if( parentFilter == null ) {
    +    boolean buildHierarchy = params.getBool("hierarchy", false);
    +    if( parentFilter == null && !buildHierarchy) {
    --- End diff --
    
    if sRootFilter needs to depend on a nest field, and if nest fields aren't required, I guess we can't relax this constraint.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210290075
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -81,8 +81,8 @@ private void testChildDoctransformerXML() {
     
         String test3[] = new String[] {
             "//*[@numFound='1']",
    -        "/response/result/doc[1]/doc[1]/str[@name='id']='3'" ,
    -        "/response/result/doc[1]/doc[2]/str[@name='id']='5'" };
    +        "/response/result/doc[1]/doc[1]/str[@name='id']='5'" ,
    --- End diff --
    
    "If we have 3 child docs" ....   -- the explanation in that sentence is perfectly clear, I think.  The next sentence confused me, but whatever.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205960593
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    Should I just delete the old transformer and make this one the new ChildDocTransformer?
    Or should I give it a new name like ChildDocHierarchyTransformer?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204775739
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    +    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    +      List<SolrDocument> list = new ArrayList<>();
    +      parent.setField(trimmedPath, list);
    +    }
    +    parent.addField(trimmedPath, child);
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child) {
    +    String docPath = getSolrFieldString(child.getFirstValue(NEST_PATH_FIELD_NAME), schema.getFieldType(NEST_PATH_FIELD_NAME));
    --- End diff --
    
    Makes sense, I actually changed it so we don't fetch the path twice, but I guess you're right
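
    The path bookkeeping this thread discusses — deriving a parent's `_nest_path_` from a child's and grouping pending children under it — can be sketched with plain JDK collections standing in for Guava's `ArrayListMultimap`. The path format (`/`-separated segments, `#`-numbered list positions) follows the transformer above, but the helper names here are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NestPathGrouping {
  // A child's parent path is everything before the last '/' segment;
  // direct children of the root doc have a null parent path.
  static String parentPath(String nestPath) {
    int i = nestPath.lastIndexOf('/');
    return i < 0 ? null : nestPath.substring(0, i);
  }

  // Group docs (represented here by just their paths) under their parent
  // path, as the transformer does with pendingParentPathsToChildren.
  static Map<String, List<String>> groupByParent(List<String> paths) {
    Map<String, List<String>> pending = new HashMap<>();
    for (String p : paths) {
      pending.computeIfAbsent(parentPath(p), k -> new ArrayList<>()).add(p);
    }
    return pending;
  }

  public static void main(String[] args) {
    List<String> paths = Arrays.asList(
        "toppings#0", "toppings#0/ingredients#0", "toppings#1");
    // null key -> direct children of the root; "toppings#0" -> its children
    System.out.println(groupByParent(paths));
  }
}
```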


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213001520
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -238,6 +238,17 @@ public void testSingularChildFilterJSON() throws Exception {
             tests);
       }
     
    +  @Test
    +  public void testNonRootChildren() throws Exception {
    +    indexSampleData(numberOfDocsPerNestedTest);
    +    assertJQ(req("q", "test_s:testing",
    +        "sort", "id asc",
    +        "fl", "*,[child childFilter='lonely/lonelyGrandChild/test2_s:secondTest' parentFilter='_nest_path_:\"lonely/\"']",
    --- End diff --
    
    Aha; I see you're requiring that a custom parentFilter be present in order to use the childDocFilter on non-root docs.  I suppose that's fine; it's something that could be enhanced in the future to avoid that inconvenience & limitation(*).  Since this is now required, the transformer should throw an error to the user if the doc to be transformed isn't in that filter.  For example:  "The document to be transformed must be matched by the parentFilter of the child transformer" with the BAD_REQUEST flag, as it's a user error.  And add a test for this.
    
    (*) An example of it being a limitation is this:  Say the child docs are all a "comment" in nature; and thus are recursive.  The top query "q" might match certain comments of interest.  And we want all children of those comments returned hierarchically.  The parentFilter would end up having to match all comments, but that would prevent returning child comments and thus blocking the whole idea altogether :-(
    
    To "fix" this limitation, we'd not constrain transformed docs to those in the parentFilter, and we wouldn't insist on any special parentFilter.  In the loop that builds the hierarchy, when fetching the path, we'd need to skip over the current doc if it doesn't descend from that of the doc to be transformed.  Seems pretty straightforward.
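
The descent check sketched in that last paragraph could look roughly like this. This is a minimal illustration using the nest-path convention from the tests above; the class and method names are hypothetical, not part of the actual patch:

```java
// Hypothetical helper illustrating the descent check described above;
// not the actual Solr code.
class NestPathUtil {
  /**
   * True when candidatePath lies strictly below ancestorPath in the
   * nest hierarchy, e.g. "lonely/lonelyGrandChild" descends from "lonely".
   */
  static boolean descendsFrom(String candidatePath, String ancestorPath) {
    return candidatePath.startsWith(ancestorPath + "/");
  }
}
```

In the hierarchy-building loop, a doc whose path fails this check against the transformed doc's path would simply be skipped rather than rejected up front.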


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206011552
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    Replace/update ChildDocTransformer, I think.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204813287
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    Perhaps we can Store everything in a Multimap<String,Pair<String,SolrDocument>>, thus saving each document bonded to its full path?
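
The suggestion above, keeping each pending child bonded to its full path, can be sketched with standard collections; a plain Map of lists stands in for Guava's Multimap here, and all names are illustrative rather than the patch's actual code:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: pending children keyed by their parent's path,
// each child stored as a (fullPath, document) pair so the path need not
// be re-fetched later.
class PendingChildren<D> {
  private final Map<String, List<Map.Entry<String, D>>> byParentPath = new HashMap<>();

  void add(String parentPath, String childFullPath, D childDoc) {
    byParentPath.computeIfAbsent(parentPath, k -> new ArrayList<>())
        .add(new AbstractMap.SimpleEntry<>(childFullPath, childDoc));
  }

  /** Removes and returns the children waiting under parentPath (empty list if none). */
  List<Map.Entry<String, D>> drain(String parentPath) {
    List<Map.Entry<String, D>> children = byParentPath.remove(parentPath);
    return children == null ? Collections.emptyList() : children;
  }
}
```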


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210322853
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -291,4 +302,15 @@ private static String generateDocHierarchy(int i) {
                   "}\n" +
                 "}";
       }
    +
    +  private static String IndexWoChildDocs() {
    --- End diff --
    
    First, never start a method name with an uppercase letter (at least in Java).  But secondly, I suggest inlining this at the point of use, since you're only using it in one place.  I know this is a stylistic point, but I find it harder to read code that refers to a bunch of other things in other places that I have to go read -- it's disjointed; it disrupts reading flow.  Of course a good deal of that is normal (calling string.whatever), but here it's only used for this test.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210475699
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -291,4 +302,15 @@ private static String generateDocHierarchy(int i) {
                   "}\n" +
                 "}";
       }
    +
    +  private static String IndexWoChildDocs() {
    --- End diff --
    
    Oops, how embarrassing; guess I was in a hurry :(.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208507038
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    Another major difference is that now, since child documents are stored as fields, the user needs to explicitly add them to the list of return fields. I have no opinion on this, but it might pose a problem for some use cases. Perhaps documenting this would be enough?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206011853
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    --- End diff --
    
    Oh right; I forgot we needed an ID to do the child doc query limited by this parent ID.  Please add a comment.  I don't think we should bother with root.  I guess ChildDocTransformer might break if the id field is not stored but does have docValues.  That's a shame; it deserves a TODO.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    Sure, it won't mix up the comments, but what if the user also requests all the comments that are in the same thread, and thus in the same array just a path above?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206892630
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    Yes; it can do both easily enough I think?  A separate method could take over for the existing/legacy case.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211259141
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/schema-nest.xml ---
    @@ -20,6 +20,9 @@
     <schema name="nested-docs" version="1.6">
     
       <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
    +  <field name="idInt" type="int" indexed="true" multiValued="false" docValues="true" stored="false" useDocValuesAsStored="false" />
    --- End diff --
    
    minor: rename to `id_i` to follow typical naming conventions


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I think this issue should only be about returning a nested hierarchy.  That at least seems clear -- something everyone would want.  But computing stats... wow, that's a large scope increase that deserves its own issue.  Let's table that.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208799313
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    ah; interesting.  It's logical.  Is this only needed for anonymous child docs (thus \_childDocuments\_), or for any/all possible relationship names that aren't necessarily just at the root level but anywhere in the hierarchy?  Perhaps this is where that "anonChildDocs" ought to come into play again for backwards-compatibility's sake?  Well, perhaps not... someone who is using anonymous child docs today will not have the nested field metadata, and thus the old logic will kick in and ensure child documents are added as before; right?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211505124
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -264,7 +309,7 @@ private static Object cleanIndexableField(Object field) {
       }
     
       private static String grandChildDocTemplate(int id) {
    -    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    int docNum = (id / sumOfDocsPerNestedDocument) % numberOfDocsPerNestedTest; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    --- End diff --
    
    the modulo is added to filter out the docs in the other segments, and then to recalculate the i that was passed when constructing the nested document. This ensures the child transformer did not fail. If we can be satisfied by only testing the ids, which does not seem as bullet-proof to me, this could be removed and only the ids would be tested.
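
The index arithmetic being discussed can be checked with a small worked example. The two constants below are hypothetical stand-ins for the test class's actual values:

```java
// Worked example of the docNum arithmetic above; the constants are
// hypothetical stand-ins for the values in the test class.
class DocNumExample {
  static final int SUM_OF_DOCS_PER_NESTED_DOCUMENT = 8;  // Lucene docs per nested block
  static final int NUMBER_OF_DOCS_PER_NESTED_TEST = 10;  // blocks indexed per round

  /** Recovers the per-round index i that the doc-generating helper was called with. */
  static int docNum(int id) {
    return (id / SUM_OF_DOCS_PER_NESTED_DOCUMENT) % NUMBER_OF_DOCS_PER_NESTED_TEST;
  }
}
```

The division maps any doc id inside a block to that block's ordinal; the modulo then wraps ids from later indexing rounds back to the per-round index.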


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210322181
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -87,7 +87,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final int segBaseId = leafReaderContext.docBase;
           final int segRootId = rootDocId - segBaseId;
           final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    -      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +      final int segPrevRootId = rootDocId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (rootDocId - 1)) {
    --- End diff --
    
    you are comparing a segment-local ID with a global ID, which is incorrect.  You should refer to segRootId.  This is why I'm particular about using "seg" nomenclature in a body of code that deals with both segment and global IDs -- it makes it at least easier to identify such an error.  It's difficult to get tests to detect this; we'd need to commit some docs up front to cause more segments to be created than many tests do.
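
The two ID spaces can be made concrete with a tiny sketch (a minimal illustration, not the patch's code):

```java
// Minimal illustration of the segment-local vs. global doc-id distinction
// discussed above; not the actual Solr code.
class SegIds {
  /** A global doc id is the segment's docBase plus the segment-local id. */
  static int toSegLocal(int globalDocId, int segDocBase) {
    return globalDocId - segDocBase;
  }
}
```

So a comparison like `segPrevRootId == rootDocId - 1` mixes the two spaces; the all-segment-local form `segPrevRootId == segRootId - 1` stays correct once docBase is non-zero, i.e. in any segment after the first.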


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205465327
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -70,36 +86,62 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
         }
     
         String parentFilter = params.get( "parentFilter" );
    -    if( parentFilter == null ) {
    -      throw new SolrException( ErrorCode.BAD_REQUEST, "Parent filter should be sent as parentFilter=filterCondition" );
    +    BitSetProducer parentsFilter = null;
    +    boolean buildHierarchy = params.getBool("hierarchy", false);
    +    if( parentFilter == null) {
    +      if(!buildHierarchy) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Parent filter should be sent as parentFilter=filterCondition" );
    +      }
    +      parentsFilter = new QueryBitSetProducer(rootFilter);
    +    } else {
    +      try {
    +        Query parentFilterQuery = QParser.getParser(parentFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    +      }
         }
     
         String childFilter = params.get( "childFilter" );
         int limit = params.getInt( "limit", 10 );
     
    -    BitSetProducer parentsFilter = null;
    -    try {
    -      Query parentFilterQuery = QParser.getParser( parentFilter, req).getQuery();
    -      //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    -      //  BitSetProducer over that?
    -      // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    -      // then return BitSetProducer with custom BitSet impl accessing the docSet
    -      parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    -    } catch (SyntaxError syntaxError) {
    -      throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    -    }
    -
         Query childFilterQuery = null;
         if(childFilter != null) {
    --- End diff --
    
    The code flow from here to the end of the method looks very awkward to me.  I think the top "if" condition should test for buildHierarchy so that the nested and non-nested cases are clearly separated.  Do you think that would be clearer?
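    To make the suggestion concrete, here is a minimal control-flow sketch. Plain strings stand in for the real Solr/Lucene parsing and transformer construction, so the names below are placeholders, not the actual API: branching on buildHierarchy first keeps the nested and non-nested cases fully separated.

```java
public class CreateFlowSketch {
  static String create(String parentFilter, String childFilter, boolean buildHierarchy) {
    if (buildHierarchy) {
      // nested case: a missing parentFilter falls back to the root filter
      String parents = (parentFilter == null) ? "rootFilter" : "parse(" + parentFilter + ")";
      String children = (childFilter == null) ? "null" : "parseNested(" + childFilter + ")";
      return "DeeplyNestedChildDocTransformer(" + parents + ", " + children + ")";
    }
    // non-nested case: parentFilter is mandatory
    if (parentFilter == null) {
      throw new IllegalArgumentException(
          "Parent filter should be sent as parentFilter=filterCondition");
    }
    String children = (childFilter == null) ? "null" : "parse(" + childFilter + ")";
    return "ChildDocTransformer(parse(" + parentFilter + "), " + children + ")";
  }

  public static void main(String[] args) {
    System.out.println(create(null, null, true));
    System.out.println(create("type_s:donut", "name_s:Yaz", false));
  }
}
```

    With this shape each return site handles exactly one of the two modes, instead of the modes interleaving through shared null checks.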


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206034965
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -168,35 +172,57 @@ private static String id() {
         return "" + counter.incrementAndGet();
       }
     
    +  private static void cleanSolrDocumentFields(SolrDocument input) {
    +    for(Map.Entry<String, Object> field: input) {
    +      Object val = field.getValue();
    +      if(val instanceof Collection) {
    +        Object newVals = ((Collection) val).stream().map((item) -> (cleanIndexableField(item)))
    +            .collect(Collectors.toList());
    +        input.setField(field.getKey(), newVals);
    +        continue;
    +      } else {
    +        input.setField(field.getKey(), cleanIndexableField(field.getValue()));
    +      }
    +    }
    +  }
    +
    +  private static Object cleanIndexableField(Object field) {
    +    if(field instanceof IndexableField) {
    +      return ((IndexableField) field).stringValue();
    +    } else if(field instanceof SolrDocument) {
    +      cleanSolrDocumentFields((SolrDocument) field);
    +    }
    +    return field;
    +  }
    +
       private static String grandChildDocTemplate(int id) {
         int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    -    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "toppings=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 3) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Regular>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 4) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], " +
    -        "_nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 3) + ">, _root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 5) + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:Chocolate>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + id + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">, " +
    -        "ingredients=[SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 6) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5)+ ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}, " +
    -        "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + (id + 7) + ">, name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:cocoa>], _nest_parent_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_nest_parent_:" + (id + 5) + ">, " +
    -        "_root_=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<_root_:" + id + ">}]}]}";
    +    return "SolrDocument{id="+ id + ", type_s=[" + types[docNum % types.length] + "], name_s=[" + names[docNum % names.length] + "], " +
    --- End diff --
    
    Perhaps we should only leave the ID?
    I would prefer to keep one unique key, to make sure the documents are placed under the right parent. Hopefully that will clear up most of the noise.
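    An id-only version of the template could then look roughly like this sketch (it assumes the same id offsets visible in the current template: toppings at id+3 and id+5, ingredients at id+4, id+6 and id+7):

```java
public class IdOnlyTemplateSketch {
  static String grandChildDocTemplate(int id) {
    // same nesting and id offsets as the existing template,
    // but with all non-key fields dropped
    return "SolrDocument{id=" + id + ", "
        + "toppings=[SolrDocument{id=" + (id + 3) + ", "
        + "ingredients=[SolrDocument{id=" + (id + 4) + "}]}, "
        + "SolrDocument{id=" + (id + 5) + ", "
        + "ingredients=[SolrDocument{id=" + (id + 6) + "}, "
        + "SolrDocument{id=" + (id + 7) + "}]}]}";
  }

  public static void main(String[] args) {
    System.out.println(grandChildDocTemplate(0));
  }
}
```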


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205465998
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -91,15 +100,37 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
     
         Query childFilterQuery = null;
         if(childFilter != null) {
    -      try {
    -        childFilterQuery = QParser.getParser( childFilter, req).getQuery();
    -      } catch (SyntaxError syntaxError) {
    -        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +      if(buildHierarchy) {
    +        childFilter = buildHierarchyChildFilterString(childFilter);
    +        return new DeeplyNestedChildDocTransformer(field, parentsFilter, req,
    +            getChildQuery(childFilter, req), limit);
           }
    +      childFilterQuery = getChildQuery(childFilter, req);
    +    } else if(buildHierarchy) {
    +      return new DeeplyNestedChildDocTransformer(field, parentsFilter, req, null, limit);
         }
     
         return new ChildDocTransformer( field, parentsFilter, uniqueKeyField, req.getSchema(), childFilterQuery, limit);
       }
    +
    +  private static Query getChildQuery(String childFilter, SolrQueryRequest req) {
    +    try {
    +      return QParser.getParser( childFilter, req).getQuery();
    +    } catch (SyntaxError syntaxError) {
    +      throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +    }
    +  }
    +
    +  protected static String buildHierarchyChildFilterString(String queryString) {
    --- End diff --
    
    Remember to provide an input/output example.  I think this is where the PathHierarchyTokenizer might come into play... and our discussions on the JIRA issue about that hierarchy.  Can we table this for now and do it in a follow-up issue?  (i.e. have no special syntax right now.)  I'm just concerned the scope of this may be bigger than this doc transformer, since presumably users will want to do join queries using this syntax as well.  And this touches on how we index this, which is a bigger discussion than everything already going on in this issue.  And this will need to be well documented in the Solr Ref Guide.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205121128
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -132,54 +126,49 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
                 // load the doc
                 SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
                     schema, new SolrReturnFields());
    -            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
                 if (shouldDecorateWithDVs) {
                   docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
                 }
                 // get parent path
                 // put into pending
                 String parentDocPath = lookupParentPath(fullDocPath);
    -            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
     
    -            // if this path has pending child docs, add them.
    -            if (isAncestor) {
    -              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    -              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
                 }
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimIfSingleDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    --- End diff --
    
    I believe we simply want the label here; maybe add a method getLeafLabel() that takes a path and returns the leaf label.  It would trim off any "#" and its digits, whether or not it's a single doc.
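    A minimal sketch of such a helper, assuming nest paths of the form "/toppings#1/ingredients#2" ('/' separates levels, '#' introduces an optional ordinal). getLeafLabel is just the suggested name, not an existing method:

```java
public class NestPathSketch {
  static String getLeafLabel(String nestPath) {
    // take everything after the last '/' ...
    int lastSlash = nestPath.lastIndexOf('/');
    String leaf = (lastSlash < 0) ? nestPath : nestPath.substring(lastSlash + 1);
    // ... and trim the '#' ordinal, whether or not digits follow it
    int hash = leaf.indexOf('#');
    return (hash < 0) ? leaf : leaf.substring(0, hash);
  }

  public static void main(String[] args) {
    System.out.println(getLeafLabel("/toppings#1/ingredients#2")); // ingredients
    System.out.println(getLeafLabel("/toppings#"));                // toppings
  }
}
```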


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205965045
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out it's children
    +    return new String[] { idField.getName() };
    --- End diff --
    
    Perhaps since _root_ is added to every document, we could use that field instead of the ID field?
    This is just a thought that popped into my head.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205125247
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -67,7 +73,9 @@
     
       public static final String PATH_SEP_CHAR = "/";
       public static final String NUM_SEP_CHAR = "#";
    -  private static final String sRootFilter = "*:* NOT " + NEST_PATH_FIELD_NAME + ":*";
    +  private static final BooleanQuery rootFilter = new BooleanQuery.Builder()
    +      .add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.MUST))
    +      .add(new BooleanClause(new WildcardQuery(new Term(NEST_PATH_FIELD_NAME, new BytesRef("*"))), BooleanClause.Occur.MUST_NOT)).build();
    --- End diff --
    
    My bad, I'll fix this in the next commit.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r212993235
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -92,6 +97,18 @@ private void testChildDoctransformerXML() {
         assertQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
             "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
     
    +    try(SolrQueryRequest req = req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    --- End diff --
    
    Please include a comment explaining what it is you're testing here (i.e. what is the point of this particular test); it's non-obvious to me.
    This test asserts manually... but it could be done more concisely using the XPath-style assertions used elsewhere in this file and in most tests.  For example, to test the number of child documents it would be something like "count(/response/result/doc[1]/doc)=2", plus a numFound check.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211594271
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -0,0 +1,346 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Collection;
    +import java.util.Iterator;
    +import java.util.Map;
    +import java.util.concurrent.atomic.AtomicInteger;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestChildDocTransformerHierarchy extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +  private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static int randomDocTopId = 0;
    +  private static String fqToExcludeNoneTestedDocs; // filter documents that were created for random segments to ensure the transformer works with multiple segments.
    --- End diff --
    
    None -> Non


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210294082
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -224,9 +225,29 @@ private static String getPathByDocId(int segDocId, SortedDocValues segPathDocVal
         return segPathDocValues.binaryValue().utf8ToString();
       }
     
    -  private static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    -    return fieldVal instanceof IndexableField
    -        ? fieldType.toExternal((IndexableField)fieldVal)
    -        : fieldVal.toString();
    +  /**
    +   *
    +   * @param segDocBaseId base docID of the segment
    +   * @param RootId docID of the current root document
    +   * @param lastDescendantId lowest docID of the root document's descendant
    +   * @return the docID to loop and to not surpass limit of descendants to match specified by query
    +   */
    +  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    +    int i = segDocBaseId + RootId - 1; // the child document with the highest docID
    --- End diff --
    
    Changed and noted :)


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209069776
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -70,109 +73,59 @@ public DocTransformer create(String field, SolrParams params, SolrQueryRequest r
         }
     
         String parentFilter = params.get( "parentFilter" );
    -    if( parentFilter == null ) {
    -      throw new SolrException( ErrorCode.BAD_REQUEST, "Parent filter should be sent as parentFilter=filterCondition" );
    -    }
    -
    -    String childFilter = params.get( "childFilter" );
    -    int limit = params.getInt( "limit", 10 );
    -
         BitSetProducer parentsFilter = null;
    -    try {
    -      Query parentFilterQuery = QParser.getParser( parentFilter, req).getQuery();
    -      //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    -      //  BitSetProducer over that?
    -      // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    -      // then return BitSetProducer with custom BitSet impl accessing the docSet
    -      parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    -    } catch (SyntaxError syntaxError) {
    -      throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    -    }
    -
    -    Query childFilterQuery = null;
    -    if(childFilter != null) {
    +    boolean buildHierarchy = params.getBool("hierarchy", false);
    +    if( parentFilter == null) {
    +      if(!buildHierarchy) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Parent filter should be sent as parentFilter=filterCondition" );
    +      }
    +      parentsFilter = new QueryBitSetProducer(rootFilter);
    +    } else {
           try {
    -        childFilterQuery = QParser.getParser( childFilter, req).getQuery();
    +        Query parentFilterQuery = QParser.getParser(parentFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
           } catch (SyntaxError syntaxError) {
    -        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
           }
         }
     
    -    return new ChildDocTransformer( field, parentsFilter, uniqueKeyField, req.getSchema(), childFilterQuery, limit);
    -  }
    -}
    -
    -class ChildDocTransformer extends DocTransformer {
    -  private final String name;
    -  private final SchemaField idField;
    -  private final IndexSchema schema;
    -  private BitSetProducer parentsFilter;
    -  private Query childFilterQuery;
    -  private int limit;
    -
    -  public ChildDocTransformer( String name, final BitSetProducer parentsFilter, 
    -                              final SchemaField idField, IndexSchema schema,
    -                              final Query childFilterQuery, int limit) {
    -    this.name = name;
    -    this.idField = idField;
    -    this.schema = schema;
    -    this.parentsFilter = parentsFilter;
    -    this.childFilterQuery = childFilterQuery;
    -    this.limit = limit;
    -  }
    +    String childFilter = params.get( "childFilter" );
    +    int limit = params.getInt( "limit", 10 );
     
    -  @Override
    -  public String getName()  {
    -    return name;
    -  }
    -  
    -  @Override
    -  public String[] getExtraRequestFields() {
    -    // we always need the idField (of the parent) in order to fill out it's children
    -    return new String[] { idField.getName() };
    +    if(buildHierarchy) {
    +      if(childFilter != null) {
    +        childFilter = buildHierarchyChildFilterString(childFilter);
    +        return new ChildDocTransformer(field, parentsFilter, req,
    +            getChildQuery(childFilter, req), limit);
    +      }
    +      return new ChildDocTransformer(field, parentsFilter, req, null, limit);
    +    }
    +    return new ChildDocTransformer( field, parentsFilter, req,
    +        childFilter==null? null: getChildQuery(childFilter, req), limit);
       }
     
    -  @Override
    -  public void transform(SolrDocument doc, int docid) {
    -
    -    FieldType idFt = idField.getType();
    -    Object parentIdField = doc.getFirstValue(idField.getName());
    -    
    -    String parentIdExt = parentIdField instanceof IndexableField
    -      ? idFt.toExternal((IndexableField)parentIdField)
    -      : parentIdField.toString();
    -
    +  private static Query getChildQuery(String childFilter, SolrQueryRequest req) {
         try {
    -      Query parentQuery = idFt.getFieldQuery(null, idField, parentIdExt);
    -      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    -      DocList children = context.getSearcher().getDocList(query, childFilterQuery, new Sort(), 0, limit);
    -      if(children.matches() > 0) {
    -        SolrDocumentFetcher docFetcher = context.getSearcher().getDocFetcher();
    -
    -        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    -        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    -        DocIterator i = children.iterator();
    -
    -        while(i.hasNext()) {
    -          Integer childDocNum = i.next();
    -          Document childDoc = context.getSearcher().doc(childDocNum);
    -          // TODO: future enhancement...
    -          // support an fl local param in the transformer, which is used to build
    -          // a private ReturnFields instance that we use to prune unwanted field 
    -          // names from solrChildDoc
    -          SolrDocument solrChildDoc = DocsStreamer.convertLuceneDocToSolrDoc(childDoc, schema,
    -                                                                             new SolrReturnFields());
    +      return QParser.getParser( childFilter, req).getQuery();
    +    } catch (SyntaxError syntaxError) {
    +      throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +    }
    +  }
     
    -          if (shouldDecorateWithDVs) {
    -            docFetcher.decorateDocValueFields(solrChildDoc, childDocNum, dvFieldsToReturn);
    -          }
    -          doc.addChildDocument(solrChildDoc);
    -        }
    -      }
    -      
    -    } catch (IOException e) {
    -      doc.put(name, "Could not fetch child Documents");
    +  protected static String buildHierarchyChildFilterString(String queryString) {
    +    List<String> split = StrUtils.splitSmart(queryString, ':');
    --- End diff --
    
    FYI I've rewritten this method to avoid splitting & joining in cases like this, which can use "indexOf"/"lastIndexOf".  I've also increased its robustness to more complicated queries with multiple conditions.
    
    For now I think we shouldn't document this; let it be kind of a secret feature until we can query (in q/fq) in like-kind.
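    A rough illustration of the indexOf/lastIndexOf idea: the exact query string the rewritten method emits isn't shown in this thread, so the output format below is only an assumption, with _nest_path_ standing in for NEST_PATH_FIELD_NAME.

```java
public class HierarchyFilterSketch {
  static String buildHierarchyChildFilter(String queryString) {
    // separate "toppings/type_s:Regular" at the last ':' and last '/',
    // with no splitSmart/join round trip
    int colon = queryString.lastIndexOf(':');
    String fieldPath = queryString.substring(0, colon);  // "toppings/type_s"
    String value = queryString.substring(colon + 1);     // "Regular"
    int slash = fieldPath.lastIndexOf('/');
    String nestPath = (slash < 0) ? "" : fieldPath.substring(0, slash);
    String field = fieldPath.substring(slash + 1);
    // hypothetical output shape: constrain the nest path, then the field value
    return "+_nest_path_:\"" + nestPath + "\" +" + field + ":" + value;
  }

  public static void main(String[] args) {
    System.out.println(buildHierarchyChildFilter("toppings/type_s:Regular"));
  }
}
```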


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211595662
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,257 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both if we have _nest_path_ and also if we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +      final int segPrevRootId = segRootId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (segRootId - 1)) {
    +        // doc has no children, return fast
    +        return;
    +      }
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      // the key in the Map is the document's ancestor key (one above the parent), while the key in the
    +      // intermediate MultiMap is the direct child document's key (of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      IndexSchema schema = searcher.getSchema();
    --- End diff --
    
    Master has changes recently committed related to document fetching.  It'll simplify this nicely here.  Can you please sync up?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208813169
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    I had to make a small addition and indeed, the behaviour was as expected.
    I added a test to ensure there is backwards compatibility with the old XML format [here](https://github.com/apache/lucene-solr/blob/2573b89e465de615623084145cab7d17e9cb8a07/solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java#L100).


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205122374
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -190,8 +179,18 @@ private String getLastPath(String path) {
         return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
       }
     
    -  private String trimSuffixFromPaths(String path) {
    -    return path.replaceAll("#\\d|#", "");
    +  private String trimIfSingleDoc(String path) {
    --- End diff --
    
    I'll assume this is still WIP as we discussed wanting to use a Set<String> of child docs, and thus we wouldn't trim conditionally for single doc.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205478036
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (the same string is returned by DeeplyNestedChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private String getLastPath(String path) {
    +    if(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) == -1) {
    +      return path;
    +    }
    +    return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
    +  }
    +
    +  private String trimPathIfArrayDoc(String path) {
    +    // remove the index after the last pound sign if there is an array index, e.g. toppings#1 -> toppings,
    +    // or return the original string if the child doc is not in an array, e.g. ingredients# -> ingredients#
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = path.charAt(lastIndex) == NUM_SEP_CHAR.charAt(0);
    +    return singleDocVal ? path: path.substring(0, path.lastIndexOf(NUM_SEP_CHAR.charAt(0)));
    +  }
    +
    +  private String trimLastPound(String path) {
    +    // remove the pound sign and the index that follows it, e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path: path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  String lookupParentPath(String currDocPath) {
    --- End diff --
    
    I think this method should be `getParentPath`; "lookup" is suggestive of needing to look inside some data structure to find something.  Or alternatively, trimLastPath.  And again, an input-output example helps nicely.
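    The rename plus input-output example requested above might look something like the following sketch. It assumes `'/'` is the path separator (matching `PATH_SEP_CHAR` from `ChildDocTransformerFactory` and the `toppings#1/ingredients#1` examples elsewhere in the diff); the class name `NestPaths` is purely illustrative.
    
    ```java
    // Illustrative sketch only; not the committed implementation.
    class NestPaths {
        // assumed separator, per ChildDocTransformerFactory.PATH_SEP_CHAR
        static final char PATH_SEP_CHAR = '/';
    
        /**
         * Returns the *parent* path for this document, e.g.
         * "toppings#1/ingredients#2" -> "toppings#1".
         * Children of the root (no separator) yield null.
         */
        static String getParentPath(String currDocPath) {
            int lastSep = currDocPath.lastIndexOf(PATH_SEP_CHAR);
            return lastSep == -1 ? null : currDocPath.substring(0, lastSep);
        }
    }
    ```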


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204766704
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    +    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    +      List<SolrDocument> list = new ArrayList<>();
    +      parent.setField(trimmedPath, list);
    +    }
    +    parent.addField(trimmedPath, child);
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child) {
    +    String docPath = getSolrFieldString(child.getFirstValue(NEST_PATH_FIELD_NAME), schema.getFieldType(NEST_PATH_FIELD_NAME));
    +    addChildToParent(parent, child, docPath);
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children) {
    +    for(SolrDocument child: children) {
    +      addChildToParent(parent, child);
    +    }
    +  }
    +
    +  private String getLastPath(String path) {
    +
    +    if(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) == -1) {
    +      return path;
    +    }
    +    return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
    +  }
    +
    +  private String trimSuffixFromPaths(String path) {
    --- End diff --
    
    Maybe we don't need this method; we'll see.  I believe the goal of this method is only to trim off the trailing pound sign and number?  I'd rather you use String.lastIndexOf type calls rather than a regexp for this simple task.  Assume '#' is a special character and thus simply assume what follows is the child index.  Also remember to add an input-output example in a comment on the method to clearly indicate what it's doing.
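    The lastIndexOf-based replacement suggested above could be sketched as follows. It assumes `'#'` (`NUM_SEP_CHAR`) only ever precedes the child index; the wrapper class `PathTrim` is hypothetical.
    
    ```java
    // Sketch of the suggested non-regexp trim; illustrative only.
    class PathTrim {
        /**
         * Trims the trailing pound sign and anything after it,
         * e.g. "toppings#1" -> "toppings", "ingredients#" -> "ingredients".
         * A path with no '#' is returned unchanged.
         */
        static String trimLastPound(String path) {
            int lastPound = path.lastIndexOf('#');
            return lastPound == -1 ? path : path.substring(0, lastPound);
        }
    }
    ```
    
    A later revision of the diff in this thread takes essentially this shape in `trimLastPound`.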


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla closed the pull request at:

    https://github.com/apache/lucene-solr/pull/416


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208813471
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    I was thinking about what you said, and perhaps we could fall back to the old format if the user did not index the field meta-data. I prefer having a flag, since the user can query two separate collections and get the same output if desired, instead of having to deal with two different XML formats.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209066014
  
    --- Diff: solr/core/src/test/org/apache/solr/update/TestNestedUpdateProcessor.java ---
    @@ -107,8 +107,8 @@ public void testDeeplyNestedURPGrandChild() throws Exception {
         };
         indexSampleData(jDoc);
     
    -    assertJQ(req("q", IndexSchema.NEST_PATH_FIELD_NAME + ":*/grandChild#*",
    -        "fl","*",
    +    assertJQ(req("q", IndexSchema.NEST_PATH_FIELD_NAME + ":*/grandChild",
    +        "fl","*, _nest_path_",
    --- End diff --
    
    This change (and others here) to a test of an URP that isn't modified in this issue underscores my point made previously about having the test for that URP be more of a unit test of what the URP produces (test the SolrInputDocument), and _not_ execute queries.  I'm not saying it was wrong to make this change in this issue, but I just want you to reflect on the ramifications of these choices.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205477055
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor, or it matched the child query)?
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (the same string is returned by DeeplyNestedChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    --- End diff --
    
    assert size == 1?
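A defensive sketch of that suggestion, with plain strings standing in for SolrDocument (nothing here is the actual Solr API): assert the expected cardinality before taking the first element, and avoid the unchecked `(List)` cast on a `Collection`.

```java
import java.util.Collection;
import java.util.List;

public class SingleChildCheck {
  // Defensive version of the single-value branch in addChildrenToParent:
  // when the trimmed path differs from the original (it ended with '#'),
  // exactly one child is expected, so assert before taking the first element.
  static String singleChild(Collection<String> children) {
    assert children.size() == 1 : "expected exactly one child, got " + children.size();
    return children.iterator().next(); // works for any Collection, no (List) cast
  }

  public static void main(String[] args) {
    System.out.println(singleChild(List.of("lonelyGrandChild"))); // lonelyGrandChild
  }
}
```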


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204227797
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformerFactory.java ---
    @@ -0,0 +1,367 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.regex.Pattern;
    +
    +import org.apache.lucene.document.Document;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.QueryBitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.common.SolrException;
    +import org.apache.solr.common.SolrException.ErrorCode;
    +import org.apache.solr.common.params.SolrParams;
    +import org.apache.solr.common.util.StrUtils;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.QParser;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.apache.solr.search.SyntaxError;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +import static org.apache.solr.schema.IndexSchema.ROOT_FIELD_NAME;
    +
    +/**
    + *
    + * @since solr 4.9
    + *
    + * This transformer returns all descendants of each parent document in a flat list nested inside the parent document.
    + *
    + *
    + * The "parentFilter" parameter is mandatory.
    + * Optionally you can provide a "childFilter" param to filter out which child documents should be returned and a
    + * "limit" param which provides an option to specify the number of child documents
    + * to be returned per parent document. By default it's set to 10.
    + *
    + * Examples -
    + * [child parentFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue"]
    + * [child parentFilter="fieldName:fieldValue" childFilter="fieldName:fieldValue" limit=20]
    + */
    +public class DeeplyNestedChildDocTransformerFactory extends TransformerFactory {
    +
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +
    +  @Override
    +  public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) {
    +    SchemaField uniqueKeyField = req.getSchema().getUniqueKeyField();
    +    if(uniqueKeyField == null) {
    +      throw new SolrException( ErrorCode.BAD_REQUEST,
    +          " ChildDocTransformer requires the schema to have a uniqueKeyField." );
    +    }
    +
    +    String childFilter = params.get( "childFilter" );
    +    String nestPath = null;
    +    int limit = params.getInt( "limit", 10 );
    +
    +    Query childFilterQuery = null;
    +    List<String> split = null;
    +    List<String> splitPath = null;
    +    if(childFilter != null) {
    +      split = StrUtils.splitSmart(childFilter, ':');
    +      splitPath = StrUtils.splitSmart(split.get(0), PATH_SEP_CHAR.charAt(0));
    +      try {
    +        if (childFilter.contains(PATH_SEP_CHAR)) {
    +          nestPath = String.join(PATH_SEP_CHAR, splitPath.subList(0, splitPath.size() - 1));
    +          // TODO: filter out parents whose childDocs don't match the original childFilter
    +          childFilter = "(" + splitPath.get(splitPath.size() - 1) + ":\"" + split.get(split.size() - 1) + "\" AND " + NEST_PATH_FIELD_NAME + ":\"" + nestPath + "/\")";
    +        }
    +        childFilterQuery = QParser.getParser(childFilter, req).getQuery();
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct child filter query" );
    +      }
    +    }
    +
    +    String parentFilter = params.get( "parentFilter" );
    +
    +    BitSetProducer parentsFilter = null;
    +
    +    if(parentFilter != null) {
    +      try {
    +        Query parentFilterQuery = QParser.getParser( parentFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    +      }
    +    } else {
    +      String sRootFilter = "{!frange l=1 u=1}strdist(" + req.getSchema().getUniqueKeyField().getName() + "," + ROOT_FIELD_NAME + ",edit)";
    +      try {
    +        Query parentFilterQuery = QParser.getParser(sRootFilter, req).getQuery();
    +        //TODO shouldn't we try to use the Solr filter cache, and then ideally implement
    +        //  BitSetProducer over that?
    +        // DocSet parentDocSet = req.getSearcher().getDocSet(parentFilterQuery);
    +        // then return BitSetProducer with custom BitSet impl accessing the docSet
    +        parentsFilter = new QueryBitSetProducer(parentFilterQuery);
    +      } catch (SyntaxError syntaxError) {
    +        throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, "Failed to create correct parent filter query" );
    +      }
    +    }
    +
    +    if(childFilterQuery == null) {
    +      return new DeeplyNestedChildDocTransformer(field, parentsFilter, req, limit);
    +    }
    +    return new DeeplyNestedFilterChildDocTransformer(field, parentsFilter, req, childFilterQuery, nestPath!=null? generatePattern(splitPath): null, limit);
    +  }
    +
    +  private Pattern generatePattern(List<String> pathList) {
    +    if(pathList.size() <= 2) {
    +      return Pattern.compile(pathList.get(0) + NUM_SEP_CHAR + "\\d");
    +    }
    +    return Pattern.compile(String.join(NUM_SEP_CHAR + "\\d" + PATH_SEP_CHAR, pathList.subList(0, pathList.size() - 1)) + NUM_SEP_CHAR + "\\d");
    +  }
    +}
    +
    +class DeeplyNestedFilterChildDocTransformer extends DeeplyNestedChildDocTransformerBase {
    +
    +  private Query childFilterQuery;
    +  private Pattern nestPathMatcher;
    +
    +  public DeeplyNestedFilterChildDocTransformer( String name, final BitSetProducer parentsFilter,
    +                              final SolrQueryRequest req, final Query childFilterQuery, Pattern pathPattern, int limit) {
    +    super(name, parentsFilter, req, limit);
    +    this.childFilterQuery = childFilterQuery;
    +    this.nestPathMatcher = pathPattern;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int docid) {
    --- End diff --
    
    Sure thing, any help would be welcome.
    This is a pretty rough draft, so I will try to work on this ASAP before I add more logic to the transformer.
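For reference, the `generatePattern` logic in the factory under review can be exercised in isolation. This sketch reuses the same constants and path-list shape from the diff above; it is a standalone illustration, not the committed code.

```java
import java.util.List;
import java.util.regex.Pattern;

public class NestPathPattern {
  static final String PATH_SEP_CHAR = "/";
  static final String NUM_SEP_CHAR = "#";

  // Mirrors the reviewed generatePattern: builds a regex that matches
  // indexed nest paths such as "toppings#1/ingredients#0", i.e. every
  // path segment except the trailing field name, each followed by #<digit>.
  static Pattern generatePattern(List<String> pathList) {
    if (pathList.size() <= 2) {
      return Pattern.compile(pathList.get(0) + NUM_SEP_CHAR + "\\d");
    }
    return Pattern.compile(String.join(NUM_SEP_CHAR + "\\d" + PATH_SEP_CHAR,
        pathList.subList(0, pathList.size() - 1)) + NUM_SEP_CHAR + "\\d");
  }

  public static void main(String[] args) {
    Pattern p = generatePattern(List.of("toppings", "ingredients", "name_s"));
    System.out.println(p.pattern());                                   // toppings#\d/ingredients#\d
    System.out.println(p.matcher("toppings#1/ingredients#0").matches()); // true
  }
}
```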


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204774918
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/schema15.xml ---
    @@ -567,7 +567,17 @@
       <field name="_root_" type="string" indexed="true" stored="true"/>
       <!-- required for NestedUpdateProcessor -->
       <field name="_nest_parent_" type="string" indexed="true" stored="true"/>
    -  <field name="_nest_path_" type="string" indexed="true" stored="true"/>
    +  <field name="_nest_path_" type="descendants_path" indexed="true" multiValued="false" docValues="true" stored="false" useDocValuesAsStored="true"/>
    --- End diff --
    
    Sure thing, will fix ASAP


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211258577
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -40,22 +43,52 @@
       private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
       private static final String[] names = {"Yaz", "Jazz", "Costa"};
       private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    --- End diff --
    
    I understand we need a filter query here that can be referenced by the test, but I'm a bit dubious on all these other ones.  You may very well have written the test in such a way that they are necessary at the moment, but let's consider how to simplify.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205121643
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -132,54 +126,49 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
                 // load the doc
                 SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
                     schema, new SolrReturnFields());
    -            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
                 if (shouldDecorateWithDVs) {
                   docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
                 }
                 // get parent path
                 // put into pending
                 String parentDocPath = lookupParentPath(fullDocPath);
    -            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
     
    -            // if this path has pending child docs, add them.
    -            if (isAncestor) {
    -              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    -              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
                 }
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimIfSingleDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
               }
             }
     
             // only children of parent remain
             assert pendingParentPathsToChildren.keySet().size() == 1;
     
    -        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
           }
         } catch (IOException e) {
           rootDoc.put(getName(), "Could not fetch child Documents");
         }
       }
     
    -  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    -    // lookup leaf key for these children using path
    -    // depending on the label, add to the parent at the right key/label
    -    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    -    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    -    if (!parent.containsKey(trimmedPath) && (label.contains(NUM_SEP_CHAR) && !label.endsWith(NUM_SEP_CHAR))) {
    -      List<SolrDocument> list = new ArrayList<>();
    -      parent.setField(trimmedPath, list);
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    --- End diff --
    
    there is likely a way to iterate over Map.Entry or similar so you don't have to "get" each in the next line.
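A sketch of that suggestion using a stdlib `Map<K, List<V>>` in place of Guava's `Multimap` (whose `asMap()` method exposes the analogous `Map<K, Collection<V>>` view for the real code): iterate the entry set directly instead of `keySet()` followed by a `get(key)` per key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntryIteration {
  // Walk key/values pairs in one pass; each entry already carries its
  // value collection, so no per-key lookup is needed.
  static List<String> summarize(Map<String, List<String>> children) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : children.entrySet()) {
      out.add(e.getKey() + ":" + e.getValue().size());
    }
    return out;
  }

  public static void main(String[] args) {
    // Strings stand in for SolrDocument; keys mimic the child-path labels.
    Map<String, List<String>> children = new LinkedHashMap<>();
    children.put("toppings", List.of("doc1", "doc2"));
    children.put("lonely#", List.of("doc3"));
    System.out.println(summarize(children)); // [toppings:2, lonely#:1]
  }
}
```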


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213321225
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -109,9 +109,14 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           // Loop each child ID up to the parent (exclusive).
           for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) {
     
    -        // get the path.  (note will default to ANON_CHILD_KEY if not in schema or oddly blank)
    +        // get the path.  (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank)
             String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
     
    +        if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) {
    +          // is not a descendant of the transformed doc, return fast.
    +          return;
    --- End diff --
    
    Added another query to [TestChildDocumentHierarchy#testNonRootChildren](https://github.com/apache/lucene-solr/pull/416/files#diff-9fe0ab006f82be5c6a07d5bb6dbc6da0R243).
    This test failed before I changed the return to continue (previous commit), and passes with the latest one.
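A toy model of why `continue` is needed there rather than `return`: docs outside the transformed doc's subtree can be interleaved with its descendants, so aborting the loop would drop later matches. The paths and the subtree check below are simplified stand-ins for the real docValues lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class ReturnVsContinue {
  // Keep only paths under rootPath; skip non-descendants without stopping.
  static List<String> collectDescendants(List<String> paths, String rootPath) {
    List<String> kept = new ArrayList<>();
    for (String p : paths) {
      if (!p.contains(rootPath)) {
        continue; // a 'return' here would also drop "lonely/grandChild" below
      }
      kept.add(p);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> paths = List.of("lonely/child", "other/child", "lonely/grandChild");
    System.out.println(collectDescendants(paths, "lonely")); // [lonely/child, lonely/grandChild]
  }
}
```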


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213295374
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -99,6 +96,9 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
     
           // we'll need this soon...
           final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +      // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs,
    +      // and the iterator can not be reversed.
    +      final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME));
    --- End diff --
    
    Can you call this rootDocPath since we refer to this doc as the "root doc" elsewhere here?  I can see why you chose this name.  You could add a comment that the "root doc" is the input doc we are adding information to, and is usually but not necessarily the root of the block of documents (i.e. the root doc may itself be a child doc of another doc).


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    When you refer to the "array index", what do you mean?  Do you mean a DocValue ord?  Do you mean the '#2' or whatever child index?  AFAIK this code shouldn't care about child index.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208932604
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    Of course, one day this will probably get removed. Right now there is a way to provide the old XML format.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205488727
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestDeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,227 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.Iterator;
    +import java.util.concurrent.atomic.AtomicInteger;
    +
    +import com.google.common.collect.Iterables;
    +import org.apache.lucene.document.StoredField;
    +import org.apache.solr.SolrTestCaseJ4;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.BasicResultContext;
    +import org.junit.After;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +public class TestDeeplyNestedChildDocTransformer extends SolrTestCaseJ4 {
    +
    +  private static AtomicInteger counter = new AtomicInteger();
    +  private static final char PATH_SEP_CHAR = '/';
    +  private static final String[] types = {"donut", "cake"};
    +  private static final String[] ingredients = {"flour", "cocoa", "vanilla"};
    +  private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
    +  private static final String[] names = {"Yaz", "Jazz", "Costa"};
    +
    +  @BeforeClass
    +  public static void beforeClass() throws Exception {
    +    initCore("solrconfig-update-processor-chains.xml", "schema15.xml");
    +  }
    +
    +  @After
    +  public void after() throws Exception {
    +    assertU(delQ("*:*"));
    +    assertU(commit());
    +    counter.set(0); // reset id counter
    +  }
    +
    +  @Test
    +  public void testParentFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +        "/response/docs/[0]/toppings/[1]/type_s==[Chocolate]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]",
    +        "/response/docs/[0]/toppings/[1]/ingredients/[1]/name_s==[cocoa]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]",
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc", "fl", "*, _nest_path_, [child hierarchy=true]")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, fullNestedDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*, _nest_path_, [child hierarchy=true]"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testExactPath() throws Exception {
    +    indexSampleData(2);
    +    String[] tests = {
    +        "/response/numFound==4",
    +        "/response/docs/[0]/_nest_path_=='toppings#0'",
    +        "/response/docs/[1]/_nest_path_=='toppings#0'",
    +        "/response/docs/[2]/_nest_path_=='toppings#1'",
    +        "/response/docs/[3]/_nest_path_=='toppings#1'",
    +    };
    +
    +    assertJQ(req("q", "_nest_path_:*toppings/",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +
    +    assertJQ(req("q", "+_nest_path_:\"toppings/\"",
    +        "sort", "_nest_path_ asc",
    +        "fl", "*, _nest_path_"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/type_s==[Regular]",
    +    };
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings/type_s:Regular']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testGrandChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[donut]",
    +        "/response/docs/[0]/toppings/[0]/ingredients/[0]/name_s==[cocoa]"
    +    };
    +
    +    try(SolrQueryRequest req = req("q", "type_s:donut", "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']")) {
    +      BasicResultContext res = (BasicResultContext) h.queryAndResponse("/select", req).getResponse();
    +      Iterator<SolrDocument> docsStreamer = res.getProcessedDocuments();
    +      while (docsStreamer.hasNext()) {
    +        SolrDocument doc = docsStreamer.next();
    +        int currDocId = Integer.parseInt(((StoredField) doc.getFirstValue("id")).stringValue());
    +        assertEquals("queried docs are not equal to expected output for id: " + currDocId, grandChildDocTemplate(currDocId), doc.toString());
    +      }
    +    }
    +
    +
    +
    +    assertJQ(req("q", "type_s:donut",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='toppings" + PATH_SEP_CHAR + "ingredients" + PATH_SEP_CHAR + "name_s:cocoa']"),
    +        tests);
    +  }
    +
    +  @Test
    +  public void testSingularChildFilterJSON() throws Exception {
    +    indexSampleData(10);
    +    String[] tests = new String[] {
    +        "/response/docs/[0]/type_s==[cake]",
    +        "/response/docs/[0]/lonely/test_s==[testing]",
    +        "/response/docs/[0]/lonely/lonelyGrandChild/test2_s==[secondTest]"
    +    };
    +
    +    assertJQ(req("q", "type_s:cake",
    +        "sort", "id asc",
    +        "fl", "*,[child hierarchy=true childFilter='lonely" + PATH_SEP_CHAR + "lonelyGrandChild" + PATH_SEP_CHAR + "test2_s:secondTest']"),
    +        tests);
    +  }
    +
    +  private void indexSampleData(int numDocs) throws Exception {
    +    for(int i = 0; i < numDocs; ++i) {
    +      updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +    }
    +    assertU(commit());
    +  }
    +
    +  private static String id() {
    +    return "" + counter.incrementAndGet();
    +  }
    +
    +  private static String grandChildDocTemplate(int id) {
    +    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    return "SolrDocument{id=stored,indexed,tokenized,omitNorms,indexOptions=DOCS<id:" + id + ">, type_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<type_s:" + types[docNum % types.length] + ">], name_s=[stored,indexed,tokenized,omitNorms,indexOptions=DOCS<name_s:" + names[docNum % names.length] + ">], " +
    --- End diff --
    
    Hmmmm; this isn't *quite* what I had in mind.  The "stored,indexed,tokenized,..." stuff is not what I expected.  Apparently, document.toString may not be a good choice.  Can you JSONify this?  Either query Solr for a JSON response and use that (this approach is semi-common in Solr tests, e.g. TestJsonFacets line 192, the params passed to testJQ()), or use JSONWriter.writeSolrDocument() somehow to avoid a search, if you want to go that route.
    
    It's too bad we have to see both id & root fields in the document; it'd be nice if the child doc transformer had "fl".
    
    Ultimately, we'd like to see something easy to read -- a JSON structured nested docs with as little noise as possible.
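As an aside for readers following the `_nest_path_` values above (`toppings#0`, `toppings#0/ingredients#1` and the like): the transformer decomposes these paths with a few string helpers, quoted in a later diff in this thread. Below is a stdlib-only, illustrative copy — the method bodies mirror the patch, but this sketch is not the patch itself:

```java
public class NestPathDemo {
    // Illustrative copies of ChildDocTransformer's private path helpers.
    // '/' is PATH_SEP_CHAR and '#' is NUM_SEP_CHAR in the patch.

    // Leaf segment of the path, e.g. "toppings#1/ingredients#0" -> "ingredients#0"
    public static String getLastPath(String path) {
        int i = path.lastIndexOf('/');
        return i == -1 ? path : path.substring(i + 1);
    }

    // Strip the array index after the last pound, e.g. "ingredients#0" -> "ingredients"
    public static String trimLastPound(String path) {
        int i = path.lastIndexOf('#');
        return i == -1 ? path : path.substring(0, i);
    }

    // Parent path, or null for children of the root document.
    public static String getParentPath(String path) {
        int i = path.lastIndexOf('/');
        return i == -1 ? null : path.substring(0, i);
    }

    public static void main(String[] args) {
        String p = "toppings#1/ingredients#0";
        System.out.println(getParentPath(p));              // toppings#1
        System.out.println(getLastPath(p));                // ingredients#0
        System.out.println(trimLastPound(getLastPath(p))); // ingredients
    }
}
```

So for `toppings#1/ingredients#0`, the parent path is `toppings#1`, the leaf is `ingredients#0`, and trimming the pound index yields the JSON key `ingredients`.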


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205961339
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    --- End diff --
    
    FieldType idFt = idField.getType();
    
        String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    
    Doesn't this mean we need the root document's ID?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210306003
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -227,27 +225,28 @@ private static String getPathByDocId(int segDocId, SortedDocValues segPathDocVal
     
       /**
        *
    -   * @param segDocBaseId base docID of the segment
    -   * @param RootId docID if the current root document
    -   * @param lastDescendantId lowest docID of the root document's descendant
    -   * @return the docID to loop and to not surpass limit of descendants to match specified by query
    +   * @param RootDocId docID of the current root document
    +   * @param lowestChildDocId lowest docID of the root document's descendant
    +   * @return the docID to loop and not surpass limit of descendants to match specified by query
        */
    -  private int calcLimitIndex(int segDocBaseId, int RootId, int lastDescendantId) {
    -    int i = segDocBaseId + RootId - 1; // the child document with the highest docID
    -    final int prevSegRootId = segDocBaseId + lastDescendantId;
    -    assert prevSegRootId < i; // previous rootId should be smaller then current RootId
    +  private int calcDocIdToIterateFrom(int lowestChildDocId, int RootDocId) {
    +    assert lowestChildDocId < RootDocId; // first childDocId should be smaller than current RootId
    --- End diff --
    
    if the root doc has no children, this assertion would fail, right?  Hmm; this should be tested so we don't blow up/fail.
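A minimal sketch of the guard being requested — names mirror the diff, but the early-return semantics for the childless case are an assumption about how it ought to behave, not the actual patch (the real method also applies the descendant limit):

```java
public class CalcDocIdSketch {
    // If the root document has no descendants, the "first child" docId can
    // equal (or exceed) the root's own docId and the ordering assertion in
    // the diff would fail; guard that case before asserting.
    public static int calcDocIdToIterateFrom(int lowestChildDocId, int rootDocId) {
        if (lowestChildDocId >= rootDocId) {
            return rootDocId; // childless root: nothing to iterate
        }
        assert lowestChildDocId < rootDocId;
        return lowestChildDocId;
    }

    public static void main(String[] args) {
        System.out.println(calcDocIdToIterateFrom(3, 7)); // children exist: 3
        System.out.println(calcDocIdToIterateFrom(7, 7)); // childless root: 7
    }
}
```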


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205468112
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    --- End diff --
    
    Again, needs some nice comments explaining how this is populated.
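For reference, the structure in question is a two-level mapping: the outer key is a document's parent path (the ancestor one level up), and the inner key is the child's label under that parent. A stdlib-only sketch of how pending children accumulate, with plain Map/List standing in for Guava's Multimap and a String standing in for SolrDocument:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingChildrenSketch {
    // Outer key: the parent's own path ("toppings#1"); inner key: the child
    // label under that parent ("ingredients"). Stand-in for the diff's
    // Map<String, Multimap<String, SolrDocument>>.
    public static Map<String, Map<String, List<String>>> pending = new HashMap<>();

    public static void put(String parentPath, String childLabel, String doc) {
        pending.computeIfAbsent(parentPath, k -> new HashMap<>())
               .computeIfAbsent(childLabel, k -> new ArrayList<>())
               .add(doc); // multimap-style add: never replaces, always appends
    }

    public static void main(String[] args) {
        put("toppings#1", "ingredients", "cocoa-doc");
        put("toppings#1", "ingredients", "sugar-doc");
        // Two sibling docs collected under the same parent path and label.
        System.out.println(pending.get("toppings#1").get("ingredients").size());
    }
}
```

Once the walk reaches `toppings#1` itself, the patch removes that outer entry and attaches the collected children to the loaded document.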


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209070327
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,244 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.common.params.CommonParams;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +  private boolean hasPaths;
    +  private boolean anonChildDoc;
    +
    +  public ChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                             final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +    this.hasPaths = req.getSchema().hasExplicitField(NEST_PATH_FIELD_NAME);
    +    this.anonChildDoc = req.getParams().getBool(CommonParams.ANONYMOUS_CHILD_DOCS, false);
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to perform a ChildBlockJoinQuery
    +    return new String[] { idField.getName() };
    +  }
    +
    +  public boolean hasPaths() {
    +    return hasPaths;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +
    +      if(children.matches() > 0) {
    +        long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +        final int seg = (int) (segAndId >> 32);
    +        final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +        final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +        // the key in the Map is the document's ancestors key(one above the parent), while the key in the intermediate
    +        // MultiMap is the direct child document's key(of the parent document)
    +        Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = hasPaths()? getPathByDocId(docId - segBaseId, segPathDocValues): "_childDocuments_";
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either ancestor or a matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = getParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +
    +            final String lastPath = getLastPath(fullDocPath);
    +            // trim path if the doc was inside array, see ChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(hasPaths()? trimPathIfArrayDoc(lastPath) : lastPath, doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  private void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  private void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // if anonChildDoc is set to true we do not need to add the child document's relation to its parent document.
    +    if(anonChildDoc) {
    +      parent.addChildDocuments(children);
    +      return;
    +    }
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array(same string is returned by ChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private static String getLastPath(String path) {
    +    int lastIndexOfPathSepChar = path.lastIndexOf(PATH_SEP_CHAR.charAt(0));
    +    if(lastIndexOfPathSepChar == -1) {
    +      return path;
    +    }
    +    return path.substring(lastIndexOfPathSepChar + 1);
    +  }
    +
    +  private static String trimPathIfArrayDoc(String path) {
    +    // remove index after last pound sign and if there is an array index e.g. toppings#1 -> toppings
    +    // or return original string if child doc is not in an array ingredients# -> ingredients#
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = path.charAt(lastIndex) == NUM_SEP_CHAR.charAt(0);
    +    return singleDocVal ? path: path.substring(0, path.lastIndexOf(NUM_SEP_CHAR.charAt(0)));
    +  }
    +
    +  private static String trimLastPound(String path) {
    +    // remove index after last pound sign and index from e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path: path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  private static String getParentPath(String currDocPath) {
    +    // chop off leaf (after last '/')
    +    // if child of leaf then return null (special value)
    +    int lastPathIndex = currDocPath.lastIndexOf(PATH_SEP_CHAR);
    +    return lastPathIndex == -1 ? null: currDocPath.substring(0, lastPathIndex);
    +  }
    +
    +  private static String getPathByDocId(int segDocId, SortedDocValues segPathDocValues) throws IOException {
    +    int numToAdvance = segPathDocValues.docID()==-1?segDocId: segDocId - (segPathDocValues.docID());
    +    assert numToAdvance >= 0;
    +    assert segPathDocValues.advanceExact(segDocId);
    --- End diff --
    
    Woah; we want to call advanceExact even if assertions are disabled -- like, you know, in production/real-world; not just tests.  Thankfully, Lucene/Solr's randomized test infrastructure actually randomly disables assertions and thus would catch this (eventually) if I didn't.
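The pitfall here is general Java behavior: an expression placed inside `assert` is skipped entirely when assertions are disabled (`-da`, the JVM default), so the doc-values cursor would never advance in production. A self-contained demonstration, with a stub standing in for SortedDocValues:

```java
public class AssertSideEffectDemo {
    // Stand-in for SortedDocValues: counts how often advanceExact is called.
    public static class DocValuesStub {
        public int advances = 0;
        public boolean advanceExact(int target) { advances++; return true; }
    }

    // Buggy pattern from the diff: the advance happens only under -ea.
    public static int runBuggyPattern(DocValuesStub dv) {
        assert dv.advanceExact(5);
        return dv.advances; // 0 under -da, 1 under -ea
    }

    // Fixed pattern: perform the call unconditionally, assert on its result.
    public static int runFixedPattern(DocValuesStub dv) {
        boolean found = dv.advanceExact(5);
        assert found;
        return dv.advances; // always 1, regardless of assertion settings
    }

    public static void main(String[] args) {
        System.out.println("fixed pattern advanced "
            + runFixedPattern(new DocValuesStub()) + " time(s)");
    }
}
```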


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205463666
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformerFactory.java ---
    @@ -61,6 +71,12 @@
      */
     public class ChildDocTransformerFactory extends TransformerFactory {
     
    +  public static final String PATH_SEP_CHAR = "/";
    +  public static final String NUM_SEP_CHAR = "#";
    +  private static final BooleanQuery rootFilter = new BooleanQuery.Builder()
    +      .add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.MUST))
    +      .add(new BooleanClause(new WildcardQuery(new Term(NEST_PATH_FIELD_NAME, new BytesRef("*"))), BooleanClause.Occur.MUST_NOT)).build();
    --- End diff --
    
    Remember again to use DocValuesExistsQuery


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r209862802
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/schema15.xml ---
    @@ -567,7 +567,17 @@
       <field name="_root_" type="string" indexed="true" stored="true"/>
       <!-- required for NestedUpdateProcessor -->
       <field name="_nest_parent_" type="string" indexed="true" stored="true"/>
    -  <field name="_nest_path_" type="string" indexed="true" stored="true"/>
    +  <field name="_nest_path_" type="descendants_path" indexed="true" multiValued="false" docValues="true" stored="false" useDocValuesAsStored="false"/>
    +  <fieldType name="descendants_path" class="solr.SortableTextField">
    +    <analyzer type="index">
    +      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^.*.*$)" replacement="$0/"/>
    --- End diff --
    
    Sure thing


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204785508
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    --- End diff --
    
    We don't want the # and trailing number to be added to the fieldName represented in the document hierarchy, so trimSuffixFromPaths is used.
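    A minimal, self-contained sketch of the trimming described here (the method and char names are assumed from the diff above; this is an illustration, not the actual Solr implementation):

    ```java
    public class PathLabelDemo {
      // Assumed separators, mirroring ChildDocTransformerFactory's constants in the diff above.
      static final char NUM_SEP_CHAR = '#';
      static final char PATH_SEP_CHAR = '/';

      // Return the last segment of a nest path, e.g. "toppings#1/ingredients#2" -> "ingredients#2".
      static String getLastPath(String path) {
        int idx = path.lastIndexOf(PATH_SEP_CHAR);
        return idx < 0 ? path : path.substring(idx + 1);
      }

      // Strip the '#' and any trailing ordinal so the label used in the document
      // hierarchy is just the field name, e.g. "ingredients#2" -> "ingredients".
      static String trimSuffixFromPath(String lastPath) {
        int idx = lastPath.lastIndexOf(NUM_SEP_CHAR);
        return idx < 0 ? lastPath : lastPath.substring(0, idx);
      }

      public static void main(String[] args) {
        System.out.println(trimSuffixFromPath(getLastPath("toppings#1/ingredients#2"))); // prints "ingredients"
      }
    }
    ```

    So "toppings#1/ingredients#2" yields the child key "ingredients" rather than "ingredients#2".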


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205475272
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    --- End diff --
    
    Let's actually throw an exception to loudly complain that the docs in the "block" are wrong/invalid and/or the nest paths are encoded incorrectly.
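    A hedged sketch of what that check could look like (in Solr proper this would more likely throw a SolrException with ErrorCode.SERVER_ERROR; IllegalStateException keeps the sketch self-contained, and the variable name follows the diff above):

    ```java
    import java.util.Map;

    public class BlockIntegrityCheck {
      // Replace the assert with an explicit check that fails loudly when the
      // block is malformed, instead of silently producing a wrong hierarchy.
      static void ensureOnlyRootRemains(Map<String, ?> pendingParentPathsToChildren) {
        if (pendingParentPathsToChildren.size() != 1) {
          throw new IllegalStateException(
              "Expected only the root's children to remain, but found parent paths "
                  + pendingParentPathsToChildren.keySet()
                  + "; the docs in the block are likely indexed incorrectly, or the nest paths are mis-encoded.");
        }
      }
    }
    ```

    Unlike an assert, this fires in production (asserts are disabled unless the JVM runs with -ea), so a corrupt block surfaces as an error rather than a silently mangled response.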


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213319927
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -99,6 +96,9 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
     
           // we'll need this soon...
           final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +      // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs,
    +      // and the iterator can not be reversed.
    +      final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME));
    --- End diff --
    
    Sure thing.
    Done :-)


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205096164
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out it's children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or it matched the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    To be clear, I just mean a `Set<String>` of paths like `foo#1/bar#9/author#`  (author is single valued)
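    The `#` convention in those example paths can be read mechanically; here is a small illustrative sketch (the helper names are hypothetical, and the separator chars are assumed from ChildDocTransformerFactory's NUM_SEP_CHAR/PATH_SEP_CHAR):

    ```java
    public class NestPathDemo {
      // In paths like "foo#1/bar#9/author#", a segment ending in a bare '#'
      // (no ordinal) denotes a single-valued child field, while "#<n>" marks
      // position n within a multi-valued child field.
      static boolean isSingleValuedSegment(String segment) {
        return segment.endsWith("#");
      }

      // Strip the '#' suffix (and any ordinal) to recover the field name.
      static String fieldNameOf(String segment) {
        int idx = segment.lastIndexOf('#');
        return idx < 0 ? segment : segment.substring(0, idx);
      }

      public static void main(String[] args) {
        for (String seg : "foo#1/bar#9/author#".split("/")) {
          System.out.println(fieldNameOf(seg) + " singleValued=" + isSingleValuedSegment(seg));
        }
      }
    }
    ```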


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r210600024
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -87,7 +87,12 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           final int segBaseId = leafReaderContext.docBase;
           final int segRootId = rootDocId - segBaseId;
           final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    -      final int segPrevRootId = segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +      final int segPrevRootId = rootDocId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (rootDocId - 1)) {
    --- End diff --
    
    Ooooh, good catch.
    
    Let's enhance the tests in this file a bit to give us confidence that we're using docIDs correctly (and to help prevent future enhancers/modifiers from introducing similar bugs).  Here's what I propose: in the @BeforeClass, if random().nextBoolean(), add some nested docs -- using one of your existing nested-document-adding methods.  And randomly do a commit() to flush the segment.  Later, the test methods need to add a filter query that will exclude those docs.  One way to do this is to ensure these first docs have some field we can exclude on.  Another way is to know the maximum uniqueKey ID you can query by prior to the test starting, and then add a filter query with a range saying the uniqueKey must be at least that value.  Make sense?
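    The second exclusion approach might be sketched like this (hypothetical helper; it assumes the uniqueKey is a numeric field, since string ids sort lexicographically and would need different handling):

    ```java
    public class DecoyExclusion {
      // Build a Solr filter query that excludes all pre-indexed "decoy" docs:
      // an exclusive-lower-bound range on the uniqueKey, e.g. "id:{42 TO *]"
      // matches only ids strictly greater than the highest decoy id.
      static String exclusionFq(String uniqueKeyField, long maxDecoyId) {
        return uniqueKeyField + ":{" + maxDecoyId + " TO *]";
      }
    }
    ```

    Each test method would then pass this as an fq so randomly pre-indexed blocks never pollute its results.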


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213540255
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -123,6 +124,16 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
     
         // Do we need to do anything with this doc (either an ancestor or matched the child query)
             if (isAncestor || childDocSet == null || childDocSet.exists(docId)) {
    +
    +          if(limit != -1) {
    +            if(!isAncestor) {
    +              if(matches == limit) {
    +                continue;
    +              }
    +              ++matches;
    --- End diff --
    
    I think matches should be incremented if the doc is in childDocSet (which includes childDocSet being null).  Whether it's an ancestor or not doesn't matter, I think.  You could pull out a new variable isInChildDocSet.  Or I suppose simply consider them all a match, which I see is what you just did as I write this; that's fine too.
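    A standalone sketch of the counting rule being suggested (plain Sets stand in for Solr's DocSet and the transformer's loop; the method and variable names here are hypothetical): a doc counts toward the limit whenever it is in the child doc set, and ancestors are still processed even once the limit is reached, since they are needed for the hierarchy.

    ```java
    import java.util.Set;

    public class ChildLimitDemo {
      // Returns how many docs would actually be loaded/attached.
      // childDocSet == null means "all docs match the child filter".
      static int countProcessed(int[] docIds, Set<Integer> childDocSet, Set<Integer> ancestors, int limit) {
        int matches = 0;
        int processed = 0;
        for (int docId : docIds) {
          boolean isAncestor = ancestors.contains(docId);
          boolean isInChildDocSet = (childDocSet == null) || childDocSet.contains(docId);
          if (!isAncestor && !isInChildDocSet) {
            continue; // nothing to do with this doc
          }
          if (isInChildDocSet && limit != -1) {
            if (matches >= limit) {
              if (!isAncestor) continue; // over the limit and not needed as an ancestor
            } else {
              ++matches; // counts toward the limit whether or not it is also an ancestor
            }
          }
          ++processed; // stand-in for loading the doc and attaching it to its parent
        }
        return processed;
      }
    }
    ```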


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    This is only a start; I will merge this into the main ChildDocTransformerFactory.
    This new algorithm is simpler, just like you suggested, although it contains a few minor implementation changes.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r208928799
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformer.java ---
    @@ -242,10 +242,10 @@ private void testChildDocNonStoredDVFields() throws Exception {
             "fl", "*,[child parentFilter=\"subject:parentDocument\"]"), test1);
     
         assertJQ(req("q", "*:*", "fq", "subject:\"parentDocument\" ",
    -        "fl", "subject,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    +        "fl", "id,_childDocuments_,subject,intDvoDefault,[child parentFilter=\"subject:parentDocument\" childFilter=\"title:foo\"]"), test2);
    --- End diff --
    
    I think either the user hasn't yet started using the new key'ed/labeled style of child documents, or they have updated completely.  It's a migration to a new way you either do or don't do (and perhaps one day will not have a choice).


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204787481
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc? (either an ancestor or a match of the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +            pendingParentPathsToChildren.put(parentDocPath, doc); // multimap add (won't replace)
    +
    +            // if this path has pending child docs, add them.
    +            if (isAncestor) {
    +              addChildrenToParent(doc, pendingParentPathsToChildren.get(fullDocPath));
    +              pendingParentPathsToChildren.removeAll(fullDocPath); // no longer pending
    +            }
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.get(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildToParent(SolrDocument parent, SolrDocument child, String label) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    // TODO: unfortunately this is the 2nd time we grab the paths for these docs. resolve how?
    +    String trimmedPath = trimSuffixFromPaths(getLastPath(label));
    --- End diff --
    
    Ok.  So I propose renaming the "label" variable to "nestPath", since that's what it seems to be.  getLastPath is fine.  trimSuffixFromPaths should probably be renamed to trimLastChildIndex.  Then the "trimmedPath" variable would in fact be the "label", so it should be named as such.
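
    For readers following along, here is a minimal sketch of the two helpers being discussed, using the renamed identifiers proposed above (this is illustrative, not the PR's exact code; the separator constants mirror PATH_SEP_CHAR / NUM_SEP_CHAR):

```java
public class NestPathLabels {
  private static final char PATH_SEP = '/'; // assumed, mirrors PATH_SEP_CHAR
  private static final char NUM_SEP = '#';  // assumed, mirrors NUM_SEP_CHAR

  // Last segment of a nest path, e.g. "toppings#1/ingredients#0" -> "ingredients#0".
  static String getLastPath(String path) {
    int idx = path.lastIndexOf(PATH_SEP);
    return idx == -1 ? path : path.substring(idx + 1);
  }

  // Drop the '#' and any child index after it, e.g. "ingredients#0" -> "ingredients",
  // "lonely#" -> "lonely"; a segment without '#' is returned unchanged.
  static String trimLastChildIndex(String segment) {
    int idx = segment.lastIndexOf(NUM_SEP);
    return idx == -1 ? segment : segment.substring(0, idx);
  }

  public static void main(String[] args) {
    String nestPath = "toppings#1/ingredients#0";
    // the "label" under which children land in the response
    System.out.println(trimLastChildIndex(getLastPath(nestPath))); // ingredients
  }
}
```

    Named this way, the chain reads as: take the nest path, keep its leaf segment, strip the child index, and what remains is the label.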


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211495503
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -40,22 +43,52 @@
       private static final Iterator<String> ingredientsCycler = Iterables.cycle(ingredients).iterator();
       private static final String[] names = {"Yaz", "Jazz", "Costa"};
       private static final String[] fieldsToRemove = {"_nest_parent_", "_nest_path_", "_root_"};
    +  private static final int sumOfDocsPerNestedDocument = 8;
    +  private static final int numberOfDocsPerNestedTest = 10;
    +  private static boolean useSegments;
    +  private static int randomDocTopId = 0;
    +  private static String filterOtherSegments;
     
       @BeforeClass
       public static void beforeClass() throws Exception {
         initCore("solrconfig-update-processor-chains.xml", "schema-nest.xml"); // use "nest" schema
    +    useSegments = random().nextBoolean();
    +    if(useSegments) {
    +      final int numOfDocs = 10;
    +      for(int i = 0; i < numOfDocs; ++i) {
    +        updateJ(generateDocHierarchy(i), params("update.chain", "nested"));
    +        if(random().nextBoolean()) {
    +          assertU(commit());
    +        }
    +      }
    +      assertU(commit());
    +      randomDocTopId = counter.get();
    +      filterOtherSegments = "{!frange l=" + randomDocTopId + " incl=false}idInt";
    +    } else {
    +      filterOtherSegments = "*:*";
    +    }
       }
     
       @After
       public void after() throws Exception {
    -    clearIndex();
    +    if (!useSegments) {
    --- End diff --
    
    So I could simply use delQ using the filter,
    great :-).


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r204769700
  
    --- Diff: solr/core/src/test-files/solr/collection1/conf/schema15.xml ---
    @@ -567,7 +567,17 @@
       <field name="_root_" type="string" indexed="true" stored="true"/>
       <!-- required for NestedUpdateProcessor -->
       <field name="_nest_parent_" type="string" indexed="true" stored="true"/>
    -  <field name="_nest_path_" type="string" indexed="true" stored="true"/>
    +  <field name="_nest_path_" type="descendants_path" indexed="true" multiValued="false" docValues="true" stored="false" useDocValuesAsStored="true"/>
    --- End diff --
    
    I think useDocValuesAsStored=false.  This is internal and shouldn't be returned when someone does `fl=*`


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213291939
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -0,0 +1,263 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.ReaderUtil;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.search.DocSet;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class ChildDocTransformer extends DocTransformer {
    +  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private static final String ANON_CHILD_KEY = "_childDocuments_";
    +
    +  private final String name;
    +  private final BitSetProducer parentsFilter;
    +  private final DocSet childDocSet;
    +  private final int limit;
    +  private final boolean isNestedSchema;
    +
    +  private final SolrReturnFields childReturnFields = new SolrReturnFields();
    +
    +  ChildDocTransformer(String name, BitSetProducer parentsFilter,
    +                      DocSet childDocSet, boolean isNestedSchema, int limit) {
    +    this.name = name;
    +    this.parentsFilter = parentsFilter;
    +    this.childDocSet = childDocSet;
    +    this.limit = limit;
    +    this.isNestedSchema = isNestedSchema;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +    // note: this algorithm works both if we have _nest_path_ and if we don't!
    +
    +    try {
    +
    +      // lookup what the *previous* rootDocId is, and figure which segment this is
    +      final SolrIndexSearcher searcher = context.getSearcher();
    +      final List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
    +      final int seg = ReaderUtil.subIndex(rootDocId, leaves);
    +      final LeafReaderContext leafReaderContext = leaves.get(seg);
    +      final int segBaseId = leafReaderContext.docBase;
    +      final int segRootId = rootDocId - segBaseId;
    +      final BitSet segParentsBitSet = parentsFilter.getBitSet(leafReaderContext);
    +
    +      final int segPrevRootId = segRootId==0? -1: segParentsBitSet.prevSetBit(segRootId - 1); // can return -1 and that's okay
    +
    +      if(segPrevRootId == (segRootId - 1)) {
    +        // doc has no children, return fast
    +        return;
    +      }
    +
    +      // we'll need this soon...
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +      // passing a different SortedDocValues obj since the child documents which come after are of smaller docIDs,
    +      // and the iterator can not be reversed.
    +      final String transformedDocPath = getPathByDocId(segRootId, DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME));
    +
    +      // the key in the Map is the document's ancestors key(one above the parent), while the key in the intermediate
    +      // MultiMap is the direct child document's key(of the parent document)
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +      final int lastChildId = segBaseId + segPrevRootId + 1;
    +      // Loop each child ID up to the parent (exclusive).
    +      for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) {
    +
    +        // get the path.  (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank)
    +        String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +        if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) {
    --- End diff --
    
    > Perhaps a better way to do this ...
    
    I think it would be slow and add complexity if we did that.
    
    string.contains(...) doesn't seem right here; shouldn't it be startsWith or endsWith?  Will probably need to special-case empty transformedDocPath (rootDocPath).
    
    I think it's possible to not need "isNestedSchema", though it's not a big deal.
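
    To make the contains-vs-startsWith concern concrete, a hedged sketch (hypothetical paths, not the PR's code) of why a plain substring check can match an unrelated sibling branch:

```java
public class PathPrefixCheck {
  // A descendant's nest path must extend the ancestor's path at a '/' boundary;
  // the empty path (the root doc) is treated as an ancestor of everything.
  static boolean isDescendant(String fullDocPath, String ancestorPath) {
    if (ancestorPath.isEmpty()) {
      return true;
    }
    return fullDocPath.startsWith(ancestorPath + "/");
  }

  public static void main(String[] args) {
    // contains() is too loose: "retoppings#1" contains "toppings"
    // but lives on a sibling branch.
    System.out.println("retoppings#1".contains("toppings"));                    // true
    System.out.println(isDescendant("retoppings#1", "toppings"));               // false
    System.out.println(isDescendant("toppings#1/ingredients#0", "toppings#1")); // true
  }
}
```

    Anchoring the prefix at a path-separator boundary, plus the empty-path special case, covers both problems raised above.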


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I will push another commit soon.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205960639
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc? (either an ancestor or a match of the child query)
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    +          }
    +        }
    +
    +        // only children of parent remain
    +        assert pendingParentPathsToChildren.keySet().size() == 1;
    +
    +        addChildrenToParent(rootDoc, pendingParentPathsToChildren.remove(null));
    +      }
    +    } catch (IOException e) {
    +      rootDoc.put(getName(), "Could not fetch child Documents");
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Multimap<String, SolrDocument> children) {
    +    for(String childLabel: children.keySet()) {
    +      addChildrenToParent(parent, children.get(childLabel), childLabel);
    +    }
    +  }
    +
    +  void addChildrenToParent(SolrDocument parent, Collection<SolrDocument> children, String cDocsPath) {
    +    // lookup leaf key for these children using path
    +    // depending on the label, add to the parent at the right key/label
    +    String trimmedPath = trimLastPound(cDocsPath);
    +    // if the child doc's path does not end with #, it is an array (same string is returned by DeeplyNestedChildDocTransformer#trimLastPound)
    +    if (!parent.containsKey(trimmedPath) && (trimmedPath == cDocsPath)) {
    +      List<SolrDocument> list = new ArrayList<>(children);
    +      parent.setField(trimmedPath, list);
    +      return;
    +    }
    +    // is single value
    +    parent.setField(trimmedPath, ((List)children).get(0));
    +  }
    +
    +  private String getLastPath(String path) {
    +    if(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) == -1) {
    +      return path;
    +    }
    +    return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
    +  }
    +
    +  private String trimPathIfArrayDoc(String path) {
    +    // remove index after last pound sign and if there is an array index e.g. toppings#1 -> toppings
    +    // or return original string if child doc is not in an array ingredients# -> ingredients#
    +    int lastIndex = path.length() - 1;
    +    boolean singleDocVal = path.charAt(lastIndex) == NUM_SEP_CHAR.charAt(0);
    +    return singleDocVal ? path: path.substring(0, path.lastIndexOf(NUM_SEP_CHAR.charAt(0)));
    +  }
    +
    +  private String trimLastPound(String path) {
    +    // remove index after last pound sign and index from e.g. toppings#1 -> toppings
    +    int lastIndex = path.lastIndexOf('#');
    +    return lastIndex == -1 ? path: path.substring(0, lastIndex);
    +  }
    +
    +  /**
    +   * Returns the *parent* path for this document.
    +   * Children of the root will yield null.
    +   */
    +  String lookupParentPath(String currDocPath) {
    +    // chop off leaf (after last '/')
    +    // if child of leaf then return null (special value)
    +    int lastPathIndex = currDocPath.lastIndexOf(PATH_SEP_CHAR);
    +    return lastPathIndex == -1 ? null: currDocPath.substring(0, lastPathIndex);
    +  }
    +
    +  private String getPathByDocId(int segDocId, SortedDocValues segPathDocValues) throws IOException {
    +    int numToAdvance = segPathDocValues.docID()==-1?segDocId: segDocId - (segPathDocValues.docID());
    --- End diff --
    
    Yes, because if the iterator has not been positioned yet (docID() returns -1), it would otherwise advance one doc too many.
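
    For readers unfamiliar with the forward-only doc-values API, a small sketch of the arithmetic in question (mirroring the ternary quoted above, outside of Lucene): an iterator reports docID() == -1 before it is first positioned, so the unpositioned state must not be treated like "positioned on doc -1":

```java
public class AdvanceSteps {
  // How many docs to advance a forward-only doc-values iterator so that it
  // lands on segDocId.  Mirrors the ternary in the quoted code: subtracting a
  // docID() of -1 as if it were a real position would overshoot by one.
  static int numToAdvance(int iteratorDocId, int segDocId) {
    return iteratorDocId == -1 ? segDocId : segDocId - iteratorDocId;
  }

  public static void main(String[] args) {
    System.out.println(numToAdvance(-1, 5)); // 5 (unpositioned special case)
    System.out.println(numToAdvance(2, 5));  // 3 (normal positioned case)
  }
}
```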


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205474348
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Map<String, Multimap<String, SolrDocument>> pendingParentPathsToChildren = new HashMap<>();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true);
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or a match for the child query)?
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            if (shouldDecorateWithDVs) {
    +              docFetcher.decorateDocValueFields(doc, docId, dvFieldsToReturn);
    +            }
    +            // get parent path
    +            // put into pending
    +            String parentDocPath = lookupParentPath(fullDocPath);
    +
    +            if(isAncestor) {
    +              // if this path has pending child docs, add them.
    +              addChildrenToParent(doc, pendingParentPathsToChildren.remove(fullDocPath)); // no longer pending
    +            }
    +            // trim path if the doc was inside array, see DeeplyNestedChildDocTransformer#trimPathIfArrayDoc
    +            // e.g. toppings#1/ingredients#1 -> outer map key toppings#1
    +            // -> inner MultiMap key ingredients
    +            // or lonely#/lonelyGrandChild# -> outer map key lonely#
    +            // -> inner MultiMap key lonelyGrandChild#
    +            pendingParentPathsToChildren.computeIfAbsent(parentDocPath, x -> ArrayListMultimap.create())
    +                .put(trimPathIfArrayDoc(getLastPath(fullDocPath)), doc); // multimap add (won't replace)
    --- End diff --
    
    I suggest renaming "trimPathIfArrayDoc" to "trimChildIndexIfArrayDoc".
    
    The long chain of methods here is hard to read/parse; it takes time.  I appreciate your efforts with the comments above.  I think it would help clarify if you break it down with some intermediate well-named variables.  "label" would be one, with a comment saying it's _actually_ possibly a label with a trailing pound sign (not easy to articulate in a var name alone :-)  Hmmm; perhaps when we populate the path in the URP, it would ultimately be clearer if a unitary child had no "#" at all, so that "#" would signify it's in an array?  If we did that, then at this spot here, "#" would more clearly represent an array doc instead of the other way around.  Know what I mean?
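
    To make the proposed convention concrete, here is a small hypothetical sketch (the class and method names are mine, not from the patch): under this convention a unitary child's last path segment such as "lonely" would carry no "#", while an array element would read "toppings#1", so the "#" alone would distinguish array docs:

    ```java
    // Hypothetical helpers illustrating the proposed convention; not part of the patch.
    public class NestPathLabels {

      // Under the proposed convention, a '#' in the last path segment would
      // mean the doc is an element of an array under that label.
      static boolean isArrayDoc(String lastPathSegment) {
        return lastPathSegment.indexOf('#') >= 0;
      }

      // Strip the '#<index>' suffix to recover the plain field label,
      // e.g. "ingredients#1" -> "ingredients", "lonely" -> "lonely".
      static String label(String lastPathSegment) {
        int hash = lastPathSegment.indexOf('#');
        return hash < 0 ? lastPathSegment : lastPathSegment.substring(0, hash);
      }
    }
    ```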


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205124848
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -190,8 +179,18 @@ private String getLastPath(String path) {
         return path.substring(path.lastIndexOf(PATH_SEP_CHAR.charAt(0)) + 1);
       }
     
    -  private String trimSuffixFromPaths(String path) {
    -    return path.replaceAll("#\\d|#", "");
    +  private String trimIfSingleDoc(String path) {
    --- End diff --
    
    Oh gotcha -- sure.  It does complicate an explanation of what that String is, as it won't simply be a label, but that's okay.  Please add comments where we declare the field to explain this stuff (e.g. what the outer String is, with an example, and what the intermediate string is, with 2 examples).


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205086247
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.List;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    +
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  private BitSetProducer parentsFilter;
    +  protected int limit;
    +  private final static Sort docKeySort = new Sort(new SortField(null, SortField.Type.DOC, false));
    +  private Query childFilterQuery;
    +
    +  public DeeplyNestedChildDocTransformer(String name, final BitSetProducer parentsFilter,
    +                                         final SolrQueryRequest req, final Query childFilterQuery, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.childFilterQuery = childFilterQuery;
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  @Override
    +  public void transform(SolrDocument rootDoc, int rootDocId) {
    +
    +    FieldType idFt = idField.getType();
    +
    +    String rootIdExt = getSolrFieldString(rootDoc.getFirstValue(idField.getName()), idFt);
    +
    +    try {
    +      Query parentQuery = idFt.getFieldQuery(null, idField, rootIdExt);
    +      Query query = new ToChildBlockJoinQuery(parentQuery, parentsFilter);
    +      SolrIndexSearcher searcher = context.getSearcher();
    +      DocList children = searcher.getDocList(query, childFilterQuery, docKeySort, 0, limit);
    +      long segAndId = searcher.lookupId(new BytesRef(rootIdExt));
    +      final int seg = (int) (segAndId >> 32);
    +      final LeafReaderContext leafReaderContext = searcher.getIndexReader().leaves().get(seg);
    +      final SortedDocValues segPathDocValues = DocValues.getSorted(leafReaderContext.reader(), NEST_PATH_FIELD_NAME);
    +
    +      Multimap<String,SolrDocument> pendingParentPathsToChildren = ArrayListMultimap.create();
    +
    +      if(children.matches() > 0) {
    +        SolrDocumentFetcher docFetcher = searcher.getDocFetcher();
    +        Set<String> dvFieldsToReturn = docFetcher.getNonStoredDVs(true).stream()
    +            .filter(name -> !NEST_PATH_FIELD_NAME.equals(name)).collect(Collectors.toSet());
    +        boolean shouldDecorateWithDVs = dvFieldsToReturn.size() > 0;
    +        DocIterator i = children.iterator();
    +        final int segBaseId = leafReaderContext.docBase;
    +        final int firstChildDocId = i.nextDoc();
    +        assert firstChildDocId < rootDocId;
    +
    +        for (int docId = firstChildDocId; docId < rootDocId; ++docId) {
    +          // get the path
    +          String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
    +
    +          // Is this doc a direct ancestor of another doc we've seen?
    +          boolean isAncestor = pendingParentPathsToChildren.containsKey(fullDocPath);
    +
    +          // Do we need to do anything with this doc (either an ancestor or a match for the child query)?
    +          if (isAncestor || children.exists(docId)) {
    +            // load the doc
    +            SolrDocument doc = DocsStreamer.convertLuceneDocToSolrDoc(docFetcher.doc(docId),
    +                schema, new SolrReturnFields());
    +            doc.setField(NEST_PATH_FIELD_NAME, fullDocPath);
    --- End diff --
    
    I expected the intermediate string to be a *label* like “comment”, not a path. A label will be shared by all children under that label.


---



[GitHub] lucene-solr issue #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on the issue:

    https://github.com/apache/lucene-solr/pull/416
  
    I found this nasty bug in [AtomicUpdateDocumentMerger#isAtomicUpdate](https://github.com/apache/lucene-solr/pull/416/commits/7cd1dbc3208746541bd80f2d2656b6464fecd23d#diff-50d2f2a4580a649dc2a08749061504a0R78), where single child docs (non-array) were mistaken for an atomic update.
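
    For readers following along, a simplified sketch of the ambiguity (illustrative only, not the actual AtomicUpdateDocumentMerger logic): an atomic update and a single nested child doc both arrive as a map-valued field, so the check has to look at the map's keys, not merely its type:

    ```java
    import java.util.Map;

    // Illustrative only -- not the real AtomicUpdateDocumentMerger code. An atomic
    // update value like {"set": "foo"} and a single (non-array) child doc like
    // {"id": "2"} are both Maps, so testing "is it a Map?" alone confuses the two.
    // Checking that every key is a known atomic operation avoids that.
    public class AtomicUpdateCheck {
      // A subset of Solr's atomic-update operations, listed here for illustration.
      private static final java.util.Set<String> OPS =
          java.util.Set.of("set", "add", "remove", "removeregex", "inc");

      static boolean looksLikeAtomicUpdate(Map<String, Object> fieldValue) {
        return !fieldValue.isEmpty() && OPS.containsAll(fieldValue.keySet());
      }
    }
    ```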


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r211519256
  
    --- Diff: solr/core/src/test/org/apache/solr/response/transform/TestChildDocTransformerHierarchy.java ---
    @@ -264,7 +309,7 @@ private static Object cleanIndexableField(Object field) {
       }
     
       private static String grandChildDocTemplate(int id) {
    -    int docNum = id / 8; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    +    int docNum = (id / sumOfDocsPerNestedDocument) % numberOfDocsPerNestedTest; // the index of docs sent to solr in the AddUpdateCommand. e.g. first doc is 0
    --- End diff --
    
    I tried simplifying the calculation a bit,  [here's a link](https://github.com/apache/lucene-solr/pull/416/files#diff-9fe0ab006f82be5c6a07d5bb6dbc6da0R299).
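
    As a worked example of that arithmetic (the constants below are assumed purely for illustration): with 8 Lucene docs per nested block and 2 blocks per test, doc ids 0-7 map to block 0, ids 8-15 to block 1, and id 16 wraps back to 0:

    ```java
    public class DocNumCalc {
      // Mirrors the expression in the quoted test line; parameter names follow it.
      static int docNum(int id, int sumOfDocsPerNestedDocument, int numberOfDocsPerNestedTest) {
        return (id / sumOfDocsPerNestedDocument) % numberOfDocsPerNestedTest;
      }
    }
    ```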


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r206846562
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    ?


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r203719810
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildTransformerBase.java ---
    @@ -0,0 +1,139 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.solr.response.transform;
    +
    +import java.util.ArrayList;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Objects;
    +import java.util.stream.Stream;
    +
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.util.BitSet;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +
    +import static org.apache.solr.response.transform.DeeplyNestedChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +/**
    + *
    + * This base class helps create a child doc transformer which caches the parent query using QueryBitProducer
    + *
    + *
    + * "limit" param which provides an option to specify the number of child documents
    + * to be returned per parent document. By default it's set to 10.
    + *
    + * @see org.apache.solr.response.transform.DeeplyNestedChildDocTransformer
    + * @see org.apache.solr.response.transform.DeeplyNestedFilterChildDocTransformer
    + */
    +
    +abstract class DeeplyNestedChildDocTransformerBase extends DocTransformer {
    +  private final String name;
    +  protected final SchemaField idField;
    +  protected final SolrQueryRequest req;
    +  protected final IndexSchema schema;
    +  protected BitSetProducer parentsFilter;
    +  protected BitSet parents;
    +  protected int limit;
    +  protected final Sort pathKeySort;
    +
    +  public DeeplyNestedChildDocTransformerBase( String name, final BitSetProducer parentsFilter,
    +                                          final SolrQueryRequest req, int limit) {
    +    this.name = name;
    +    this.schema = req.getSchema();
    +    this.idField = this.schema.getUniqueKeyField();
    +    this.req = req;
    +    this.parentsFilter = parentsFilter;
    +    this.limit = limit;
    +    this.pathKeySort = new Sort(new SortField(NEST_PATH_FIELD_NAME, SortField.Type.STRING, false),
    +        new SortField(idField.getName(), SortField.Type.STRING, false));
    +  }
    +
    +  @Override
    +  public String getName()  {
    +    return name;
    +  }
    +
    +  @Override
    +  public String[] getExtraRequestFields() {
    +    // we always need the idField (of the parent) in order to fill out its children
    +    return new String[] { idField.getName() };
    +  }
    +
    +  protected static SolrDocument getChildByPath(String[] pathAndNum, SolrDocument lastDoc) {
    +    List<Object> fieldsValues = (List<Object>) lastDoc.getFieldValues(pathAndNum[0]);
    +    int childIndex = Integer.parseInt(pathAndNum[1]);
    +    return fieldsValues.size() > childIndex ? (SolrDocument) fieldsValues.get(childIndex): null;
    +  }
    +
    +  protected static void addChild(SolrDocument parentDoc, String[] pathAndNum, SolrDocument cDoc) {
    +    if(!pathAndNum[1].equals("") && (parentDoc.get(pathAndNum[0]) == null)) {
    +      parentDoc.setField(pathAndNum[0], new NullFilteringArrayList<SolrDocument>());
    +    }
    +    NullFilteringArrayList fieldValues = (NullFilteringArrayList) parentDoc.getFieldValues(pathAndNum[0]);
    +    int pathNum = Integer.parseInt(pathAndNum[1]);
    +
    +    fieldValues.addWithPlaceHolder(pathNum, cDoc);
    +  }
    +
    +  protected static String[] getPathAndNum(String lastPath) {
    +    return lastPath.split(NUM_SEP_CHAR);
    +  }
    +
    +  protected static String getSolrFieldString(Object fieldVal, FieldType fieldType) {
    +    return fieldVal instanceof IndexableField
    +        ? fieldType.toExternal((IndexableField)fieldVal)
    +        : fieldVal.toString();
    +  }
    +
    +  protected static class NullFilteringArrayList<T> extends ArrayList<T> {
    --- End diff --
    
    Um; this ArrayList subclass seems undesirable to me; I'm not sure yet why it's used (though I'm sure you have your reasons)... but maybe there could be some change to the logic to avoid the need for this thing.
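
    For context, here is a guess at what NullFilteringArrayList#addWithPlaceHolder does, reconstructed only from the call sites in the quoted diff (the real implementation is not shown in this hunk): children may be encountered out of index order, so writing index 2 before index 0 requires padding with null placeholders that later writes overwrite:

    ```java
    import java.util.ArrayList;

    // Reconstruction from the call sites above; the actual class is not in this diff.
    class NullFilteringArrayList<T> extends ArrayList<T> {
      // Grow the list with null placeholders so that out-of-order child indexes
      // can be written without IndexOutOfBoundsException; a later call for a
      // lower index simply overwrites its placeholder.
      void addWithPlaceHolder(int index, T element) {
        while (size() <= index) {
          add(null);
        }
        set(index, element);
      }
    }
    ```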


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by dsmiley <gi...@git.apache.org>.
Github user dsmiley commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r205466774
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/DeeplyNestedChildDocTransformer.java ---
    @@ -0,0 +1,224 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.solr.response.transform;
    +
    +import java.io.IOException;
    +import java.util.ArrayList;
    +import java.util.Collection;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Set;
    +
    +import com.google.common.collect.ArrayListMultimap;
    +import com.google.common.collect.Multimap;
    +import org.apache.lucene.index.DocValues;
    +import org.apache.lucene.index.IndexableField;
    +import org.apache.lucene.index.LeafReaderContext;
    +import org.apache.lucene.index.SortedDocValues;
    +import org.apache.lucene.search.Query;
    +import org.apache.lucene.search.Sort;
    +import org.apache.lucene.search.SortField;
    +import org.apache.lucene.search.join.BitSetProducer;
    +import org.apache.lucene.search.join.ToChildBlockJoinQuery;
    +import org.apache.lucene.util.BytesRef;
    +import org.apache.solr.common.SolrDocument;
    +import org.apache.solr.request.SolrQueryRequest;
    +import org.apache.solr.response.DocsStreamer;
    +import org.apache.solr.schema.FieldType;
    +import org.apache.solr.schema.IndexSchema;
    +import org.apache.solr.schema.SchemaField;
    +import org.apache.solr.search.DocIterator;
    +import org.apache.solr.search.DocList;
    +import org.apache.solr.search.SolrDocumentFetcher;
    +import org.apache.solr.search.SolrIndexSearcher;
    +import org.apache.solr.search.SolrReturnFields;
    +
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.NUM_SEP_CHAR;
    +import static org.apache.solr.response.transform.ChildDocTransformerFactory.PATH_SEP_CHAR;
    +import static org.apache.solr.schema.IndexSchema.NEST_PATH_FIELD_NAME;
    +
    +class DeeplyNestedChildDocTransformer extends DocTransformer {
    --- End diff --
    
    As with our URP, let's forgo the "Deeply" terminology.  I hope this will simply be how any nested docs are done in the future, rather than making a distinction.


---



[GitHub] lucene-solr pull request #416: WIP: SOLR-12519

Posted by moshebla <gi...@git.apache.org>.
Github user moshebla commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/416#discussion_r213306038
  
    --- Diff: solr/core/src/java/org/apache/solr/response/transform/ChildDocTransformer.java ---
    @@ -109,9 +109,14 @@ public void transform(SolrDocument rootDoc, int rootDocId) {
           // Loop each child ID up to the parent (exclusive).
           for (int docId = calcDocIdToIterateFrom(lastChildId, rootDocId); docId < rootDocId; ++docId) {
     
    -        // get the path.  (note will default to ANON_CHILD_KEY if not in schema or oddly blank)
    +        // get the path.  (note will default to ANON_CHILD_KEY if schema is not nested or empty string if blank)
             String fullDocPath = getPathByDocId(docId - segBaseId, segPathDocValues);
     
    +        if(isNestedSchema && !fullDocPath.contains(transformedDocPath)) {
    +          // is not a descendant of the transformed doc, return fast.
    +          return;
    --- End diff --
    
    Yep, you're right.
    I'll investigate further to see why a test did not fail because of this.
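
    The difference the review points at can be reduced to a tiny standalone sketch (simplified; this is not the transformer's actual loop): with `return`, the first non-descendant path aborts the scan and any later descendants are silently dropped, whereas `continue` only skips that one doc:

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class ReturnVsContinue {
      // Buggy shape: bails out at the first path that is not under the prefix.
      static List<String> collectWithReturn(List<String> paths, String prefix) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
          if (!p.contains(prefix)) {
            return out; // later descendants are never examined
          }
          out.add(p);
        }
        return out;
      }

      // Fixed shape: skip the non-descendant and keep scanning.
      static List<String> collectWithContinue(List<String> paths, String prefix) {
        List<String> out = new ArrayList<>();
        for (String p : paths) {
          if (!p.contains(prefix)) {
            continue;
          }
          out.add(p);
        }
        return out;
      }
    }
    ```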


---
