Posted to solr-user@lucene.apache.org by David Lee <ni...@comcast.net> on 2018/02/04 23:55:45 UTC

Best approach for flattening data or using nested lists.

I have a project where I am working with nested data (not that deep, but 
with multiple lists) and would love to get some advice from other experienced 
developers. I've read most of the books on Solr (including Solr In 
Action), and although they provide good (if dated) information on the 
actual indexing mechanism, few of them deal with this issue in any depth.

If there are other resources that aren't necessarily Solr specific that 
can help here, please feel free to point those out.

Here is the structure I'm working with. I've made it generic to simplify 
things, but the intent is the same.

{
     id: 1,
     _type: "book",
     name: "My Martian",
     genre: "Science Fiction",
     edits: [
         {
             _type: "book_action",
             action: "Modify",
             chapter: 3,
             description: "Corrected spelling for interstellar"
         }, {
             _type: "book_action",
             action: "Removal",
             chapter: 24,
             description: "Removed chapter as it adds no value to the story"
         }
     ],
     chapters: [
         {
             _type: "book_chapter",
             chapter_number: 1,
             chapter_title: "The Test"
         }, {
             _type: "book_chapter",
             chapter_number: 2,
             chapter_title: "The Next Test"
         }
     ]
}
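
To make the parent/child shape concrete, here is a rough Python sketch that rewrites this structure as one parent document with a single list of children, each still tagged with its _type. The _childDocuments_ key is the one Solr's JSON update format uses for nested children; the child id scheme and the helper name to_block are assumptions made up for illustration.

```python
import json

book = {
    "id": 1,
    "_type": "book",
    "name": "My Martian",
    "genre": "Science Fiction",
    "edits": [
        {"_type": "book_action", "action": "Modify", "chapter": 3,
         "description": "Corrected spelling for interstellar"},
        {"_type": "book_action", "action": "Removal", "chapter": 24,
         "description": "Removed chapter as it adds no value to the story"},
    ],
    "chapters": [
        {"_type": "book_chapter", "chapter_number": 1, "chapter_title": "The Test"},
        {"_type": "book_chapter", "chapter_number": 2, "chapter_title": "The Next Test"},
    ],
}


def to_block(doc, list_fields=("edits", "chapters")):
    """Return a copy of doc with its nested lists merged into one child list."""
    parent = {k: v for k, v in doc.items() if k not in list_fields}
    children = []
    for field in list_fields:
        for position, child in enumerate(doc.get(field, [])):
            # Hypothetical id scheme: <parent id>-<list name>-<position>.
            child_id = f"{doc['id']}-{field}-{position}"
            children.append({"id": child_id, **child})
    parent["_childDocuments_"] = children
    return parent


block = to_block(book)
print(json.dumps(block, indent=2))
```

Both lists end up in the same child list, so the _type value is the only thing that tells an edit from a chapter.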

My first attempt was to just add both lists as child documents through 
SolrJ (I can't do this with the JSON interface since it doesn't allow 
multiple _childDocuments_ at the same level). That works, and I'm able to 
use the _type value to distinguish between them. However, my problem is 
that the users want to be able to search on any field in the top level of 
the data as well as within the lists. For example (using SQL for clarity only):

select * from book_index
where genre = "Science Fiction"
  and action = "Removal"
  and chapter_number = 2;

The problem I'm having with this sort of search is that, as far as I 
know, the {!child ...} and {!parent ...} query parsers won't give me 
access to all of the fields like this.
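
For what it's worth, one shape this search might take with the block join parsers is a normal query on the parent field plus one {!parent} filter per child condition. The sketch below only builds the request parameters in Python. It assumes every parent carries _type:book as in the structure above, and the helper name parent_clause is made up for illustration. It has not been checked against a live index, so treat it as a starting point rather than a confirmed answer.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: this filter matches every parent document and no children.
PARENT_FILTER = "_type:book"


def parent_clause(child_type, field, value):
    """Build a {!parent} filter selecting parents that have a matching child."""
    return (
        f'{{!parent which="{PARENT_FILTER}"}}'
        f"+_type:{child_type} +{field}:{value}"
    )


# One filter per child-level condition from the SQL example.
fqs = [
    parent_clause("book_action", "action", "Removal"),
    parent_clause("book_chapter", "chapter_number", 2),
]

# The parent-level condition goes in the main query.
params = [("q", 'genre:"Science Fiction"')] + [("fq", fq) for fq in fqs]
query_string = urlencode(params)

for fq in fqs:
    print(fq)
print(query_string)
```

Note that this asks for books that have some edit with action Removal and some chapter numbered 2. The {!parent} parser returns the parent documents only, not the children that matched.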

I've looked at flattening the data similar to the following:

{
     id: 1,
     name: "My Martian",
     genre: "Science Fiction",
     edit_action_3: {
         action: "Modify",
         chapter: 3,
         description: "Corrected spelling for interstellar"
     },
     edit_action_24: {
         action: "Removal",
         chapter: 24,
         description: "Removed chapter as it adds no value to the story"
     },
     chapter_1: {
         chapter_number: 1,
         chapter_title: "The Test"
     },
     chapter_2: {
         chapter_number: 2,
         chapter_title: "The Next Test"
     }
}

This does flatten things out so that the above query would be able to 
search on any field, but it's a real kludge and makes it nearly 
impossible to get just a list of chapters or actions.
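
To show what I mean, here is a small Python sketch that produces the flattened layout above from the nested one and then tries to get the chapters back out. The function names flatten and list_chapters are just for illustration, and the key naming follows the flattened example above.

```python
book = {
    "id": 1,
    "_type": "book",
    "name": "My Martian",
    "genre": "Science Fiction",
    "edits": [
        {"_type": "book_action", "action": "Modify", "chapter": 3,
         "description": "Corrected spelling for interstellar"},
        {"_type": "book_action", "action": "Removal", "chapter": 24,
         "description": "Removed chapter as it adds no value to the story"},
    ],
    "chapters": [
        {"_type": "book_chapter", "chapter_number": 1, "chapter_title": "The Test"},
        {"_type": "book_chapter", "chapter_number": 2, "chapter_title": "The Next Test"},
    ],
}


def strip_type(item):
    """Drop the _type marker, which the flattened layout does not carry."""
    return {k: v for k, v in item.items() if k != "_type"}


def flatten(doc):
    """Move each list item to its own top-level key, named after its chapter."""
    flat = {k: v for k, v in doc.items() if k not in ("_type", "edits", "chapters")}
    for edit in doc.get("edits", []):
        flat[f"edit_action_{edit['chapter']}"] = strip_type(edit)
    for chapter in doc.get("chapters", []):
        flat[f"chapter_{chapter['chapter_number']}"] = strip_type(chapter)
    return flat


def list_chapters(flat):
    """Recover the chapters by scanning key names, since no list exists anymore."""
    return [v for k, v in flat.items() if k.startswith("chapter_")]


flat = flatten(book)
print(sorted(flat))
print(list_chapters(flat))
```

Getting the chapters back means matching on key prefixes, and two edits to the same chapter would overwrite each other under this naming, which is part of why it feels like a kludge.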

So does anyone have any thoughts? (FYI, this is my first Solr project, so 
I'm really starting from scratch here.)

Thanks



