Posted to solr-user@lucene.apache.org by denl0 <da...@gmail.com> on 2012/11/14 17:26:26 UTC

Solr defining Schema structure trouble.

I'm having trouble putting somewhat related data in my Solr schema.
I know Solr isn't a database, but I need some data to be put in Solr.

Problem.
I have plenty of books to index.
The user wants page hit results: "The terms you were looking for are found
on page X."
To do this I was told to make a SolrDocument of each separate pageContent
and pass that to Solr.

The problems I have in my structure are:

-*Data related to the document is stored once for every page, while it
should only be stored once.* (In this case only the name, but I have a lot
more fields.)

<solrDoc>
<id>1</id>
<docname>test.pdf</docname>
<pagenumber>1</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>

<solrDoc>
<id>2</id>
<docname>test.pdf</docname>
<pagenumber>2</pagenumber>
<pagecontent>blablabla</pagecontent>
</solrDoc>

-*Some data related to a document is related to each other.*
Let's say these combinations are possible:

-ac
-ad
-be

<solrDoc>
<id>2</id>
<docname>test.pdf</docname>
<pagenumber>2</pagenumber>
<pagecontent>blablabla</pagecontent>

<model>a</model> <!-- multivalued field: model -->
<model>b</model>
<extra>c</extra> <!-- multivalued field: extra -->
<extra>d</extra>
<extra>e</extra>
</solrDoc>

I was wondering how I could solve these problems in the design of my
schema, and how to query it.

One option would be to create a document for each possible combination of
the model and extra fields, but I don't know how I would query that.
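
A common alternative to one document per combination is to index each valid model/extra pair as a single value in an additional multivalued field. The following is only a minimal sketch under stated assumptions: the field name `model_extra`, the `_` separator, and both helper functions are hypothetical and not part of the schema in this thread, and the field is assumed to be an untokenized string type.

```python
def combo_values(pairs, sep="_"):
    """Turn (model, extra) pairs into single tokens such as 'a_c'."""
    return [f"{model}{sep}{extra}" for model, extra in pairs]


def combo_filter(model, extra, field="model_extra", sep="_"):
    """Build a query clause that matches one exact model/extra pair."""
    return f'{field}:"{model}{sep}{extra}"'


# The valid combinations from the example above: ac, ad, be.
pairs = [("a", "c"), ("a", "d"), ("b", "e")]

page_doc = {
    "id": "2",
    "docname": "test.pdf",
    "pagenumber": 2,
    "pagecontent": "blablabla",
    "model": sorted({m for m, _ in pairs}),   # ['a', 'b']
    "extra": sorted({e for _, e in pairs}),   # ['c', 'd', 'e']
    "model_extra": combo_values(pairs),       # hypothetical pair field
}

print(page_doc["model_extra"])   # ['a_c', 'a_d', 'b_e']
print(combo_filter("a", "c"))    # model_extra:"a_c"
```

With only the two separate multivalued fields, a query such as `model:a AND extra:e` would match this document even though "ae" is not a valid combination; the pair field avoids that. The separator must be a character that never occurs in the model or extra values themselves.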



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-defining-Schema-structure-trouble-tp4020305.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr defining Schema structure trouble.

Posted by Jack Krupansky <ja...@basetechnology.com>.
You could implement a custom search component that takes the pages found by 
the query and then re-queries to find the book-level documents and adds them 
to the search results. Or, you could even have a query/parameter that found 
the pages but then discarded them and only kept the book metadata.

-- Jack Krupansky

-----Original Message----- 
From: denl0
Sent: Wednesday, November 21, 2012 4:01 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr defining Schema structure trouble.


Re: Solr defining Schema structure trouble.

Posted by denl0 <da...@gmail.com>.
Isn't it possible to combine the document-related values and page-related
values at query time?

Book1
Page1 with ref to book1
Page2 with ref to book1

Then, when querying, build the combined results (page1+book1) and
(page2+book1). Or would this be hard to achieve?

I'm pretty sure they want to search on book-related metadata too.




Re: Solr defining Schema structure trouble.

Posted by Jack Krupansky <ja...@basetechnology.com>.
Ah... sure, you can create a schema that has several different document 
types in it, with extra fields that are used in some but not all documents - 
books have the metadata fields but no page bodies while pages have page 
bodies but no metadata. And maybe even do a Solr join for the "block" of 
pages that are for the same book. Or, just two queries - the first to get 
the pages, grouped, and then take their book names/IDs and query the 
book-level metadata. You can also store the book-level metadata in a 
separate Solr collection.
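
The join and the two-query approach could be sketched as request parameters like this. It is a rough illustration only: the field names `doc_type` and `book_id`, the helper names, and the assumption that book documents use the book ID as their `id` are hypothetical, and the search term is assumed to be a single plain word that needs no escaping.

```python
from urllib.parse import urlencode


def book_query_params(book_ids):
    """Second query of the two-step approach: fetch book-level metadata
    for the book IDs collected from the page hits."""
    quoted = " OR ".join(f'"{b}"' for b in book_ids)
    return {"q": f"id:({quoted})", "fq": "doc_type:book", "wt": "json"}


def join_query_params(term):
    """Single-query alternative: match pages on their content, then join
    from each page's book_id to the id of the book document."""
    return {
        "q": f"{{!join from=book_id to=id}}pagecontent:{term}",
        "wt": "json",
    }


# Book IDs as they might come back from a first, page-level query.
print(book_query_params(["book1", "book2"])["q"])
print(join_query_params("solr")["q"])

# Ready to append to a /select URL.
print(urlencode(join_query_params("solr")))
```

Note that the join returns only the book documents, not the pages that matched, so the page numbers still have to come from a page-level query.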

But, having said that, you have to decide whether your content search is a 
pure content search or whether you also want to search by metadata. If you 
do, the searchable metadata should be present on each of the pages in 
addition to the book level. That may seem like repetition, but that's okay. 
The bulk of the storage will be the page bodies themselves.
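
As an illustration of that layout, the sketch below builds one book-level document plus one document per page and copies only the searchable metadata onto the pages. The field names `doc_type` and `book_id`, the `book1_p1` style of page ID, and the choice of `title` and `language` as the searchable fields are assumptions made for the example.

```python
def build_documents(book, pages, searchable=("title", "language")):
    """Return one book-level document followed by one document per page.

    Only the metadata fields named in `searchable` are repeated on pages.
    """
    book_doc = {"id": book["id"], "doc_type": "book", **book["metadata"]}
    shared = {k: v for k, v in book["metadata"].items() if k in searchable}
    page_docs = [
        {
            "id": f"{book['id']}_p{number}",
            "doc_type": "page",
            "book_id": book["id"],
            "pagenumber": number,
            "pagecontent": text,
            **shared,
        }
        for number, text in enumerate(pages, start=1)
    ]
    return [book_doc] + page_docs


# Made-up sample data.
book = {
    "id": "book1",
    "metadata": {"title": "Test", "language": "en", "publisher": "ACME"},
}
docs = build_documents(book, ["first page text", "second page text"])
for doc in docs:
    print(doc)
```

Metadata that is only displayed, never searched (here `publisher`), stays on the book document alone.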

-- Jack Krupansky

-----Original Message----- 
From: denl0
Sent: Thursday, November 15, 2012 5:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr defining Schema structure trouble.


Re: Solr defining Schema structure trouble.

Posted by denl0 <da...@gmail.com>.
Yes, this is what I'm trying to do. But data related to the document, like
language/title/... (I have many more fields), is stored many times. Each page
carries a part of the data that is the same. Is it possible to separate that
data?




Re: Solr defining Schema structure trouble.

Posted by Jack Krupansky <ja...@basetechnology.com>.
You can break your books into individual pages, each a separate Solr 
"document", with the full page text as one tokenized text field value. Solr 
(Lucene) will take care of indexing the individual terms on each page. Then 
when you query on terms, Solr will find all pages that have the specified 
terms, ranking them by frequency and number of terms that match on each 
page.

You can also use grouping (field collapsing) to group the pages by book 
(another field or the id would be the book name.)
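
A grouped page search along these lines might look like the sketch below. It assumes the `docname`, `pagenumber`, and `pagecontent` fields from the original post, a single plain search term, and Solr's default grouped JSON response layout; the helper names and the sample response are invented for illustration.

```python
def grouping_params(term, group_field="docname", pages_per_book=5):
    """Request parameters for a page search collapsed by book."""
    return {
        "q": f"pagecontent:{term}",
        "group": "true",
        "group.field": group_field,
        "group.limit": pages_per_book,  # page hits returned per book
        "fl": "id,docname,pagenumber",
        "wt": "json",
    }


def pages_by_book(response, group_field="docname"):
    """Map each book to the page numbers that matched."""
    groups = response["grouped"][group_field]["groups"]
    return {
        g["groupValue"]: [d["pagenumber"] for d in g["doclist"]["docs"]]
        for g in groups
    }


# Hand-written sample shaped like a grouped JSON response.
sample = {
    "grouped": {
        "docname": {
            "matches": 2,
            "groups": [
                {
                    "groupValue": "test.pdf",
                    "doclist": {
                        "numFound": 2,
                        "start": 0,
                        "docs": [
                            {"id": "1", "docname": "test.pdf", "pagenumber": 1},
                            {"id": "2", "docname": "test.pdf", "pagenumber": 2},
                        ],
                    },
                }
            ],
        }
    }
}

print(grouping_params("blablabla"))
print(pages_by_book(sample))  # {'test.pdf': [1, 2]}
```

The result gives the "terms found on page X" view per book without a separate lookup.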

-- Jack Krupansky

-----Original Message----- 
From: denl0
Sent: Wednesday, November 14, 2012 8:26 AM
To: solr-user@lucene.apache.org
Subject: Solr defining Schema structure trouble.
