Posted to solr-user@lucene.apache.org by NabbleUser <ma...@maxstricker.it> on 2013/05/16 14:10:20 UTC

Is payload the right solution for my problem?

Hi,

I recently read about payloads in the Apache Solr 4 Cookbook and would like
to know if this is the right solution for my problem or if other methods are
more suitable.

Generally, I need to perform fulltext search in a field (including
highlighting) where I need metadata per token in the search result, but I do
not need to search in that metadata.

I have documents containing data (not natural language), where each data
entry carries several pieces of metadata. An example using a sentence, in an
XML-like structure, could be:
<meta attr1="val11" attr2="val2" attr3="val3">This</meta>
<meta attr1="val13" attr2="val7" attr3="val3">is</meta>
<meta attr1="val16" attr2="val22" attr3="val3">one</meta>
<meta attr1="val14" attr2="val2" attr3="val3">sentence.</meta>
Additionally, there are some fields per document that I need for faceting
etc. (id, category, timestamp etc.)

When searching, I want to search only in "This is one sentence."; a search
for "attr1" or "val3" should give no results. However, when searching for
"one", I need the search response to tell me attr1="val16", attr2="val22" and
attr3="val3".

My first intuition when creating the schema was to create a multiValued field
"content" containing each word in the document. Then I would add attr1, attr2
and attr3 as a payload to each word/token.
Is this the right way to use payloads? Or is there a better solution for
such a task?
I imagine this to be a common use case: searching in a "cleaned" version of
the data and returning the original one.
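
A minimal sketch of how the input for the payload idea could be prepared,
assuming a delimiter-based payload filter (Solr ships
solr.DelimitedPayloadTokenFilterFactory, which splits each token at a
delimiter and keeps the remainder as the payload). The function name, the
comma-joined payload layout and the "|" delimiter are illustrative
assumptions, not something taken from the thread.

```python
import re

# Matches one <meta ...>token</meta> entry from the example above.
META_RE = re.compile(r"<meta\s+([^>]*)>(.*?)</meta>", re.DOTALL)
# Matches one name="value" pair inside the opening tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def to_delimited_payloads(xml_like, delimiter="|"):
    """Turn each <meta> entry into 'token|val,val,val' (hypothetical layout)."""
    out = []
    for raw_attrs, token in META_RE.findall(xml_like):
        values = [value for _name, value in ATTR_RE.findall(raw_attrs)]
        out.append(f"{token}{delimiter}{','.join(values)}")
    return " ".join(out)


sample = (
    '<meta attr1="val11" attr2="val2" attr3="val3">This</meta>\n'
    '<meta attr1="val13" attr2="val7" attr3="val3">is</meta>\n'
)
print(to_delimited_payloads(sample))
# prints: This|val11,val2,val3 is|val13,val7,val3
```

The attribute values must not contain the delimiter or whitespace for this
layout to survive a whitespace tokenizer; real data may need escaping.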

Could anyone please provide suggestions on how to tackle such a task? The
book and the Solr wiki pages did not lead me to anything that I could
immediately identify as a solution to my problem.

If the proposed solution depends on the data: each document might have 3-8
additional attributes, and there might be between 100 and 10,000 tokens per
document.

Regards



--
View this message in context: http://lucene.472066.n3.nabble.com/Is-payload-the-right-solution-for-my-problem-tp4063814.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Is payload the right solution for my problem?

Posted by jasimop <st...@gmail.com>.

I did some experiments, but I think I will end up with double the disk space.

The problem is the following: I will search in the fulltext (without the XML
content), but I need to know the position of the search result both in the
fulltext (to display it) and in the XML data (to get the attributes
associated with the result term).
I tried to solve this by using highlighting, and as my experiments show, to
use highlighting on both fields they have to be indexed and stored. Thus I
end up with nearly double the disk space of my original data.
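
One way to avoid highlighting the XML field at all is to keep a small offset
table next to the clean text and resolve a hit position against it on the
client side. The following is only a sketch under that assumption: the
function names are hypothetical, tokens are joined with single spaces, and
the hit offset is assumed to come from wherever the match position in the
clean text is obtained.

```python
import bisect
import re

META_RE = re.compile(r"<meta\s+([^>]*)>(.*?)</meta>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def build_offset_table(xml_like):
    """Return (clean_text, starts, attrs).

    starts[i] is the character offset of token i in clean_text and
    attrs[i] is that token's attribute dict.
    """
    parts, starts, attrs = [], [], []
    pos = 0
    for raw_attrs, token in META_RE.findall(xml_like):
        starts.append(pos)
        attrs.append(dict(ATTR_RE.findall(raw_attrs)))
        parts.append(token)
        pos += len(token) + 1  # +1 for the joining space
    return " ".join(parts), starts, attrs


def attrs_at(offset, starts, attrs):
    """Look up the attributes of the token that covers a character offset."""
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first token")
    return attrs[i]


sample = (
    '<meta attr1="val11" attr2="val2" attr3="val3">This</meta>\n'
    '<meta attr1="val13" attr2="val7" attr3="val3">is</meta>\n'
    '<meta attr1="val16" attr2="val22" attr3="val3">one</meta>\n'
    '<meta attr1="val14" attr2="val2" attr3="val3">sentence.</meta>\n'
)
text, starts, attrs = build_offset_table(sample)
print(attrs_at(text.find("one"), starts, attrs))
# prints: {'attr1': 'val16', 'attr2': 'val22', 'attr3': 'val3'}
```

An offset that falls on a joining space resolves to the preceding token, so
callers should pass the start offset of the match.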

Does Solr provide any other options for such a problem?




RE: Is payload the right solution for my problem?

Posted by "Petersen, Robert" <ro...@mail.rakuten.com>.

Hi

It will not be double the disk space at all.  You will not need to store the field you search; only the field being returned needs to be stored.  Furthermore, if you are not searching the XML field, you will not need to index that field, only store it.

Hope that helps,
Robi

-----Original Message-----
From: jasimop [mailto:stricker.ma@gmail.com] 
Sent: Friday, May 17, 2013 12:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Is payload the right solution for my problem?

I think I just found the solution.

Would the right strategy be to store the original XML content and then use a solr.HTMLStripCharFilterFactory when querying? I just made a quick test and it worked; the only problem now is that it also finds the data contained in the XML attribute fields.

I think I will put my data into two fields, one containing only the raw data without XML, and one in the original format. Then I search in the raw field but return the original format with the response.
The only problem I see here is that I need double the amount of disk space.
Is there a better solution?




Re: Is payload the right solution for my problem?

Posted by jasimop <st...@gmail.com>.

I think I just found the solution.

Would the right strategy be to store the original XML content and then use a
solr.HTMLStripCharFilterFactory when querying? I just made a quick test and
it worked; the only problem now is that it also finds the data contained in
the XML attribute fields.

I think I will put my data into two fields, one containing only the raw data
without XML, and one in the original format. Then I search in the raw field
but return the original format with the response.
The only problem I see here is that I need double the amount of disk space.
Is there a better solution?
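
The two-field idea can be sketched on the client side as follows. The field
names content_text and content_xml are hypothetical and would have to exist
in the schema; per the suggestion elsewhere in this thread, the first would
be indexed but not stored and the second stored but not indexed. The tag
stripping here is a plain regular expression, an assumption that only holds
when attribute values never contain ">".

```python
import json
import re

# Any <...> tag, including its attributes, is dropped from the search text.
TAG_RE = re.compile(r"<[^>]+>")


def make_solr_doc(doc_id, xml_like):
    """Build one document with a clean search field and the original XML."""
    clean = " ".join(TAG_RE.sub(" ", xml_like).split())
    return {
        "id": doc_id,
        "content_text": clean,    # hypothetical field: searched
        "content_xml": xml_like,  # hypothetical field: returned
    }


sample = (
    '<meta attr1="val16" attr2="val22" attr3="val3">one</meta>\n'
    '<meta attr1="val14" attr2="val2" attr3="val3">sentence.</meta>\n'
)
# A JSON list of documents, ready to be sent to an update handler.
print(json.dumps([make_solr_doc("doc-1", sample)], indent=2))
```

Because the attributes never reach content_text, a search for "attr1" or
"val3" in that field finds nothing, while the stored XML still comes back
with each hit.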


