Posted to solr-user@lucene.apache.org by Shairon <sh...@gmail.com> on 2010/01/04 21:28:09 UTC

Phrase search issue with XMLPayload? Is it the better solution?

I have a project that involves words extracted by OCR. Each page has words,
and each word has its own geometry so that a highlight can be shown to the end user.
I've been trying to represent this document structure in XML:
<document>
   <page num="1">
    <term top='111' bottom='222' right='333' left='444'>foo</term> 
    <term top='211' bottom='322' right='833' left='944'>bar</term> 
    <term top='311' bottom='422' right='733' left='144'>baz</term> 
    <term top='411' bottom='522' right='633' left='244'>qux</term> 
   </page>
   <page num="2">
	<term .... />
   </page>
   
</document>
I index it using the field 'fulltext_st':

<field name="fulltext_st">
	&lt;document &gt;
	&lt;page top='111' bottom='222' right='333' left='444' word='foo'
num='1'&gt;foo&lt;/page&gt;
	&lt;page top='211' bottom='322' right='833' left='944' word='bar'
num='1'&gt;bar&lt;/page&gt;
	&lt;page top='311' bottom='422' right='733' left='144' word='baz'
num='1'&gt;baz&lt;/page&gt;
	&lt;page top='411' bottom='522' right='633' left='244' word='qux'
num='1'&gt;qux&lt;/page&gt;
	&lt;/document&gt;
</field>
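
For reference, here is a minimal sketch of how a field value like the one above could be generated from the OCR output. It is only an illustration: the function name build_field_value and the shape of the input dictionaries are assumptions of mine, not part of Solr or of the XMLPayload code.

```python
from xml.sax.saxutils import escape

# Attribute order used in the 'fulltext_st' example above.
ATTRS = ("top", "bottom", "right", "left", "word", "num")


def build_field_value(words):
    """Build the escaped text for the field from a list of OCR words.

    `words` is a hypothetical input format: one dict per word with the
    keys listed in ATTRS.
    """
    lines = ["<document>"]
    for w in words:
        # Escape attribute values so a stray quote or '&' in an OCR'd
        # word cannot break the inner markup.
        attrs = " ".join(
            "{}='{}'".format(k, escape(str(w[k]), {"'": "&apos;"}))
            for k in ATTRS
        )
        lines.append("<page {}>{}</page>".format(attrs, escape(str(w["word"]))))
    lines.append("</document>")
    # The inner XML is carried as text inside <field>, so escape it again.
    return escape("\n".join(lines))


if __name__ == "__main__":
    sample = [
        {"word": "foo", "num": 1, "top": 111, "bottom": 222, "right": 333, "left": 444},
        {"word": "bar", "num": 1, "top": 211, "bottom": 322, "right": 833, "left": 944},
    ]
    print(build_field_value(sample))
```

The second escape pass is what turns the inner tags into &lt; and &gt;, so that they arrive as field text rather than as part of the update message itself.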
I can get all the terms in my search results along with their payloads.
But if I search using a phrase query, I don't get any results.

Example:


search?q=foo
<lst name="fulltext_st">
	<int
name="/document/page[word='foo'][num='1'][top='111'][bottom='222'][right='333'][left='444']">1</int>
</lst>


search?q=foo+bar
<lst name="fulltext_st">
	<int
name="/document/page[word='foo'][num='1'][top='111'][bottom='222'][right='333'][left='444']">1</int>
	<int
name="/document/page[word='bar'][num='1'][top='211'][bottom='322'][right='833'][left='944']">1</int>
</lst>

/search?q="foo bar"

*nothing*
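
On the client side, the name attributes returned in the examples above can be turned back into a rectangle for highlighting. The following is a rough sketch that assumes the key always has the [attr='value'] form shown in my results; parse_payload_key is a hypothetical helper, not a Solr API.

```python
import re

# Matches one predicate of the form [name='value'] in the returned key.
_PREDICATE = re.compile(r"\[(\w+)='([^']*)'\]")

# Attributes that hold numbers in the examples above.
_NUMERIC = ("num", "top", "bottom", "right", "left")


def parse_payload_key(name):
    """Return the word, page number and geometry encoded in a result key."""
    attrs = dict(_PREDICATE.findall(name))
    for key in _NUMERIC:
        if key in attrs:
            attrs[key] = int(attrs[key])
    return attrs


if __name__ == "__main__":
    key = ("/document/page[word='foo'][num='1'][top='111']"
           "[bottom='222'][right='333'][left='444']")
    print(parse_payload_key(key))
```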


I was wondering if I could get your thoughts on whether XMLPayload supports
this kind of thing (phrase search), or whether there is a better way to index
a document with many pages and one rectangle (the graphical word geometry)
for each term.



Thank you in advance.

-- 
View this message in context: http://old.nabble.com/Phrase-search-issue-with-XMLPayload--Is-it-the-better-solution--tp27018815p27018815.html
Sent from the Solr - User mailing list archive at Nabble.com.