Posted to java-user@lucene.apache.org by anton feldmann <an...@uni-bielefeld.de> on 2006/04/23 20:47:44 UTC

How to search in a sentence and display the whole sentence

I want to search for a word or a pair of words within a sentence or a
paragraph, and then have the whole matching sentence displayed. I assume
I need to extend Lucene to make this possible, but I do not know where to
start. In particular, I have no idea how I would have to change the index
file, or whether such a change would be acceptable to the Lucene team.

cheers

anton feldmann


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to search in a sentence and display the whole sentence

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 26, 2006, at 6:20 PM, anton feldmann wrote:
> Are the names of a field in a document unique, or can I make a field
> with the name "sentence" for each sentence in a text document?

The names of a field in a document are unique, but you can add  
multiple instances of the same field name.  You can retrieve the  
array of them, if they are stored, as well.  You do have to be  
careful though - a phrase query can match across these field  
instances unless you specify a positional gap between them large  
enough to prevent it.
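
To make the warning about phrase queries concrete, here is a minimal sketch in plain Java. It is a toy model of token positions, not Lucene code: the class PositionGapSketch and its methods are made up for illustration. It only shows why a two-word phrase can match across two "sentence" values when no gap is left between them. In Lucene itself the gap is, as far as I know, supplied by overriding Analyzer.getPositionIncrementGap(String fieldName).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (not Lucene) of token positions in a field that has several values. */
public class PositionGapSketch {

    /** Maps each lower-cased token to the positions it occupies across all values. */
    public static Map<String, Set<Integer>> index(List<String> values, int gap) {
        Map<String, Set<Integer>> positions = new HashMap<>();
        int next = 0;
        for (String value : values) {
            for (String token : value.toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                positions.computeIfAbsent(token, k -> new HashSet<>()).add(next++);
            }
            next += gap; // positions skipped between two instances of the field
        }
        return positions;
    }

    /** True if `first` is immediately followed by `second`, i.e. an exact two-word phrase. */
    public static boolean phraseMatches(Map<String, Set<Integer>> positions,
                                        String first, String second) {
        Set<Integer> followers = positions.getOrDefault(second, Set.of());
        for (int p : positions.getOrDefault(first, Set.of())) {
            if (followers.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("The cat sat on the mat.", "Dogs bark loudly.");
        // "mat dogs" straddles the sentence boundary.
        System.out.println(phraseMatches(index(sentences, 0), "mat", "dogs"));   // true
        System.out.println(phraseMatches(index(sentences, 100), "mat", "dogs")); // false
        // Phrases inside one sentence are unaffected by the gap.
        System.out.println(phraseMatches(index(sentences, 100), "cat", "sat"));  // true
    }
}
```

If sloppy phrase queries are used, the gap would have to be larger than the largest slop you expect.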

	Erik




Re: How to search in a sentence and display the whole sentence

Posted by anton feldmann <an...@uni-bielefeld.de>.
Are the names of a field in a document unique, or can I make a field with
the name "sentence" for each sentence in a text document?

Grant Ingersoll wrote:
> Anton,
>
> I think there are at least a couple of ways of doing this.  I assume 
> you have a program that does sentence detection already, as Lucene 
> does not provide this.  If not, I am sure a search of the web will 
> find one that has high accuracy.
> You can:
> 1. Index each sentence as a separate Document.  You will need a field 
> on the Document relating it back to the overall file so you can 
> reconstruct it.
> 2. As you index, insert sentence/paragraph boundary markers into your 
> index and then use the SpanQuery functionality.  Search this mail 
> archive for sentence boundary detection and Span Query (try the dev 
> list too).  I think there was a discussion between me, Doug and Hoss 
> on how to do this.
> 3. Do search as you do now and then post process to figure out what 
> sentence it came from.  This will be inefficient, but I don't know 
> what your requirements are that way, so it may work for you.
>
> There are probably other ways too.
>




Re: How to search in a sentence and display the whole sentence

Posted by Grant Ingersoll <gs...@syr.edu>.
Anton,

I think there are at least a couple of ways of doing this.  I assume you 
have a program that does sentence detection already, as Lucene does not 
provide this.  If not, I am sure a search of the web will find one that 
has high accuracy.
You can:
1. Index each sentence as a separate Document.  You will need a field on 
the Document relating it back to the overall file so you can reconstruct it.
2. As you index, insert sentence/paragraph boundary markers into your 
index and then use the SpanQuery functionality.  Search this mail 
archive for sentence boundary detection and Span Query (try the dev list 
too).  I think there was a discussion between me, Doug and Hoss on how 
to do this.
3. Do search as you do now and then post process to figure out what 
sentence it came from.  This will be inefficient, but I don't know what 
your requirements are that way, so it may work for you.
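
Option 1 above could start from something like the following sketch, which splits a file's text into one record per sentence and keeps a back-reference to the file. Everything here is an assumption for illustration: the class and field names (SentenceRecords, fileId, sentenceNo) are made up, and the JDK's java.text.BreakIterator stands in for a real sentence detector and may be less accurate than a dedicated one. Each record would then become one Lucene Document, with fileId and sentenceNo as fields.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Splits text into per-sentence records that point back to their source file. */
public class SentenceRecords {

    /** One sentence plus the information needed to reconstruct the original file. */
    public static final class SentenceRecord {
        public final String fileId;   // hypothetical identifier of the source file
        public final int sentenceNo;  // 0-based order of the sentence within the file
        public final String text;

        SentenceRecord(String fileId, int sentenceNo, String text) {
            this.fileId = fileId;
            this.sentenceNo = sentenceNo;
            this.text = text;
        }
    }

    public static List<SentenceRecord> split(String fileId, String content) {
        List<SentenceRecord> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(content);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = content.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                out.add(new SentenceRecord(fileId, out.size(), sentence));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "The cat sat on the mat. Dogs bark loudly. Birds sing.";
        for (SentenceRecord r : split("notes.txt", content)) {
            System.out.println(r.fileId + " #" + r.sentenceNo + ": " + r.text);
        }
    }
}
```

Sorting the hits for one fileId by sentenceNo gives back the original order of the file.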

There are probably other ways too.
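
For option 3, the post-processing step might look roughly like this sketch. It assumes the text of the matching document is available after the search (stored in the index or re-read from the file), and it locates the word with a regular expression rather than with anything Lucene returns. The class SentenceLookup and its methods are hypothetical, and java.text.BreakIterator is again only a stand-in for a proper sentence detector.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Finds the whole sentence(s) in which a word occurs, after the search has run. */
public class SentenceLookup {

    /** Returns the sentence of `text` that contains the character at `offset`. */
    public static String sentenceAt(String text, int offset) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (offset >= start && offset < end) {
                return text.substring(start, end).trim();
            }
        }
        throw new IllegalArgumentException("offset outside text: " + offset);
    }

    /** Returns each distinct sentence containing `word` (case-insensitive, whole word). */
    public static List<String> sentencesContaining(String text, String word) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern
                .compile("\\b" + Pattern.quote(word) + "\\b", Pattern.CASE_INSENSITIVE)
                .matcher(text);
        while (m.find()) {
            String sentence = sentenceAt(text, m.start());
            if (!hits.contains(sentence)) {
                hits.add(sentence);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The cat sat on the mat. Dogs bark loudly. The dogs sleep.";
        for (String sentence : sentencesContaining(text, "dogs")) {
            System.out.println(sentence);
        }
    }
}
```

This rescans the text for every hit, which is the inefficiency mentioned in option 3.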


-- 

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 

