Posted to java-user@lucene.apache.org by anton feldmann <an...@uni-bielefeld.de> on 2006/04/23 20:47:44 UTC
How to search in a sentence and display the whole sentence
I want to search for a word or a pair of words within a sentence or a
paragraph, and then have the whole matching sentence displayed. I assume
I need to extend Lucene to make this possible, but I don't know where to
start: I have no idea how I would have to change the index file format,
or whether such a change would be acceptable to the Lucene team.
cheers
anton feldmann
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: How to search in a sentence and display the whole sentence
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 26, 2006, at 6:20 PM, anton feldmann wrote:
> Are the names of the fields in a document unique, or can I add a field
> with the name "sentence" for each sentence in a text document?
A field name identifies one logical field in a document, but you can
add multiple instances under the same field name. If they are stored,
you can retrieve them as an array as well. You do have to be careful,
though: a phrase query can match across these field instances unless
you specify a positional gap between them large enough to prevent it.
Erik
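
To see why the gap matters, here is a toy model in plain Java (not Lucene
code) of how token positions could be assigned when the same field name is
added several times. The class name PositionGapDemo, its two methods, and
the whitespace tokenization are assumptions made up for this illustration.
In Lucene itself the gap is, as far as I know, set by overriding
Analyzer.getPositionIncrementGap(String fieldName); please check that
against the version you are using.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PositionGapDemo {

    // Assign a position to every token of every field instance.
    // `gap` is added between the last token of one instance and the
    // first token of the next (0 = instances are simply concatenated).
    static Map<String, List<Integer>> positions(List<String> instances, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (int i = 0; i < instances.size(); i++) {
            if (i > 0) {
                pos += gap;
            }
            for (String token : instances.get(i).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
                pos++;
            }
        }
        return postings;
    }

    // Exact two-word phrase: `second` must sit at the position right after `first`.
    static boolean phraseMatches(Map<String, List<Integer>> postings,
                                 String first, String second) {
        List<Integer> a = postings.getOrDefault(first, Collections.<Integer>emptyList());
        List<Integer> b = postings.getOrDefault(second, Collections.<Integer>emptyList());
        for (int p : a) {
            if (b.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = new ArrayList<>();
        sentences.add("the quick fox");
        sentences.add("jumps over");
        // "fox jumps" spans two field instances.
        System.out.println(phraseMatches(positions(sentences, 0), "fox", "jumps"));
        System.out.println(phraseMatches(positions(sentences, 100), "fox", "jumps"));
    }
}
```

With a gap of 0 the phrase "fox jumps" matches even though the two words
come from different instances; with a gap of 100 it no longer does, while
phrases inside a single instance still match. For sloppy phrase queries the
gap would have to be larger than the slop.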
Re: How to search in a sentence and display the whole sentence
Posted by anton feldmann <an...@uni-bielefeld.de>.
Are the names of the fields in a document unique, or can I add a field
with the name "sentence" for each sentence in a text document?
Grant Ingersoll wrote:
> Anton,
>
> I think there are at least a couple of ways of doing this. I assume
> you have a program that does sentence detection already, as Lucene
> does not provide this. If not, I am sure a search of the web will
> find one that has high accuracy.
> You can:
> 1. Index each sentence as a separate Document. You will need a field
> on the Document relating it back to the overall file so you can
> reconstruct it.
> 2. As you index, insert sentence/paragraph boundary markers into your
> index and then use the SpanQuery functionality. Search this mail
> archive for sentence boundary detection and Span Query (try the dev
> list too). I think there was a discussion between me, Doug and Hoss
> on how to do this.
> 3. Do search as you do now and then post process to figure out what
> sentence it came from. This will be inefficient, but I don't know
> what your requirements are that way, so it may work for you.
>
> There are probably other ways too.
Re: How to search in a sentence and display the whole sentence
Posted by Grant Ingersoll <gs...@syr.edu>.
Anton,
I think there are at least a couple of ways of doing this. I assume you
have a program that does sentence detection already, as Lucene does not
provide this. If not, I am sure a search of the web will find one that
has high accuracy.
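
If no sentence detector is at hand, the JDK's own java.text.BreakIterator
can serve as a starting point. The sketch below is an assumption about what
such a step might look like, not something from this thread; the class name
SentenceSplitter and the method split are made up for the example. The
iterator is rule-based, so it can split in the wrong place (for example
around abbreviations) and should not be taken as the "high accuracy" tool
mentioned above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {

    // Split text into sentences using the JDK's locale-aware boundary rules.
    static List<String> split(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Boundaries include trailing whitespace, so trim each piece.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. It is written in Java.";
        for (String s : split(text, Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```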
You can:
1. Index each sentence as a separate Document. You will need a field on
the Document relating it back to the overall file so you can reconstruct it.
2. As you index, insert sentence/paragraph boundary markers into your
index and then use the SpanQuery functionality. Search this mail
archive for sentence boundary detection and Span Query (try the dev list
too). I think there was a discussion between me, Doug and Hoss on how
to do this.
3. Search as you do now and then post-process each hit to figure out
which sentence it came from. This will be inefficient, but I don't know
what your requirements are in that respect, so it may work for you.
There are probably other ways too.
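
The idea behind option 1 can be sketched without Lucene as a small
in-memory index in which every sentence is its own unit and carries the
file name and sentence number needed to find its way back. Everything here
(SentenceIndex, SentenceDoc, addFile, search, the simple tokenizer) is a
hypothetical stand-in for illustration; with Lucene you would presumably
create one Document per sentence, with stored fields for the file name and
the sentence text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SentenceIndex {

    // One indexed unit per sentence, with a pointer back to its source file.
    static final class SentenceDoc {
        final String fileName;
        final int sentenceNo;
        final String text;

        SentenceDoc(String fileName, int sentenceNo, String text) {
            this.fileName = fileName;
            this.sentenceNo = sentenceNo;
            this.text = text;
        }
    }

    private final List<SentenceDoc> docs = new ArrayList<>();
    // term -> ids of the sentences that contain it
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Add the already-detected sentences of one file.
    void addFile(String fileName, List<String> sentences) {
        for (int i = 0; i < sentences.size(); i++) {
            int docId = docs.size();
            docs.add(new SentenceDoc(fileName, i, sentences.get(i)));
            for (String term : tokenize(sentences.get(i))) {
                List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each sentence at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
    }

    // Return every sentence that contains all of the given words.
    List<SentenceDoc> search(String... words) {
        List<SentenceDoc> hits = new ArrayList<>();
        if (words.length == 0) {
            return hits;
        }
        List<Integer> candidates = new ArrayList<>(idsFor(words[0]));
        for (int i = 1; i < words.length; i++) {
            candidates.retainAll(idsFor(words[i]));
        }
        for (int id : candidates) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    private List<Integer> idsFor(String word) {
        List<Integer> ids = postings.get(word.toLowerCase());
        return ids == null ? new ArrayList<Integer>() : ids;
    }

    // Lowercase and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

A hit then already is the whole sentence (hit.text), and hit.fileName plus
hit.sentenceNo point back to the original file if the surrounding text is
needed. The price is that queries can no longer match across sentence
boundaries.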
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886