You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Dan Frankowski <df...@cs.umn.edu> on 2006/01/05 00:04:33 UTC

Search result snippets?

Folks,

I'm a Lucene newbie, and I've been searching awhile today to answer this 
question. Googled, read Lucene FAQ, looked at Javadoc for Document and 
Hits, etc.

How would you implement "snippets" with Lucene? A snippet is some text 
around parts of the text that caused a match, with the search matches in 
bold. For example, the supporting text in Google searches. A snippet is 
NOT a document summary, which has a FAQ entry and someone answered about 
on this email list.

I can store the full text in Lucene, but then when I get the match back, 
it seems as if I'd have to reimplement the query matcher if I wanted to 
know which things matched.

Thanks in advance for any help.

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Dan Frankowski <df...@cs.umn.edu>.
Koji Sekiguchi wrote:

>Hi Dan,
>
>I've experienced same error you are facing. Check out:
>
>http://www.gossamer-threads.com/lists/lucene/java-user/28554?search_string=T
>ermVectorOffsetInfo;#28554
>
>Hope this helps.
>
>Koji
>  
>


Thanks! That helped! I put all this in the FAQ entry: 
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-75566820ee94a425c7e2950ac61d24e405fbd914

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Search result snippets?

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.
Hi Dan,

I've experienced same error you are facing. Check out:

http://www.gossamer-threads.com/lists/lucene/java-user/28554?search_string=T
ermVectorOffsetInfo;#28554

Hope this helps.

Koji


> -----Original Message-----
> From: Dan Frankowski [mailto:dfrankow@cs.umn.edu]
> Sent: Saturday, January 07, 2006 8:00 AM
> To: java-user@lucene.apache.org
> Subject: Re: Search result snippets?
>
>
> Chris,
>
> Thanks, I have signed up for the wiki and will put in the answer when I
> fully understand it.
>
> Otis,
>
> So, I downloaded the Highlighter code with
>
> svn checkout
> http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highligh
> ter/src/java/org/apache/lucene/search/highlight/
>
> Then I tried to compile it with lucene.jar, version 1.4.3. I get
> compile-time errors (attached at the bottom of this message) for missing
> classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo.
> For example,
> % jar tvf lib/lucene.jar | grep TermVectorOffset
> <nothing ...>
> % jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar  META-INF/MANIFEST.MF
> inflated: META-INF/MANIFEST.MF
> dfrankow@gibson (~/windows/tmp) % cat META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.6.1
> Created-By: Apache Jakarta
>
> Name: org/apache/lucene
> Specification-Title: Lucene Search Engine
> Specification-Version: 1.4.3
> Specification-Vendor: Lucene
> Implementation-Title: org.apache.lucene
> Implementation-Version: build 2004-11-29 15:13:05
> Implementation-Vemdpr: Lucene
>
> I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and
> 1.2, and could not find it.
>
> Someone asked about this on March 11
> (http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.
> mbox/%3C20050311062700.46656.qmail@web31110.mail.mud.yahoo.com%3E).
> You said to check out the trunk of Lucene, but I have to work with
> Lucene 1.4.3 in an existing project. I can't work with a version from SVN.
>
> Suggestions? Is there a version of Highlighter that works with 1.4.3? If
> it compiles for you, where do you get TermVectorOffsetInfo, and is that
> in 1.4.3?
>
> Thanks in advance.
>
> Dan
>
> Chris Hostetter wrote:
>
> >: Thanks for this tip! This should be in the FAQ. Is there a way
> to get it
> >: in there? I can't edit the wiki, I think.
> >
> >1) People who create a Wiki account can edit the FAQ.
> >
> >2) the FAQ already includes a refrence to Highlighter, I think it's the
> >FAQ entry you mentioned...
> >
> >http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7
755f4ae639ee8eaf7de04f5c27f5
> >
> >   Is there a way to get a text summary of an indexed document
> with Lucene?
> >
> >   You could store the documents summary in the index and then use the
> >   Highlighter from the sandbox.
> >
> >...feel free to add to the FAQ whatever information you think would have
> >made the answer more useful to you when you first found it.
> >
> >
> >: >How would you implement "snippets" with Lucene? A snippet is some text
> >: >around parts of the text that caused a match, with the search
> matches in
> >: >bold. For example, the supporting text in Google searches. A
> snippet is
> >: >NOT a document summary, which has a FAQ entry and someone
> answered about
> >: >on this email list.
> >
> >
> >-Hoss
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ====
>
>     [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19:
> cannot find symbol
>     [javac] symbol  : class TermVectorOffsetInfo
>     [javac] location: package org.apache.lucene.index
>     [javac] import org.apache.lucene.index.TermVectorOffsetInfo;
>     [javac]                                ^
>     [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> cannot find symbol
>     [javac] symbol  : class TermVectorOffsetInfo
>     [javac] location: class
> org.apache.lucene.search.highlight.TokenSources
>     [javac]             TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
>     [javac]             ^
>     [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> cannot find symbol
>     [javac] symbol  : method getOffsets(int)
>     [javac] location: interface org.apache.lucene.index.TermPositionVector
>     [javac]             TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
>     [javac]                                               ^
>     [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145:
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> org.apache.lucene.analysis.Token to ()
>     [javac]                     unsortedTokens.add(new Token(terms[t],
>     [javac]                                        ^
>     [javac]
> .../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160:
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> org.apache.lucene.analysis.Token to ()
>     [javac]                     tokensInOriginalOrder[pos[tp]]=new
> Token(terms[t],
>     [javac]                                                    ^
>     [javac] 5 errors
>
> BUILD FAILED
> .../JSPWiki/build.xml:216: Compile failed; see the compiler error output
> for details.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Dan,

Grab the highlighter Jar that comes with Lucene in Action code -
http://www.lucenebook.com/ .  That will work with Lucene 1.4.3.

Otis

--- Dan Frankowski <df...@cs.umn.edu> wrote:

> Chris,
> 
> Thanks, I have signed up for the wiki and will put in the answer when
> I 
> fully understand it.
> 
> Otis,
> 
> So, I downloaded the Highlighter code with
> 
> svn checkout 
>
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/
> 
> Then I tried to compile it with lucene.jar, version 1.4.3. I get 
> compile-time errors (attached at the bottom of this message) for
> missing 
> classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo.
> 
> For example,
> % jar tvf lib/lucene.jar | grep TermVectorOffset
> <nothing ...>
> % jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar 
> META-INF/MANIFEST.MF
> inflated: META-INF/MANIFEST.MF
> dfrankow@gibson (~/windows/tmp) % cat META-INF/MANIFEST.MF
> Manifest-Version: 1.0
> Ant-Version: Apache Ant 1.6.1
> Created-By: Apache Jakarta
> 
> Name: org/apache/lucene
> Specification-Title: Lucene Search Engine
> Specification-Version: 1.4.3
> Specification-Vendor: Lucene
> Implementation-Title: org.apache.lucene
> Implementation-Version: build 2004-11-29 15:13:05
> Implementation-Vemdpr: Lucene
> 
> I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and
> 
> 1.2, and could not find it.
> 
> Someone asked about this on March 11 
>
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3C20050311062700.46656.qmail@web31110.mail.mud.yahoo.com%3E).
> 
> You said to check out the trunk of Lucene, but I have to work with 
> Lucene 1.4.3 in an existing project. I can't work with a version from
> SVN.
> 
> Suggestions? Is there a version of Highlighter that works with 1.4.3?
> If 
> it compiles for you, where do you get TermVectorOffsetInfo, and is
> that 
> in 1.4.3?
> 
> Thanks in advance.
> 
> Dan
> 
> Chris Hostetter wrote:
> 
> >: Thanks for this tip! This should be in the FAQ. Is there a way to
> get it
> >: in there? I can't edit the wiki, I think.
> >
> >1) People who create a Wiki account can edit the FAQ.
> >
> >2) the FAQ already includes a refrence to Highlighter, I think it's
> the
> >FAQ entry you mentioned...
> >
>
>http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5
> >
> >   Is there a way to get a text summary of an indexed document with
> Lucene?
> >
> >   You could store the documents summary in the index and then use
> the
> >   Highlighter from the sandbox.
> >
> >...feel free to add to the FAQ whatever information you think would
> have
> >made the answer more useful to you when you first found it.
> >
> >
> >: >How would you implement "snippets" with Lucene? A snippet is some
> text
> >: >around parts of the text that caused a match, with the search
> matches in
> >: >bold. For example, the supporting text in Google searches. A
> snippet is
> >: >NOT a document summary, which has a FAQ entry and someone
> answered about
> >: >on this email list.
> >
> >
> >-Hoss
> >
> >
>
>---------------------------------------------------------------------
> >To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >For additional commands, e-mail: java-user-help@lucene.apache.org
> >  
> >
> 
> ====
> 
>     [javac] 
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19:
> 
> cannot find symbol
>     [javac] symbol  : class TermVectorOffsetInfo
>     [javac] location: package org.apache.lucene.index
>     [javac] import org.apache.lucene.index.TermVectorOffsetInfo;
>     [javac]                                ^
>     [javac] 
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> 
> cannot find symbol
>     [javac] symbol  : class TermVectorOffsetInfo
>     [javac] location: class
> org.apache.lucene.search.highlight.TokenSources
>     [javac]             TermVectorOffsetInfo[]
> offsets=tpv.getOffsets(t);
>     [javac]             ^
>     [javac] 
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124:
> 
> cannot find symbol
>     [javac] symbol  : method getOffsets(int)
>     [javac] location: interface
> org.apache.lucene.index.TermPositionVector
>     [javac]             TermVectorOffsetInfo[]
> offsets=tpv.getOffsets(t);
>     [javac]                                               ^
>     [javac] 
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145:
> 
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> 
> org.apache.lucene.analysis.Token to ()
>     [javac]                     unsortedTokens.add(new
> Token(terms[t],
>     [javac]                                        ^
>     [javac] 
>
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160:
> 
> internal error; cannot instantiate Token(java.lang.String,int,int) at
> 
> org.apache.lucene.analysis.Token to ()
>     [javac]                     tokensInOriginalOrder[pos[tp]]=new 
> Token(terms[t],
>     [javac]                                                    ^
>     [javac] 5 errors
> 
> BUILD FAILED
> .../JSPWiki/build.xml:216: Compile failed; see the compiler error
> output 
> for details.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Dan Frankowski <df...@cs.umn.edu>.
Chris,

Thanks, I have signed up for the wiki and will put in the answer when I 
fully understand it.

Otis,

So, I downloaded the Highlighter code with

svn checkout 
http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/highlighter/src/java/org/apache/lucene/search/highlight/

Then I tried to compile it with lucene.jar, version 1.4.3. I get 
compile-time errors (attached at the bottom of this message) for missing 
classes. One of them is org.apache.lucene.index.TermVectorOffsetInfo. 
For example,
% jar tvf lib/lucene.jar | grep TermVectorOffset
<nothing ...>
% jar xvf ../work/jspwiki/JSPWiki/lib/lucene.jar  META-INF/MANIFEST.MF
inflated: META-INF/MANIFEST.MF
dfrankow@gibson (~/windows/tmp) % cat META-INF/MANIFEST.MF
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.6.1
Created-By: Apache Jakarta

Name: org/apache/lucene
Specification-Title: Lucene Search Engine
Specification-Version: 1.4.3
Specification-Vendor: Lucene
Implementation-Title: org.apache.lucene
Implementation-Version: build 2004-11-29 15:13:05
Implementation-Vemdpr: Lucene

I looked for TermVectorOffsetInfo in jar version 1.4.3, 1.4, 1.3, and 
1.2, and could not find it.

Someone asked about this on March 11 
(http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3C20050311062700.46656.qmail@web31110.mail.mud.yahoo.com%3E). 
You said to check out the trunk of Lucene, but I have to work with 
Lucene 1.4.3 in an existing project. I can't work with a version from SVN.

Suggestions? Is there a version of Highlighter that works with 1.4.3? If 
it compiles for you, where do you get TermVectorOffsetInfo, and is that 
in 1.4.3?

Thanks in advance.

Dan

Chris Hostetter wrote:

>: Thanks for this tip! This should be in the FAQ. Is there a way to get it
>: in there? I can't edit the wiki, I think.
>
>1) People who create a Wiki account can edit the FAQ.
>
>2) the FAQ already includes a refrence to Highlighter, I think it's the
>FAQ entry you mentioned...
>
>http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5
>
>   Is there a way to get a text summary of an indexed document with Lucene?
>
>   You could store the documents summary in the index and then use the
>   Highlighter from the sandbox.
>
>...feel free to add to the FAQ whatever information you think would have
>made the answer more useful to you when you first found it.
>
>
>: >How would you implement "snippets" with Lucene? A snippet is some text
>: >around parts of the text that caused a match, with the search matches in
>: >bold. For example, the supporting text in Google searches. A snippet is
>: >NOT a document summary, which has a FAQ entry and someone answered about
>: >on this email list.
>
>
>-Hoss
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>  
>

====

    [javac] 
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:19: 
cannot find symbol
    [javac] symbol  : class TermVectorOffsetInfo
    [javac] location: package org.apache.lucene.index
    [javac] import org.apache.lucene.index.TermVectorOffsetInfo;
    [javac]                                ^
    [javac] 
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124: 
cannot find symbol
    [javac] symbol  : class TermVectorOffsetInfo
    [javac] location: class org.apache.lucene.search.highlight.TokenSources
    [javac]             TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
    [javac]             ^
    [javac] 
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:124: 
cannot find symbol
    [javac] symbol  : method getOffsets(int)
    [javac] location: interface org.apache.lucene.index.TermPositionVector
    [javac]             TermVectorOffsetInfo[] offsets=tpv.getOffsets(t);
    [javac]                                               ^
    [javac] 
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:145: 
internal error; cannot instantiate Token(java.lang.String,int,int) at 
org.apache.lucene.analysis.Token to ()
    [javac]                     unsortedTokens.add(new Token(terms[t],
    [javac]                                        ^
    [javac] 
.../JSPWiki/src/org/apache/lucene/search/highlight/TokenSources.java:160: 
internal error; cannot instantiate Token(java.lang.String,int,int) at 
org.apache.lucene.analysis.Token to ()
    [javac]                     tokensInOriginalOrder[pos[tp]]=new 
Token(terms[t],
    [javac]                                                    ^
    [javac] 5 errors

BUILD FAILED
.../JSPWiki/build.xml:216: Compile failed; see the compiler error output 
for details.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Chris Hostetter <ho...@fucit.org>.
: Thanks for this tip! This should be in the FAQ. Is there a way to get it
: in there? I can't edit the wiki, I think.

1) People who create a Wiki account can edit the FAQ.

2) the FAQ already includes a refrence to Highlighter, I think it's the
FAQ entry you mentioned...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0c15966e2ee7755f4ae639ee8eaf7de04f5c27f5

   Is there a way to get a text summary of an indexed document with Lucene?

   You could store the documents summary in the index and then use the
   Highlighter from the sandbox.

...feel free to add to the FAQ whatever information you think would have
made the answer more useful to you when you first found it.


: >How would you implement "snippets" with Lucene? A snippet is some text
: >around parts of the text that caused a match, with the search matches in
: >bold. For example, the supporting text in Google searches. A snippet is
: >NOT a document summary, which has a FAQ entry and someone answered about
: >on this email list.


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Dan Frankowski <df...@cs.umn.edu>.
Thanks for this tip! This should be in the FAQ. Is there a way to get it 
in there? I can't edit the wiki, I think.

Dan

Otis Gospodnetic wrote:

>Look for the Highlighter in contrib/ to get this effect:
>
>http://www.lucenebook.com/search?query=highlighter+fragment
>
>Otis
>
>----- Original Message ----
>From: Dan Frankowski <df...@cs.umn.edu>
>To: java-user@lucene.apache.org
>Sent: Wed Jan  4 18:04:33 2006
>Subject: Search result snippets?
>
>Folks,
>
>I'm a Lucene newbie, and I've been searching awhile today to answer this 
>question. Googled, read Lucene FAQ, looked at Javadoc for Document and 
>Hits, etc.
>
>How would you implement "snippets" with Lucene? A snippet is some text 
>around parts of the text that caused a match, with the search matches in 
>bold. For example, the supporting text in Google searches. A snippet is 
>NOT a document summary, which has a FAQ entry and someone answered about 
>on this email list.
>
>I can store the full text in Lucene, but then when I get the match back, 
>it seems as if I'd have to reimplement the query matcher if I wanted to 
>know which things matched.
>
>Thanks in advance for any help.
>
>Dan
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Search result snippets?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Look for the Highlighter in contrib/ to get this effect:

http://www.lucenebook.com/search?query=highlighter+fragment

Otis

----- Original Message ----
From: Dan Frankowski <df...@cs.umn.edu>
To: java-user@lucene.apache.org
Sent: Wed Jan  4 18:04:33 2006
Subject: Search result snippets?

Folks,

I'm a Lucene newbie, and I've been searching awhile today to answer this 
question. Googled, read Lucene FAQ, looked at Javadoc for Document and 
Hits, etc.

How would you implement "snippets" with Lucene? A snippet is some text 
around parts of the text that caused a match, with the search matches in 
bold. For example, the supporting text in Google searches. A snippet is 
NOT a document summary, which has a FAQ entry and someone answered about 
on this email list.

I can store the full text in Lucene, but then when I get the match back, 
it seems as if I'd have to reimplement the query matcher if I wanted to 
know which things matched.

Thanks in advance for any help.

Dan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org