Posted to user@nutch.apache.org by DerFichtl <de...@gmail.com> on 2007/09/12 22:54:13 UTC

maybe dumb question about nutch index and segments file

i have tried to open a nutch generated index with zend_search_lucene and it
complains that there is no segments file ... and it is right. i have the
segments folder with the timestamp subdirs but there is no file named
"segments".

question: where is my segments file, or is it not possible to use a nutch
index with zend framework? a few weeks ago i tried the same with lucene
(using a nutch index) and i think i remember having the same problem
... it looks for a segments file.

thanks
-- 
View this message in context: http://www.nabble.com/maybe-dumb-question-about-nutch-index-and-segments-file-tf4431983.html#a12643987
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: maybe dumb question about nutch index and segments file

Posted by Martin Kuen <ma...@gmail.com>.
hi,

regarding hit summaries:
The summaries are generated at search time. This is necessary, since
different queries will generate different summaries (and different terms
will be highlighted). The parsed text is stored in the various
"segments/<timestamp>" folders. I don't know which directory it actually
picks (parse_text, maybe) to generate the summary. However, as you can see
the summaries are not stored in the lucene index.
You may have a look at the plugins summary-basic and summary-lucene.
Furthermore, have a look at "org.apache.nutch.searcher.NutchBean" (it contains a
main method). From there you can track down the usage of these plugins (look
out for something like HitDetails.getSummaries()).
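
To illustrate why a summary has to be built at search time, here is a minimal Python sketch. It is not Nutch's summarizer and does not use any Nutch or Lucene API; the function name `summarize` and its parameters are hypothetical. It only shows that the same stored text yields a different excerpt, with different highlighted terms, for each query.

```python
import re

def summarize(text, query_terms, context=30, marker="*"):
    """Return an excerpt around the first query-term hit, with hits marked.

    Hypothetical helper for illustration only. Terms are matched
    case-insensitively as plain substrings, not as whole words.
    """
    terms = [t for t in query_terms if t]
    if not terms:
        # No usable terms: fall back to the start of the text.
        return text[: 2 * context]
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        # Query does not occur in this text: fall back to the start.
        return text[: 2 * context]
    # Cut a window of `context` characters on each side of the first hit.
    start = max(0, first.start() - context)
    end = min(len(text), first.end() + context)
    excerpt = text[start:end]
    # Wrap every hit inside the excerpt in the marker.
    return pattern.sub(lambda m: marker + m.group(0) + marker, excerpt)
```

Calling `summarize(parsed_text, ["lucene"])` and `summarize(parsed_text, ["segments"])` on the same parsed text gives two different excerpts, which is why a single precomputed summary stored in the index would not be enough.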


indexing meta-tags:
Well, I don't know :p. However, not too long ago there was a discussion
about indexing meta-tags on this mailing-list.


Hope it helps,

Martin



> Thank you for your hints. From the zend developer i got the information that
> it is a different version of the lucene index. the current version of zend
> framework (1.0.1) can open indexes created with nutch 8. now i have some
> other requirements for my search application. i need a hit summary and it
> must be possible to index and retrieve meta tags.
>
> I have no idea how to get a summary in my index so that i can get it with
> Zend_Search_Lucene or Lucene.Net. Or are the summaries not stored in the
> index but generated at search time?
>
> there are some plugins but they don't store things in index?
>
>

Re: maybe dumb question about nutch index and segments file

Posted by DerFichtl <de...@gmail.com>.


Martin Kuen wrote:
> 
> 
> Nutch stores more data than lucene. The lucene index is a subset of what you
> call "nutch index". If you follow the nutch tutorial you'll find the lucene
> index in "crawl/indexes". That's the location you should try to open. In
> that directory you'll also find a file called "segments.gen". IMO that's
> what zend is complaining about. I am not familiar with the zend framework.
> 
> 

Thank you for your hints. From the zend developer i got the information that
it is a different version of the lucene index. the current version of zend
framework (1.0.1) can open indexes created with nutch 8. now i have some
other requirements for my search application. i need a hit summary and it
must be possible to index and retrieve meta tags.

I have no idea how to get a summary in my index so that i can get it with
Zend_Search_Lucene or Lucene.Net. Or are the summaries not stored in the
index but generated at search time?

there are some plugins, but they don't store things in the index?

Thank you
Michael Feichtinger




Re: maybe dumb question about nutch index and segments file

Posted by Martin Kuen <ma...@gmail.com>.
hi,

Nutch stores more data than lucene. The lucene index is a subset of what you
call "nutch index". If you follow the nutch tutorial you'll find the lucene
index in "crawl/indexes". That's the location you should try to open. In
that directory you'll also find a file called "segments.gen". IMO that's
what zend is complaining about. I am not familiar with the zend framework.

Cheers,

Martin


PS: Depending on your crawl strategy you may have subfolders in
"crawl/indexes". Each of these folders is a lucene index.
PPS: You may want to try out luke to investigate your lucene index. See
http://www.getopt.org/luke/
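
A quick way to see which folders under "crawl/indexes" actually look like Lucene indexes is to list the segments-style files in each one. The following Python sketch does that with plain filesystem calls; the function names are hypothetical. It assumes, as far as I know, that Lucene 2.1 and later 2.x releases write a "segments_N" file plus "segments.gen", while older releases wrote a single file named "segments". Only regular files are counted, so the Nutch "segments" folder itself is never mistaken for that file.

```python
import re
from pathlib import Path

# Assumption: "segments_N" uses a lowercase alphanumeric generation suffix.
SEGMENTS_N = re.compile(r"^segments_[0-9a-z]+$")

def describe_index_dir(path):
    """Report which segments-style files exist directly inside one directory."""
    names = {p.name for p in Path(path).iterdir() if p.is_file()}
    return {
        "legacy_segments": "segments" in names,   # older single "segments" file
        "segments_gen": "segments.gen" in names,  # the file mentioned above
        "segments_n": sorted(n for n in names if SEGMENTS_N.match(n)),
    }

def find_lucene_indexes(root):
    """Yield (directory, description) for root and each direct subfolder
    that contains at least one segments-style file."""
    root = Path(root)
    candidates = [root] + sorted(p for p in root.iterdir() if p.is_dir())
    for d in candidates:
        info = describe_index_dir(d)
        if info["legacy_segments"] or info["segments_gen"] or info["segments_n"]:
            yield d, info
```

For example, `for d, info in find_lucene_indexes("crawl/indexes"): print(d, info)` prints one line per folder that looks like a Lucene index. A folder that reports "segments_n" but no "legacy_segments" would be consistent with a reader that only looks for a file named "segments" failing to open it.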

On 9/12/07, DerFichtl <de...@gmail.com> wrote:
>
>
> i have tried to open a nutch generated index with zend_search_lucene and it
> complains that there is no segments file ... and it is right. i have the
> segments folder with the timestamp subdirs but there is no file named
> "segments".
>
> question: where is my segments file, or is it not possible to use a nutch
> index with zend framework? a few weeks ago i tried the same with lucene
> (using a nutch index) and i think i remember having the same problem
> ... it looks for a segments file.
>
> thanks