Posted to user@nutch.apache.org by liv <li...@hotmail.com> on 2006/12/18 16:07:03 UTC

Re: subcollections IT WORKS

I have no idea why this happened - probably because Luke was not
re-reading the indexes? Very strange!

Anyway, it works as it should - after a reindex the subcollection field is
populated with the latest data.

Please excuse my insistence and my clumsiness, and thanks for your answers.



liv wrote:
> 
> Unfortunately my Java knowledge is too poor to debug this one. However, I
> doubt that the file "subcollections.xml" from inside the nutch-xxx.job is
> used. That is because the nutch-xxx.job file is old: it is dated the day
> I installed Nutch.
> 
> 
> Sami Siren-2 wrote:
>> 
>> liv wrote:
>>> - I reindex the db: delete the folder "indexes", then run the command:
>>> 
>>> bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
>>> 
>>> - then I inspect the resulting db with Luke again
>>> 
>>> Unfortunately nothing has changed. Maybe I am missing something...
>>> Please tell me if you see anything wrong.
>> 
>> If you did exactly those steps, then what happens is that
>> subcollections.xml is read from inside the .job file. You need to
>> rebuild the .job to put the new file inside it.
>> 
>> Simply run "ant" and rerun the indexing, and it should work as expected.
>> 
>> --
>>  Sami Siren
>> 
>> 
>> 
> 
> 
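One way to test the explanation quoted above without reading the Java code is to compare the subcollections.xml packed inside the .job archive with the copy being edited on disk. What follows is only a minimal Python sketch. It assumes that the .job file is an ordinary zip/jar archive and that subcollections.xml sits at the archive root; the function name and any paths passed to it are hypothetical and not part of Nutch.

```python
import zipfile
from pathlib import Path


def bundled_copy_is_stale(job_path, conf_path, member="subcollections.xml"):
    """Return True if `member` inside the .job archive differs from conf_path.

    Assumption: the .job file is a plain zip/jar archive and the
    configuration file is stored at the archive root under `member`.
    """
    with zipfile.ZipFile(job_path) as job:
        # Raises KeyError if the archive has no such member.
        bundled = job.read(member)
    # Byte-for-byte comparison with the file edited on disk.
    return bundled != Path(conf_path).read_bytes()
```

If the function returns True for the installed .job file and the edited configuration file, the archive still holds the old configuration, which would match the advice to rebuild with "ant" before rerunning the indexing.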

-- 
View this message in context: http://www.nabble.com/subcollections-tf2821188.html#a7930248
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: subcollections IT DOESN'T WORK!

Posted by liv <li...@hotmail.com>.
Look here:
http://issues.apache.org/jira/browse/NUTCH-201?page=all

Unfortunately, it doesn't work as expected... yet.


kauu wrote:
> 
> Hi, I'm new to Nutch. I want to know what the subcollection plugin is
> useful for.
> Where can I find an introduction to it?
> 


Re: subcollections IT DOESN'T WORK!

Posted by kauu <ba...@gmail.com>.
Hi, I'm new to Nutch. I want to know what the subcollection plugin is
useful for.
Where can I find an introduction to it?




-- 
www.babatu.com

Re: subcollections IT DOESN'T WORK!

Posted by liv <li...@hotmail.com>.
I may be losing all credibility... it's still in the same state -
a reindex doesn't change the subcollection field!

I did a REFETCH by mistake (before the reindex) and was happy to notice that
the subcollections had changed, but I wrongly assumed that was due to the
reindex alone.

However, I want a REINDEX only, and the subcollection field does not seem to
change after corresponding changes to the subcollections.xml file.

Any help in debugging would be greatly appreciated, as I'm not familiar
enough with Java to pursue this by myself.

thanks





Re: subcollections IT WORKS

Posted by WebDev Freak <we...@gmail.com>.
Were you by any chance able to figure out how to search on multiple
subcollections? For example, let's say you have the following
subcollections: books, magazines, cd, dvd, software. I would like to have a
search page with checkboxes to pick any of the subcollections: for example,
search in books, magazines and cd, or just search in dvd and software.
Does anybody know how to do this? Thanks.

