Posted to user@nutch.apache.org by Bai Shen <ba...@gmail.com> on 2011/09/26 15:49:36 UTC

How do I use Luke to read Nutch index?

So I used the tutorial to do some crawling with Nutch and I've done all the
way up to Step 4.  I want to look at what I've indexed so far before I
import it into Solr so I can make sure that everything is working correctly.

But no matter which directory I use, Luke tells me that there's no valid
index.  Do I need to run the solrindex command?  And is there a way to do it
without pushing it to my solr install?

Thanks.

Re: How do I use Luke to read Nutch index?

Posted by Markus Jelsma <ma...@openindex.io>.
Nutch 1.4 comes with an indexchecker tool that shows you which fields are
sent to Solr for a given URL.

On Friday 30 September 2011 15:53:24 Bai Shen wrote:
> Ah.  I was hoping to look at the created index before I sent it over to the
> solr server.
> 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: How do I use Luke to read Nutch index?

Posted by Bai Shen <ba...@gmail.com>.
Ah.  I was hoping to look at the created index before I sent it over to the
Solr server.

On Fri, Sep 30, 2011 at 2:26 AM, Elisabeth Adler <el...@gmail.com> wrote:

> Yep, after fetching and parsing the pages, you need to tell Nutch to index
> the data in Solr, like:
> ./nutch solrindex http://localhost:8080/solr/ crawl/crawldb crawl/linkdb
> crawl/segments/*
>
> It's all explained in the wiki: http://wiki.apache.org/nutch/NutchTutorial
>
> Best,
> Elisabeth

Re: How do I use Luke to read Nutch index?

Posted by Elisabeth Adler <el...@gmail.com>.
Yep, after fetching and parsing the pages, you need to tell Nutch to
index the data in Solr, like:
./nutch solrindex http://localhost:8080/solr/ crawl/crawldb crawl/linkdb crawl/segments/*

It's all explained in the wiki: http://wiki.apache.org/nutch/NutchTutorial

Best,
Elisabeth
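
For anyone scripting this step, here is a minimal sketch in Python that
assembles the same solrindex command line. It expands crawl/segments/*
itself, so it does not depend on shell globbing, and it fails early if no
segments exist yet. The function name build_solrindex_command and its
parameters are made up for illustration; the Solr URL and directory layout
are the ones from the command above and may differ on your machine.

```python
import glob
import os


def build_solrindex_command(nutch_bin, solr_url, crawl_dir):
    """Build the argument list for 'nutch solrindex' (hypothetical helper).

    Mirrors the command quoted in the thread:
      ./nutch solrindex <solr url> <crawldb> <linkdb> <segment> [<segment> ...]
    """
    # Expand crawl/segments/* here instead of relying on the shell.
    segments = sorted(glob.glob(os.path.join(crawl_dir, "segments", "*")))
    if not segments:
        raise ValueError("no segments found under %s" % crawl_dir)
    return [
        nutch_bin,
        "solrindex",
        solr_url,
        os.path.join(crawl_dir, "crawldb"),
        os.path.join(crawl_dir, "linkdb"),
    ] + segments


# Example use (assumes ./nutch and a running Solr at this URL):
#   import subprocess
#   cmd = build_solrindex_command("./nutch", "http://localhost:8080/solr/", "crawl")
#   subprocess.check_call(cmd)
```

Returning a list rather than a single string lets the command be passed to
subprocess without a shell, so segment paths need no quoting.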

On 27.09.2011 15:08, Bai Shen wrote:
> I'm using Luke 3.3 and Nutch 1.3
>
> I didn't see any fdt files.  Are those created when you run the solrindex
> command?
>

Re: How do I use Luke to read Nutch index?

Posted by Bai Shen <ba...@gmail.com>.
I'm using Luke 3.3 and Nutch 1.3

I didn't see any fdt files.  Are those created when you run the solrindex
command?

On Mon, Sep 26, 2011 at 10:11 AM, Elisabeth Adler <elisabeth.adler@gmail.com> wrote:

> Which version of Luke and Nutch are you using? I had the same problem with
> Luke 0.9 and Nutch 1.3 indices - I upgraded Luke to 3.3
> (http://code.google.com/p/luke/) and
> it's working without problems now. Btw, you need to select the directory
> "data/index" (containing .fdt and more files).
> Hope this helps,
> Elisabeth

Re: How do I use Luke to read Nutch index?

Posted by Elisabeth Adler <el...@gmail.com>.
Which version of Luke and Nutch are you using? I had the same problem 
with Luke 0.9 and Nutch 1.3 indices - I upgraded Luke to 3.3 
(http://code.google.com/p/luke/) and it's working without problems now. 
Btw, you need to select the directory "data/index" (containing .fdt and 
more files).
Hope this helps,
Elisabeth
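
If it is unclear which directory to point Luke at, a small script can look
for one. The sketch below walks a directory tree and lists every directory
that directly contains a Lucene segments file (segments_N or segments.gen),
which sits next to the .fdt files mentioned above. This is only a
heuristic, and the function name find_lucene_index_dirs is made up for
illustration. Note that Nutch's own crawl/segments is a directory of crawl
segments, not a Lucene index, so it is not reported.

```python
import os


def find_lucene_index_dirs(root):
    """Return, sorted, the directories under `root` that look like Lucene indexes.

    Heuristic only: a directory qualifies if it directly contains a file
    named 'segments.gen' or one whose name starts with 'segments_'.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # `filenames` lists only non-directory entries, so a directory that
        # happens to be called 'segments' (as in a Nutch crawl) never matches.
        if any(f == "segments.gen" or f.startswith("segments_") for f in filenames):
            found.append(dirpath)
    return sorted(found)


# Example use: print candidate directories below the current directory.
#   for path in find_lucene_index_dirs("."):
#       print(path)
```

An empty result means no Lucene index exists under that root yet, which
would match Luke reporting that there is no valid index.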

On 26.09.2011 15:49, Bai Shen wrote:
> So I used the tutorial to do some crawling with Nutch and I've done all the
> way up to Step 4.  I want to look at what I've indexed so far before I
> import it into Solr so I can make sure that everything is working correctly.
>
> But no matter which directory I use, Luke tells me that there's no valid
> index.  Do I need to run the solrindex command?  And is there a way to do it
> without pushing it to my solr install?
>
> Thanks.
>