Posted to user@nutch.apache.org by Pratyush Banerjee <pr...@gmail.com> on 2007/10/31 08:22:03 UTC

[URGENT] : Query regarding handling multiple index with nutch....

Hi Sir,

I am currently working on a cross-lingual search engine that requires
building a separate index for every language we are going to support. We are
using Nutch as the basis of the engine. However, I have not been able to find
any help regarding handling multiple indexes in Nutch.

The Nutch indexer creates an index directory and an indexes directory. I am
not sure which one is the actual index directory. I tried tweaking the code,
but with little effect. Can anyone tell me why there are two separate
directories and what each one is used for?

Secondly, since my indexes will be split by language, I would identify the
language using the language identifier (which is working currently). Which
files need to be modified? It would also help if someone could give me some
idea of how to go about it.

I am currently using Nutch 0.9 on an FC6 machine with JDK 1.6.

Anybody please help...

-- 
Pratyush Banerjee
SPO, CLIA
IIT Kharagpur

Re: [URGENT] : Query regarding handling multiple index with nutch....

Posted by Ravi Chintakunta <ra...@gmail.com>.
Hi Pratyush,

You can see my solution for searching multiple indexes with a single
instance of Nutch here.

https://issues.apache.org/jira/browse/NUTCH-480

- Ravi Chintakunta


looking for nutch professional

Posted by Georg Ochsner <g....@revolistic.com>.
Hello,

We are looking for a Nutch professional with very good experience and
knowledge to remotely set up and support a Nutch system with a configuration
optimized for individual requirements and speed. Freelancers and companies
are both welcome. Please contact me personally for more details.

Thanks!

Best regards
Georg


Re: [URGENT] : Query regarding handling multiple index with nutch....

Posted by Stacey Gammon <pe...@gmail.com>.
I'm using Nutch 0.8 with multiple indexes. I just point the crawler to
different folders, depending on which index I am updating. When searching, I
point it to the index directory, not indexes. I merge my indexes (the ones in
index and indexes) and store the result in the index folder. I think the
indexes folder stores an index for each segment (maybe?), but if you merge
them together with the main index, I think the index folder will have
everything. At least it seems to be working in my implementation.
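
The layout described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the setup used in this thread:
the crawls/<language> directory naming and the helper names crawl_dir_for and
merge_command are hypothetical, and the argument order for "bin/nutch merge"
(output index first, then the directory of part indexes) should be checked
against the usage message printed by your own Nutch version.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical layout: one crawl directory per language, e.g. crawls/en, crawls/bn.
CRAWL_ROOT = PurePosixPath("crawls")


def crawl_dir_for(lang_code: str, root: PurePosixPath = CRAWL_ROOT) -> PurePosixPath:
    """Map a language code (as reported by a language identifier) to its crawl directory."""
    code = lang_code.strip().lower()
    # Reject anything that is not a plain alphabetic code, so a bad value
    # cannot escape the crawl root (e.g. "../other").
    if not code.isalpha():
        raise ValueError(f"unexpected language code: {lang_code!r}")
    return root / code


def merge_command(crawl_dir: PurePosixPath, nutch_bin: str = "bin/nutch") -> List[str]:
    """Build the argument list for merging the part indexes under indexes/ into index/.

    Assumption: the merge command takes the output index first, followed by
    the directory holding the part indexes.
    """
    return [nutch_bin, "merge", str(crawl_dir / "index"), str(crawl_dir / "indexes")]


if __name__ == "__main__":
    # Print (do not run) one merge command per language directory.
    for lang in ("en", "bn", "hi"):
        print(" ".join(merge_command(crawl_dir_for(lang))))
```

With a layout like this, the searcher for a given language would be pointed
at that language's index directory, which is the merged result, rather than
at indexes.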
