Posted to user@nutch.apache.org by Vijay <vi...@gmail.com> on 2008/08/12 09:37:55 UTC

Suggestions for faster serving of queries with Nutch

Hi all,

        I was wondering if you have any suggestions for fast serving
of queries with Nutch. I am running Nutch on a machine with 3GB
memory. The total size of my crawl directory is about 800MB. I was
wondering if there is any way to allow nutch to cache its indexes
either partly or wholly in main memory for faster serving of queries.
For some queries, especially after a period of "idleness" I often find
Nutch taking 7-8 seconds to return results for the query.

     Likewise do let me know if there are ways to better utilize main
memory to speed up the indexing process.



Thanks,
Vijay

Re: Suggestions for faster serving of queries with Nutch

Posted by Dennis Kubes <ku...@apache.org>.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg10088.html

Vijay wrote:
> Hi all,
> 
>         I was wondering if you have any suggestions for fast serving
> of queries with Nutch. I am running Nutch on a machine with 3GB
> memory. The total size of my crawl directory is about 800MB. I was
> wondering if there is any way to allow nutch to cache its indexes
> either partly or wholly in main memory for faster serving of queries.
> For some queries, especially after a period of "idleness" I often find
> Nutch taking 7-8 seconds to return results for the query.
> 
>      Likewise do let me know if there are ways to better utilize main
> memory to speed up the indexing process.
> 
> 
> 
> Thanks,
> Vijay

Re: Suggestions for faster serving of queries with Nutch

Posted by ianwong <yi...@hotmail.com>.
I wonder how to make query server use RAMDirectory in nutch?


Thanks

Ian


Alexander Aristov wrote:
> 
> Look for a way to put your index into RAM. You create a file system which
> works with RAM instead of the hard disk and then copy your index into it.
> 
> It might significantly increase performance.
> 
> Alex
> 
> 
> On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>>
>> Hi all,
>>
>>        I was wondering if you have any suggestions for fast serving
>> of queries with Nutch. I am running Nutch on a machine with 3GB
>> memory. The total size of my crawl directory is about 800MB. I was
>> wondering if there is any way to allow nutch to cache its indexes
>> either partly or wholly in main memory for faster serving of queries.
>> For some queries, especially after a period of "idleness" I often find
>> Nutch taking 7-8 seconds to return results for the query.
>>
>>     Likewise do let me know if there are ways to better utilize main
>> memory to speed up the indexing process.
>>
>>
>>
>> Thanks,
>> Vijay
>>
> 
> 
> 
> -- 
> Best Regards
> Alexander Aristov
> 
> 

-- 
View this message in context: http://www.nabble.com/Suggestions-for-faster-serving-of-queries-with-Nutch-tp18939420p21322620.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Suggestions for faster serving of queries with Nutch

Posted by Winton Davies <wd...@cs.stanford.edu>.
Indexes in RAM is the way to go. Make a tmpfs.  Putting the segments 
in RAM is much harder, as they should be a lot bigger than the 
indexes. All the indexes have is DocIDs, so you can't even show the 
URL without retrieval. You could presumably create a custom segment 
that didn't store anything but title and URL, if speed was truly most 
important. Of course, you can also shard etc.
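
A quick way to check the point about sizes is to measure the index and segment directories before deciding what goes into tmpfs. The following is only a sketch: the function names `dir_size` and `fits_in_ram` are hypothetical, and the directory paths and RAM budget are whatever applies to your own crawl.

```python
from pathlib import Path


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def fits_in_ram(paths, ram_budget_bytes):
    """Return (total_bytes, fits) for the given directories.

    ram_budget_bytes is an assumption supplied by the caller, e.g. the
    size you are willing to give to a tmpfs mount.
    """
    total = sum(dir_size(p) for p in paths)
    return total, total <= ram_budget_bytes
```

Calling `fits_in_ram` once with only the index directory and once with the segments added shows whether the indexes alone fit in the budget while the segments do not.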

W

Re: Suggestions for faster serving of queries with Nutch

Posted by Dennis Kubes <ku...@apache.org>.
You do not need to include the segments when putting indexes in memory. 
  The distributed search makes two calls, the first for hits, the second 
for hit details of top scoring pages.
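
The two-call pattern can be illustrated with a toy model. This is not Nutch's code: `INDEX`, `SEGMENTS`, `search_hits` and `fetch_details` are hypothetical names, and the data is made up. It only shows why the index alone is on the hot path, since details are looked up for just the few top hits.

```python
import heapq

# Toy in-memory "index": term -> {doc_id: score}. Only IDs and scores live here.
INDEX = {
    "nutch": {1: 0.9, 2: 0.4, 3: 0.7},
    "lucene": {2: 0.8, 3: 0.2},
}

# Toy "segment" store: doc_id -> details. Stands in for the larger on-disk data.
SEGMENTS = {
    1: {"url": "http://example.org/a", "title": "A"},
    2: {"url": "http://example.org/b", "title": "B"},
    3: {"url": "http://example.org/c", "title": "C"},
}


def search_hits(term, top_k):
    """First call: return the top_k (doc_id, score) pairs using only the index."""
    postings = INDEX.get(term, {})
    return heapq.nlargest(top_k, postings.items(), key=lambda kv: kv[1])


def fetch_details(hits):
    """Second call: fetch details only for the hits that will be shown."""
    return [dict(SEGMENTS[doc_id], score=score) for doc_id, score in hits]
```

With `top_k` set to 10, a query touches the detail store at most ten times no matter how many documents matched in the index.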

Dennis

Alexander Aristov wrote:
> Is it just the index size or including segments? You don't need segments in
> RAM, only index files.
> 
> 2008/8/12 Michael Chan <da...@gmail.com>
> 
>> Can a part of the index be loaded into RAM? For example, if the index is
>> 20gb and I only have 8gb RAM, can I load 7gb of the index into RAM? Thanks.
>>
>> Michael
>>
>> On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
>> alexander.aristov@gmail.com> wrote:
>>
>>> Look for a way to put your index into RAM. You create a file system which
>>> works with RAM instead of the hard disk and then copy your index into it.
>>>
>>> It might significantly increase performance.
>>>
>>> Alex
>>>
>>>
>>> On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>>        I was wondering if you have any suggestions for fast serving
>>>> of queries with Nutch. I am running Nutch on a machine with 3GB
>>>> memory. The total size of my crawl directory is about 800MB. I was
>>>> wondering if there is any way to allow nutch to cache its indexes
>>>> either partly or wholly in main memory for faster serving of queries.
>>>> For some queries, especially after a period of "idleness" I often find
>>>> Nutch taking 7-8 seconds to return results for the query.
>>>>
>>>>     Likewise do let me know if there are ways to better utilize main
>>>> memory to speed up the indexing process.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>> Alexander Aristov
>>>
> 
> 
> 

Re: Suggestions for faster serving of queries with Nutch

Posted by Alexander Aristov <al...@gmail.com>.
Is it just the index size or including segments? You don't need segments in
RAM, only index files.

2008/8/12 Michael Chan <da...@gmail.com>

> Can a part of the index be loaded into RAM? For example, if the index is
> 20gb and I only have 8gb RAM, can I load 7gb of the index into RAM? Thanks.
>
> Michael
>
> On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
> alexander.aristov@gmail.com> wrote:
>
> > Look for a way to put your index into RAM. You create a file system which
> > works with RAM instead of the hard disk and then copy your index into it.
> >
> > It might significantly increase performance.
> >
> > Alex
> >
> >
> > On 12/08/2008, Vijay <vi...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > >        I was wondering if you have any suggestions for fast serving
> > > of queries with Nutch. I am running Nutch on a machine with 3GB
> > > memory. The total size of my crawl directory is about 800MB. I was
> > > wondering if there is any way to allow nutch to cache its indexes
> > > either partly or wholly in main memory for faster serving of queries.
> > > For some queries, especially after a period of "idleness" I often find
> > > Nutch taking 7-8 seconds to return results for the query.
> > >
> > >     Likewise do let me know if there are ways to better utilize main
> > > memory to speed up the indexing process.
> > >
> > >
> > >
> > > Thanks,
> > > Vijay
> > >
> >
> >
> >
> > --
> > Best Regards
> > Alexander Aristov
> >
>



-- 
Best Regards
Alexander Aristov

Re: Suggestions for faster serving of queries with Nutch

Posted by Orion Letizi <or...@terracotta.org>.
You'll probably have better luck with Compass. The Lucene RAMDirectory has
poor locking characteristics in a clustered context.

--Orion


Kunthar wrote:
> 
> Check LuceneRamDirectory with Terracotta.
> 
> 
> 
> On Tue, Aug 12, 2008 at 3:35 PM, Michael Chan <da...@gmail.com> wrote:
>> Can a part of the index be loaded into RAM? For example, if the index is
>> 20gb and I only have 8gb RAM, can I load 7gb of the index into RAM?
>> Thanks.
>>
>> Michael
>>
>> On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
>> alexander.aristov@gmail.com> wrote:
>>
>>> Look for a way to put your index into RAM. You create a file system
>>> which
>>> works with RAM instead of the hard disk and then copy your index into it.
>>>
>>> It might significantly increase performance.
>>>
>>> Alex
>>>
>>>
>>> On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> >        I was wondering if you have any suggestions for fast serving
>>> > of queries with Nutch. I am running Nutch on a machine with 3GB
>>> > memory. The total size of my crawl directory is about 800MB. I was
>>> > wondering if there is any way to allow nutch to cache its indexes
>>> > either partly or wholly in main memory for faster serving of queries.
>>> > For some queries, especially after a period of "idleness" I often find
>>> > Nutch taking 7-8 seconds to return results for the query.
>>> >
>>> >     Likewise do let me know if there are ways to better utilize main
>>> > memory to speed up the indexing process.
>>> >
>>> >
>>> >
>>> > Thanks,
>>> > Vijay
>>> >
>>>
>>>
>>>
>>> --
>>> Best Regards
>>> Alexander Aristov
>>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Suggestions-for-faster-serving-of-queries-with-Nutch-tp18939420p18957035.html


Re: Suggestions for faster serving of queries with Nutch

Posted by Kunthar <ku...@gmail.com>.
Check LuceneRamDirectory with Terracotta.



On Tue, Aug 12, 2008 at 3:35 PM, Michael Chan <da...@gmail.com> wrote:
> Can a part of the index be loaded into RAM? For example, if the index is
> 20gb and I only have 8gb RAM, can I load 7gb of the index into RAM? Thanks.
>
> Michael
>
> On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
> alexander.aristov@gmail.com> wrote:
>
>> Look for a way to put your index into RAM. You create a file system which
>> works with RAM instead of the hard disk and then copy your index into it.
>>
>> It might significantly increase performance.
>>
>> Alex
>>
>>
>> On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> >        I was wondering if you have any suggestions for fast serving
>> > of queries with Nutch. I am running Nutch on a machine with 3GB
>> > memory. The total size of my crawl directory is about 800MB. I was
>> > wondering if there is any way to allow nutch to cache its indexes
>> > either partly or wholly in main memory for faster serving of queries.
>> > For some queries, especially after a period of "idleness" I often find
>> > Nutch taking 7-8 seconds to return results for the query.
>> >
>> >     Likewise do let me know if there are ways to better utilize main
>> > memory to speed up the indexing process.
>> >
>> >
>> >
>> > Thanks,
>> > Vijay
>> >
>>
>>
>>
>> --
>> Best Regards
>> Alexander Aristov
>>
>

Re: Suggestions for faster serving of queries with Nutch

Posted by Michael Chan <da...@gmail.com>.
Can a part of the index be loaded into RAM? For example, if the index is
20gb and I only have 8gb RAM, can I load 7gb of the index into RAM? Thanks.

Michael

On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
alexander.aristov@gmail.com> wrote:

> Look for a way to put your index into RAM. You create a file system which
> works with RAM instead of the hard disk and then copy your index into it.
>
> It might significantly increase performance.
>
> Alex
>
>
> On 12/08/2008, Vijay <vi...@gmail.com> wrote:
> >
> > Hi all,
> >
> >        I was wondering if you have any suggestions for fast serving
> > of queries with Nutch. I am running Nutch on a machine with 3GB
> > memory. The total size of my crawl directory is about 800MB. I was
> > wondering if there is any way to allow nutch to cache its indexes
> > either partly or wholly in main memory for faster serving of queries.
> > For some queries, especially after a period of "idleness" I often find
> > Nutch taking 7-8 seconds to return results for the query.
> >
> >     Likewise do let me know if there are ways to better utilize main
> > memory to speed up the indexing process.
> >
> >
> >
> > Thanks,
> > Vijay
> >
>
>
>
> --
> Best Regards
> Alexander Aristov
>

Re: Suggestions for faster serving of queries with Nutch

Posted by Alexander Aristov <al...@gmail.com>.
Look for a way to put your index into RAM. You create a file system which
works with RAM instead of the hard disk and then copy your index into it.

It might significantly increase performance.
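
As a rough sketch of the copy step, assuming a RAM-backed file system is already mounted (for example a tmpfs such as /dev/shm on many Linux systems), something like the following would do. The function name `copy_index_to_ram` is hypothetical, and creating the mount and pointing the search front end at the copy are not shown.

```python
import shutil
from pathlib import Path


def copy_index_to_ram(index_dir, ram_root):
    """Copy index_dir into ram_root/<name> and return the new path.

    ram_root is assumed to be a RAM-backed mount; this function neither
    creates nor verifies the mount.
    """
    src = Path(index_dir)
    dst = Path(ram_root) / src.name
    # Refuse to start a copy that cannot fit in the target file system.
    needed = sum(f.stat().st_size for f in src.rglob("*") if f.is_file())
    free = shutil.disk_usage(ram_root).free
    if needed > free:
        raise OSError(f"index needs {needed} bytes, only {free} free in {ram_root}")
    if dst.exists():
        shutil.rmtree(dst)  # replace any stale copy
    shutil.copytree(src, dst)
    return dst
```

Anything stored in a RAM file system is lost on reboot, so the copy has to be repeated at startup and the on-disk index kept as the master copy.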

Alex


On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>
> Hi all,
>
>        I was wondering if you have any suggestions for fast serving
> of queries with Nutch. I am running Nutch on a machine with 3GB
> memory. The total size of my crawl directory is about 800MB. I was
> wondering if there is any way to allow nutch to cache its indexes
> either partly or wholly in main memory for faster serving of queries.
> For some queries, especially after a period of "idleness" I often find
> Nutch taking 7-8 seconds to return results for the query.
>
>     Likewise do let me know if there are ways to better utilize main
> memory to speed up the indexing process.
>
>
>
> Thanks,
> Vijay
>



-- 
Best Regards
Alexander Aristov