Posted to user@nutch.apache.org by Vijay <vi...@gmail.com> on 2008/08/12 09:37:55 UTC
Suggestions for faster serving of queries with Nutch
Hi all,
I was wondering if you have any suggestions for fast serving
of queries with Nutch. I am running Nutch on a machine with 3GB
memory. The total size of my crawl directory is about 800MB. I was
wondering if there is any way to allow nutch to cache its indexes
either partly or wholly in main memory for faster serving of queries.
For some queries, especially after a period of "idleness" I often find
Nutch taking 7-8 seconds to return results for the query.
Likewise do let me know if there are ways to better utilize main
memory to speed up the indexing process.
Thanks,
Vijay
Re: Suggestions for faster serving of queries with Nutch
Posted by Dennis Kubes <ku...@apache.org>.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg10088.html
Vijay wrote:
> Hi all,
>
> I was wondering if you have any suggestions for fast serving
> of queries with Nutch. I am running Nutch on a machine with 3GB
> memory. The total size of my crawl directory is about 800MB. I was
> wondering if there is any way to allow nutch to cache its indexes
> either partly or wholly in main memory for faster serving of queries.
> For some queries, especially after a period of "idleness" I often find
> Nutch taking 7-8 seconds to return results for the query.
>
> Likewise do let me know if there are ways to better utilize main
> memory to speed up the indexing process.
>
>
>
> Thanks,
> Vijay
Re: Suggestions for faster serving of queries with Nutch
Posted by ianwong <yi...@hotmail.com>.
How can I make the query server use RAMDirectory in Nutch?
Thanks
Ian
Alexander Aristov wrote:
>
> Look for a way to put your index into RAM. You create a file system which
> works with RAM instead of the hard disk and then copy your index into it.
>
> It might significantly increase performance.
>
> Alex
--
View this message in context: http://www.nabble.com/Suggestions-for-faster-serving-of-queries-with-Nutch-tp18939420p21322620.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Suggestions for faster serving of queries with Nutch
Posted by Winton Davies <wd...@cs.stanford.edu>.
Keeping the indexes in RAM is the way to go: make a tmpfs. Putting the segments
in RAM is much harder, as they tend to be a lot bigger than the
indexes. All the indexes have is DocIDs, so you can't even show the
URL without retrieval. You could presumably create a custom segment
that didn't store anything but title and URL, if speed were truly most
important. Of course, you can also shard, etc.
W
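Whether this is practical depends on how the crawl directory splits between index and segment data. A minimal Python sketch for measuring that, assuming the index and segment data live in `index`, `indexes` and `segments` subdirectories of the crawl directory (check your own layout; the helper names are illustrative and not part of Nutch):

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def crawl_breakdown(crawl_dir, parts=("index", "indexes", "segments")):
    """Return {subdirectory name: size in bytes} for the given crawl directory."""
    return {part: dir_size_bytes(os.path.join(crawl_dir, part)) for part in parts}


# Hypothetical usage; replace "crawl" with your own crawl directory:
#   for part, size in crawl_breakdown("crawl").items():
#       print(f"{part}: {size / (1024 * 1024):.1f} MB")
```

If the index portion is comfortably smaller than free memory, it is a candidate for a tmpfs; the segments figure shows how much more would be needed to hold everything.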
Re: Suggestions for faster serving of queries with Nutch
Posted by Dennis Kubes <ku...@apache.org>.
You do not need to include the segments when putting indexes in memory.
Distributed search makes two calls: the first for hits, the second
for the hit details of the top-scoring pages.
Dennis
Alexander Aristov wrote:
> Is it just the index size, or does it include segments? You don't need
> segments in RAM, only the index files.
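The two-call pattern Dennis describes can be sketched roughly as follows. This is a toy Python model built on plain dictionaries, not Nutch's actual classes; `Hit`, `search_hits`, `fetch_details` and the sample data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    score: float


def search_hits(index, term, top_n):
    """Call 1: consult only the index, returning the top-scoring doc IDs."""
    postings = index.get(term, {})
    ranked = sorted(postings.items(), key=lambda item: item[1], reverse=True)
    return [Hit(doc_id, score) for doc_id, score in ranked[:top_n]]


def fetch_details(details_store, hits):
    """Call 2: look up display fields only for the hits that will be shown."""
    return [details_store[hit.doc_id] for hit in hits]


# Toy data standing in for the in-memory index and the stored hit details.
index = {"nutch": {1: 0.9, 2: 0.4, 3: 0.7}}
details_store = {
    1: {"url": "http://example.org/a", "title": "A"},
    2: {"url": "http://example.org/b", "title": "B"},
    3: {"url": "http://example.org/c", "title": "C"},
}

top = search_hits(index, "nutch", top_n=2)
results = fetch_details(details_store, top)
```

The point of the split is that the second, more expensive lookup touches only the handful of documents that make it onto the results page, so the first call is the one that benefits most from fast storage.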
Re: Suggestions for faster serving of queries with Nutch
Posted by Alexander Aristov <al...@gmail.com>.
Is it just the index size, or does it include segments? You don't need
segments in RAM, only the index files.
2008/8/12 Michael Chan <da...@gmail.com>
> Can a part of the index be loaded into RAM? For example, if the index is
> 20 GB and I only have 8 GB of RAM, can I load 7 GB of the index into RAM? Thanks.
>
> Michael
--
Best Regards
Alexander Aristov
Re: Suggestions for faster serving of queries with Nutch
Posted by Orion Letizi <or...@terracotta.org>.
You'll probably have better luck with Compass. The Lucene RAMDirectory has
poor locking characteristics in a clustered context.
--Orion
Kunthar wrote:
>
> Check LuceneRamDirectory with Terracotta.
--
View this message in context: http://www.nabble.com/Suggestions-for-faster-serving-of-queries-with-Nutch-tp18939420p18957035.html
Re: Suggestions for faster serving of queries with Nutch
Posted by Kunthar <ku...@gmail.com>.
Check LuceneRamDirectory with Terracotta.
On Tue, Aug 12, 2008 at 3:35 PM, Michael Chan <da...@gmail.com> wrote:
> Can a part of the index be loaded into RAM? For example, if the index is
> 20 GB and I only have 8 GB of RAM, can I load 7 GB of the index into RAM? Thanks.
>
> Michael
Re: Suggestions for faster serving of queries with Nutch
Posted by Michael Chan <da...@gmail.com>.
Can a part of the index be loaded into RAM? For example, if the index is
20 GB and I only have 8 GB of RAM, can I load 7 GB of the index into RAM? Thanks.
Michael
On Tue, Aug 12, 2008 at 9:30 AM, Alexander Aristov <
alexander.aristov@gmail.com> wrote:
> Look for a way to put your index into RAM. You create a file system which
> works with RAM instead of the hard disk and then copy your index into it.
>
> It might significantly increase performance.
>
> Alex
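On the partial-loading question: even without a RAM file system, the operating system normally keeps recently read file blocks in its page cache, so whatever portion of the index fits in free memory tends to stay resident. A hedged Python sketch of a warm-up pass that reads the index files once, for example after a restart or an idle period, follows. `warm_page_cache` is a hypothetical helper, and how long the data stays cached depends on the OS and on memory pressure.

```python
import os


def warm_page_cache(index_dir, chunk_size=1024 * 1024):
    """Read every file under index_dir once; return the number of bytes read."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            with open(os.path.join(root, name), "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


# Hypothetical usage, e.g. from a periodic job:
#   warm_page_cache("crawl/index")
```

This does not pin anything in memory; it only makes it more likely that the first query after a quiet period finds the index blocks already cached.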
Re: Suggestions for faster serving of queries with Nutch
Posted by Alexander Aristov <al...@gmail.com>.
Look for a way to put your index into RAM. You create a file system which
works with RAM instead of the hard disk and then copy your index into it.
It might significantly increase performance.
Alex
On 12/08/2008, Vijay <vi...@gmail.com> wrote:
>
> Hi all,
>
> I was wondering if you have any suggestions for fast serving
> of queries with Nutch. I am running Nutch on a machine with 3GB
> memory. The total size of my crawl directory is about 800MB. I was
> wondering if there is any way to allow nutch to cache its indexes
> either partly or wholly in main memory for faster serving of queries.
> For some queries, especially after a period of "idleness" I often find
> Nutch taking 7-8 seconds to return results for the query.
>
> Likewise do let me know if there are ways to better utilize main
> memory to speed up the indexing process.
>
>
>
> Thanks,
> Vijay
>
--
Best Regards
Alexander Aristov
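The copy step suggested above might look like the following Python sketch. It assumes a RAM-backed file system is already mounted (on many Linux systems `/dev/shm` is a tmpfs); `copy_index_to_ram` and the paths in the usage comment are hypothetical, and the free-space check is an added safeguard rather than something from the thread.

```python
import os
import shutil


def copy_index_to_ram(index_dir, ram_dir):
    """Copy index_dir into ram_dir if it fits; return the path of the copy."""
    needed = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(index_dir)
        for name in files
    )
    free = shutil.disk_usage(ram_dir).free
    if needed > free:
        raise OSError(f"index needs {needed} bytes, only {free} free in {ram_dir}")
    target = os.path.join(ram_dir, os.path.basename(os.path.normpath(index_dir)))
    if os.path.exists(target):
        shutil.rmtree(target)  # replace a stale copy from an earlier run
    shutil.copytree(index_dir, target)
    return target


# Hypothetical usage; both paths are assumptions about your setup:
#   copy_index_to_ram("crawl/index", "/dev/shm")
```

The searcher then has to be pointed at the copy. In Nutch releases of that period this was, as far as I recall, done through the `searcher.dir` property, which expects the index and the segments under one directory, so a symlink to the on-disk segments may be needed; verify this against your version. Anything in a tmpfs is lost on reboot, so the copy has to be repeated at startup.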