Posted to java-user@lucene.apache.org by sreedevi s <sr...@gmail.com> on 2015/02/05 10:13:21 UTC

MMapDirectory or FSDirectory

Hi,
I am doing some performance analysis with Lucene. I have 1 million
resources with 1000 attributes each.
With my indexing scheme, that gives 1 million documents with 1000 fields each.
The total data was about 100 GB, and when I used FSDirectory to store
my indices, the index size was almost 6 GB.
I have almost 8 GB of virtual memory available. Is it advisable to use
MMapDirectory for better performance?
Many blogs suggest it doesn't make much of a performance difference.


Best Regards,
Sreedevi S

Re: MMapDirectory or FSDirectory

Posted by sreedevi s <sr...@gmail.com>.
Hi,
Thank you for sharing the blog. I am using FSDirectory.open() in my
program, so I guess I am using MMapDirectory. It takes about 3 minutes when
I search for a key (which is actually present in 80% of the total data) across
all 1000 fields in these 1 million documents.

Best Regards,
Sreedevi S

On Thu, Feb 5, 2015 at 3:20 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi,
>
> If you use FSDirectory.open() it will automatically choose MMapDirectory
> on 64-bit systems. Please note that virtual memory != physical RAM. A 64-bit
> machine *always* has >1 terabyte of virtual address space available; this
> is unrelated to physical memory (a common misunderstanding about mmap):
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> The speed difference depends on the use case: In general MMapDirectory is much
> faster in multi-threaded environments, because it has no concurrency problems.
> If you use SimpleFSDirectory, this is the largest bottleneck. NIOFSDirectory
> does not have concurrency problems, but it is still slower because it does
> a lot of extra copying of data between kernel space and user space for
> buffering. MMapDirectory is muuuuuuuch faster if you sort by docvalues
> fields, because it supports random access without any buffering overhead.
>
> So please: Use MMapDirectory where possible - this is completely unrelated
> to how much RAM you have available!
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Fill up field date using file name or path

Posted by Erick Erickson <er...@gmail.com>.
There's nothing automatic that'll make a "best guess" about turning
some substring of the incoming field into a date.

You could create a custom update component to handle this case, or you
could massage the data on the ingestion side to populate this field, but
it'll be custom code one way or the other.

Best,
Erick
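
A minimal sketch of the "massage the data on the ingestion side" option,
assuming the file names follow the pattern described in the question and
that "GG" in YYYYMMGG stands for the day of the month. The class name
FilenameDate, the regex, and the helper toSolrDate are hypothetical and not
part of Lucene or Solr; the output format assumes a Solr-style date field
that accepts ISO-8601 UTC strings.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FilenameDate {
    // Assumed shape: TEXT_CRE_<8 digits>_<d>-<ddd>[-<ddd>].txt
    private static final Pattern NAME =
            Pattern.compile("^TEXT_CRE_(\\d{8})_\\d(?:-\\d{3}){1,2}\\.txt$");

    /** Extracts the date from a bare file name, or empty if it does not fit. */
    static Optional<LocalDate> fromFileName(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            // BASIC_ISO_DATE parses yyyyMMdd and rejects impossible dates.
            return Optional.of(
                    LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Formats the date as midnight UTC, e.g. 2015-03-02T00:00:00Z. */
    static String toSolrDate(LocalDate date) {
        return date + "T00:00:00Z";
    }
}
```

The file name is used rather than the /YYYY/MM/ path because only the name
carries the day. The resulting string would then be set on the "date" field
of each document before it is sent for indexing.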

On Mon, Mar 2, 2015 at 10:00 AM, Mirko Torrisi
<mi...@ucdconnect.ie> wrote:
> Hi folks,
>
> Hopefully this is an easy question, but I couldn't work it out after several
> hours.
>
> I created a new field (by adding <field name="date" type="date" indexed="true"
> stored="true"/>) and I'd like to populate it from the filename or the file
> path.
> The file names look like TEXT_CRE_YYYYMMGG_X-XXX-XXX.txt or
> TEXT_CRE_YYYYMMGG_X-XXX.txt (where each X is a random digit).
> The files are organized into directories following this rule:
> /YYYY/MM/**filename**.
>
> I don't know which way is easier (filename or path). I'd like to use a date
> field type so that I can use some grouping functions.
>
>
> Thanks in advance.
> Have a nice week,
>
> Mirko
>


Fill up field date using file name or path

Posted by Mirko Torrisi <mi...@ucdconnect.ie>.
Hi folks,

Hopefully this is an easy question, but I couldn't work it out after several
hours.

I created a new field (by adding <field name="date" type="date"
indexed="true" stored="true"/>) and I'd like to populate it from the
filename or the file path.
The file names look like TEXT_CRE_YYYYMMGG_X-XXX-XXX.txt or
TEXT_CRE_YYYYMMGG_X-XXX.txt (where each X is a random digit).
The files are organized into directories following this rule:
/YYYY/MM/**filename**.

I don't know which way is easier (filename or path). I'd like to use a
date field type so that I can use some grouping functions.


Thanks in advance.
Have a nice week,

Mirko


RE: MMapDirectory or FSDirectory

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

If you use FSDirectory.open() it will automatically choose MMapDirectory on 64-bit systems. Please note that virtual memory != physical RAM. A 64-bit machine *always* has >1 terabyte of virtual address space available; this is unrelated to physical memory (a common misunderstanding about mmap): http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

The speed difference depends on the use case: In general MMapDirectory is much faster in multi-threaded environments, because it has no concurrency problems. If you use SimpleFSDirectory, this is the largest bottleneck. NIOFSDirectory does not have concurrency problems, but it is still slower because it does a lot of extra copying of data between kernel space and user space for buffering. MMapDirectory is muuuuuuuch faster if you sort by docvalues fields, because it supports random access without any buffering overhead.

So please: Use MMapDirectory where possible - this is completely unrelated to how much RAM you have available!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
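
To make the copying-versus-mapping distinction above concrete, here is a
rough sketch using only java.nio, not Lucene itself. The class ReadStyles
and its methods are hypothetical illustrations: readCopied does a positional
channel read into a heap buffer (the style of access described for
NIOFSDirectory), while readMapped maps the file into virtual address space
(the style described for MMapDirectory). In a real application you would
simply call FSDirectory.open() as recommended above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadStyles {
    /** Positional read: the byte is copied from the kernel into a heap buffer. */
    static byte readCopied(Path file, long pos) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1);
            if (ch.read(buf, pos) != 1) {
                throw new IllegalArgumentException("position past end of file: " + pos);
            }
            return buf.get(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Mapped read: the file is addressed directly, with no intermediate buffer. */
    static byte readMapped(Path file, long pos) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping consumes virtual address space, not Java heap.
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return map.get(Math.toIntExact(pos));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for trying the two methods: writes data to a temporary file. */
    static Path writeTemp(byte[] data) {
        try {
            Path p = Files.createTempFile("readstyles", ".bin");
            p.toFile().deleteOnExit();
            return Files.write(p, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both methods return the same byte; the difference is only in how it gets
there. The sketch reopens the file on every call to stay short, which a real
reader would not do.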



