Posted to java-user@lucene.apache.org by Regan Heath <re...@BridgeHeadSoftware.com> on 2010/06/07 10:52:13 UTC

Re: indexWriter.addIndexes, Disk space, and open files

If you don't want to use the ImDisk software, a small flash drive will do
just as well...


Regan Heath wrote:
> 
> Windows XP.  
> 
> The problem occurs on the local file system, but to replicate it more
> easily I am using http://www.ltr-data.se/opencode.html#ImDisk to mount a
> virtual 10 MB disk on F:\.  It is formatted as an NTFS file system.
> 
> The files can be removed normally (deleted from Explorer or the command
> prompt) after the program shuts down.  In fact, the program cleans them up
> itself on restart (an interim solution).
> 
> Process Explorer shows the program has handles to these three files open.
> 
> 
> Erick Erickson wrote:
>> 
>> What operating system and what file system are you using? Is the file
>> system local or networked?
>> 
>> What does it take to remove the files? That is, can you remove them
>> manually after the program shuts down?
>> 
>> Best
>> Erick
>> 
> 
-- 
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p875713.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: MultiPhraseQuery.toString() throws null pointer exception

Posted by Michael McCandless <lu...@mikemccandless.com>.
I opened LUCENE-2526 for this...

Mike

On Thu, Jul 1, 2010 at 2:19 PM, Woolf, Ross <Ro...@bmc.com> wrote:
> In Lucene 2.9.2 (have not checked 3.x) calling MultiPhraseQuery.toString() throws a null pointer exception.  Below is very simple code to test this out.
>
> import org.apache.lucene.search.MultiPhraseQuery;
>
> public class testMPQ {
>
>        public static void main(String[] args){
>                MultiPhraseQuery mpq = new MultiPhraseQuery();
>                System.out.println(mpq.toString());
>        }
> }
>
> It will produce the following exception:
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.lucene.search.MultiPhraseQuery.toString(MultiPhraseQuery.java:275)
>        at testMPQ.main(tesMPQ.java:7)



MultiPhraseQuery.toString() throws null pointer exception

Posted by "Woolf, Ross" <Ro...@BMC.com>.
In Lucene 2.9.2 (have not checked 3.x) calling MultiPhraseQuery.toString() throws a null pointer exception.  Below is very simple code to test this out.

import org.apache.lucene.search.MultiPhraseQuery;

public class testMPQ {

	public static void main(String[] args){
		MultiPhraseQuery mpq = new MultiPhraseQuery();
		System.out.println(mpq.toString());
	}
}

It will produce the following exception:
Exception in thread "main" java.lang.NullPointerException
        at org.apache.lucene.search.MultiPhraseQuery.toString(MultiPhraseQuery.java:275)
        at testMPQ.main(tesMPQ.java:7)




Re: indexWriter.addIndexes, Disk space, and open files

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Jun 7, 2010 at 7:19 AM, Regan Heath
<re...@bridgeheadsoftware.com> wrote:
>
>>> That's pretty much exactly what I suspected was happening.  I've had the
>>> same problem myself on another occasion... out of interest, is there any
>>> way to force the file closed without flushing?
>>
>>No, IndexOutput has no such method.  We could consider adding one...
>
> That sounds useful in general.
>
> In our case what we actually want is to abort the merge and delete all the
> new files created.

This is in fact what Lucene will do if disk full is hit during a merge.

If disk full is hit during flush(), Lucene discards the docs that were
in the RAM buffer.

> But then, our usage may be slightly unusual in that we
> merge an existing 'master' index and a number of 'temporary' indices into a
> new master index.  On success we delete the old master and rename the new
> master into its place.
>
> We're doing disk space checks prior to merge, based on the docs here:
> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexWriter.html#optimize()
>
> but I disabled these to test this out of disk space case, as it is possible
> something else could use up the required space during the merge.
>
>>> From memory I tried everything I
>>> could think of at the time but couldn't manage it.  Best I could do was
>>> catch and swallow the expected exception from close and carry on.
>>
>>I think that's the best to do w/ today's API; but, you should save the
>>first IOE you hit, then force close the remaining files, then throw
>>that IOE.
>
> When you say 'force' close do you just mean wrapping the close calls in
> try/catch(IOException) where the catch block is empty (swallows the
> exception)?  Or is there a specific call to force a file closed?

The former.  There is no method.

>>> So, the only option for us is to upgrade the version of Lucene we're
>>> using to the current trunk?  Is there no existing stable release version
>>> containing the fix?  If not, when do you estimate the next stable release
>>> with the fix will be available?
>>
>>I don't think any release of Lucene will have fixed all of these
>>cases, yet.  Patches welcome :)
>
> I would if I had the time or sufficient understanding of the existing
> code; sadly, I've only looked at it for 5 minutes. :(

Start small then iterate... add that static method somewhere and call
it from a place or two :)

>>Actually, the best fix is something Earwin created but is not yet
>>committed (nor in a patch yet, I think), which adds a nice API for
>>closing multiple IndexOutputs safely.  Earwin, maybe you could pull
>>out just this part of your patch and open a separate issue?  Then we
>>can fix all places in Lucene that need to close multiple IndexOutputs
>>to use this API.
>
> That sounds great. I'm not sure whether something like this is useful to you:
>
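> import java.io.Closeable;
>
> public class Safe
> {
>     /**
>      * Quietly closes any object that implements Closeable; a null
>      * argument is ignored and any exception from close() is swallowed.
>      *
>      * @param     closeable The object to close (may be null)
>      */
>     public static void close(Closeable closeable)
>     {
>         if (closeable == null)
>         {
>             return;
>         }
>         try
>         {
>             closeable.close();
>         }
>         catch(Exception e)
>         {
>             // ignore: used only where a close failure must not propagate
>         }
>     }
> }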
>
> We use this in catch and finally blocks where we do not want to raise an
> exception.

That looks great!  I think Earwin's version took multiple Closeables
and closed them all, which would be useful.

We'd also want a way to close N closeables, but, if any exception is
hit on closing any of them, throw that exception, but still force the
remaining ones closed.
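A minimal sketch of that "close N, rethrow the first" helper, assuming Java 5+ and that every object passed in implements java.io.Closeable. CloseUtil and closeAll are hypothetical names, not an existing Lucene API. Only IOException is collected; a RuntimeException thrown by close() would still propagate immediately and skip the remaining objects.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseUtil {

    private CloseUtil() {}

    /**
     * Attempts close() on every non-null argument, even if an earlier one
     * throws. The first IOException is rethrown after all objects have been
     * tried; later IOExceptions are dropped.
     */
    static void closeAll(Closeable... closeables) throws IOException {
        IOException first = null;
        for (Closeable c : closeables) {
            if (c == null) {
                continue; // same effect as the "!= null" guards in mergeTerms
            }
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e; // remember only the first failure
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With something like this, the four guarded close() calls in the mergeTerms finally clause could become a single closeAll(freqOutput, proxOutput, termInfosWriter, queue), provided those classes implement Closeable.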

Mike



Re: indexWriter.addIndexes, Disk space, and open files

Posted by Regan Heath <re...@BridgeHeadSoftware.com>.
>> That's pretty much exactly what I suspected was happening.  I've had the
>> same problem myself on another occasion... out of interest, is there any
>> way to force the file closed without flushing?
>
>No, IndexOutput has no such method.  We could consider adding one...

That sounds useful in general.  

In our case what we actually want is to abort the merge and delete all the
new files created.  But then, our usage may be slightly unusual in that we
merge an existing 'master' index and a number of 'temporary' indices into a
new master index.  On success we delete the old master and rename the new
master into its place.
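A sketch of that abort-or-promote workflow using java.nio.file, assuming Java 8+. The merge itself is left out; MasterSwap, abort and promote are hypothetical names, and the Closeables passed to abort stand for whatever files the failed merge left open. On Windows a delete fails while a handle is still open, which is why the quiet closes come first.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class MasterSwap {

    private MasterSwap() {}

    /**
     * Failure path: quietly close whatever the failed merge left open, then
     * delete the partially written new master directory.
     */
    static void abort(Path newMaster, Closeable... openOutputs) {
        for (Closeable c : openOutputs) {
            if (c == null) {
                continue;
            }
            try {
                c.close();
            } catch (Exception ignored) {
                // The merge has already failed; its original exception is the one to report.
            }
        }
        try {
            deleteRecursively(newMaster);
        } catch (IOException ignored) {
            // Anything left behind can still be removed by a cleanup-on-restart pass.
        }
    }

    /** Success path: delete the old master, then rename the new one into its place. */
    static void promote(Path master, Path newMaster) throws IOException {
        deleteRecursively(master);
        Files.move(newMaster, master);
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Note that promote leaves a short window with no master at all; renaming the old master aside and deleting it only after the move succeeds would narrow that window.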

We're doing disk space checks prior to merge, based on the docs here:
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexWriter.html#optimize()

but I disabled these to test this out of disk space case, as it is possible
something else could use up the required space during the merge.
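A rough pre-merge check along those lines, using java.io.File.getUsableSpace() (Java 6+). How much headroom to require depends on the Lucene version and on whether readers are open, so the factor parameter is an assumption to be filled in from the IndexWriter javadoc linked above, not a value taken from it. DiskSpaceCheck and its methods are hypothetical names. As noted, the check is only a snapshot: another process can still consume the space during the merge, so disk-full handling is still required.

```java
import java.io.File;

final class DiskSpaceCheck {

    private DiskSpaceCheck() {}

    /** Sums the sizes of the regular files directly inside an index directory. */
    static long sizeOf(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        long total = 0;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    /**
     * Returns true if the volume holding targetDir has at least
     * factor * (combined size of the source indexes) bytes usable.
     * The factor is caller-supplied; it is not derived here.
     */
    static boolean hasRoomFor(File targetDir, double factor, File... sourceIndexDirs) {
        long needed = 0;
        for (File dir : sourceIndexDirs) {
            needed += sizeOf(dir);
        }
        long usable = targetDir.getUsableSpace(); // 0 if targetDir does not exist
        return usable >= (long) Math.ceil(needed * factor);
    }
}
```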

>> From memory I tried everything I
>> could think of at the time but couldn't manage it.  Best I could do was
>> catch and swallow the expected exception from close and carry on.
>
>I think that's the best to do w/ today's API; but, you should save the
>first IOE you hit, then force close the remaining files, then throw
>that IOE.

When you say 'force' close do you just mean wrapping the close calls in
try/catch(IOException) where the catch block is empty (swallows the
exception)?  Or is there a specific call to force a file closed?

>> So, the only option for us is to upgrade the version of Lucene we're
>> using to the current trunk?  Is there no existing stable release version
>> containing the fix?  If not, when do you estimate the next stable release
>> with the fix will be available?
>
>I don't think any release of Lucene will have fixed all of these
>cases, yet.  Patches welcome :)

I would if I had the time or sufficient understanding of the existing
code; sadly, I've only looked at it for 5 minutes. :(

>Actually, the best fix is something Earwin created but is not yet
>committed (nor in a patch yet, I think), which adds a nice API for
>closing multiple IndexOutputs safely.  Earwin, maybe you could pull
>out just this part of your patch and open a separate issue?  Then we
>can fix all places in Lucene that need to close multiple IndexOutputs
>to use this API.

That sounds great. I'm not sure whether something like this is useful to you:

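import java.io.Closeable;

public class Safe
{
    /**
     * Quietly closes any object that implements Closeable; a null
     * argument is ignored and any exception from close() is swallowed.
     *
     * @param     closeable The object to close (may be null)
     */
    public static void close(Closeable closeable)
    {
        if (closeable == null)
        {
            return;
        }
        try
        {
            closeable.close();
        }
        catch(Exception e)
        {
            // ignore: used only where a close failure must not propagate
        }
    }
}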

We use this in catch and finally blocks where we do not want to raise an
exception.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p876022.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: indexWriter.addIndexes, Disk space, and open files

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Jun 7, 2010 at 6:18 AM, Regan Heath
<re...@bridgeheadsoftware.com> wrote:
>
> That's pretty much exactly what I suspected was happening.  I've had the same
> problem myself on another occasion... out of interest is there any way to
> force the file closed without flushing?

No, IndexOutput has no such method.  We could consider adding one...

> From memory I tried everything I
> could think of at the time but couldn't manage it.  Best I could do was
> catch and swallow the expected exception from close and carry on.

I think that's the best to do w/ today's API; but, you should save the
first IOE you hit, then force close the remaining files, then throw
that IOE.

> So, the only option for us is to upgrade the version of Lucene we're using
> to the current trunk?  Is there no existing stable release version
> containing the fix?  If not, when do you estimate the next stable release
> with the fix will be available?

I don't think any release of Lucene will have fixed all of these
cases, yet.  Patches welcome :)

Actually, the best fix is something Earwin created but is not yet
committed (nor in a patch yet, I think), which adds a nice API for
closing multiple IndexOutputs safely.  Earwin, maybe you could pull
out just this part of your patch and open a separate issue?  Then we
can fix all places in Lucene that need to close multiple IndexOutputs
to use this API.

Mike



Re: indexWriter.addIndexes, Disk space, and open files

Posted by Regan Heath <re...@BridgeHeadSoftware.com>.
That's pretty much exactly what I suspected was happening.  I've had the same
problem myself on another occasion... out of interest, is there any way to
force the file closed without flushing?  From memory I tried everything I
could think of at the time but couldn't manage it.  Best I could do was
catch and swallow the expected exception from close and carry on.

So, the only option for us is to upgrade the version of Lucene we're using
to the current trunk?  Is there no existing stable release version
containing the fix?  If not, when do you estimate the next stable release
with the fix will be available?

Thanks,
Regan


Michael McCandless-2 wrote:
> 
> This is a bug in how Lucene handles IOException while closing files.
> 
> Look at SegmentMerger's sources, for 2.3.2:
> 
>  
> https://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_2/src/java/org/apache/lucene/index/SegmentMerger.java
> 
> Look at the finally clause in mergeTerms:
> 
>     } finally {
>       if (freqOutput != null) freqOutput.close();
>       if (proxOutput != null) proxOutput.close();
>       if (termInfosWriter != null) termInfosWriter.close();
>       if (queue != null) queue.close();
>     }
> 
> You are hitting an exception in that freqOutput.close, which means the
> proxOutput (*.prx) and termInfosWriter (*.tii, *.tis) are not
> successfully closed.  It looks like the bug is still present to some
> degree through 3.x, but fixed (at least specifically for segment
> merging, but likely not in other places) in trunk.
> 
> Likely what happened is you hit a disk full inside the "try" part, and
> so the finally clause went to close the files, but close then tries to
> flush the pending buffer, which also hits disk full.
> 
> Mike
> 
-- 
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p875884.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: indexWriter.addIndexes, Disk space, and open files

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is a bug in how Lucene handles IOException while closing files.

Look at SegmentMerger's sources, for 2.3.2:

  https://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_2/src/java/org/apache/lucene/index/SegmentMerger.java

Look at the finally clause in mergeTerms:

    } finally {
      if (freqOutput != null) freqOutput.close();
      if (proxOutput != null) proxOutput.close();
      if (termInfosWriter != null) termInfosWriter.close();
      if (queue != null) queue.close();
    }

You are hitting an exception in that freqOutput.close, which means the
proxOutput (*.prx) and termInfosWriter (*.tii, *.tis) are not
successfully closed.  It looks like the bug is still present to some
degree through 3.x, but fixed (at least specifically for segment
merging, but likely not in other places) in trunk.

Likely what happened is you hit a disk full inside the "try" part, and
so the finally clause went to close the files, but close then tries to
flush the pending buffer, which also hits disk full.
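The failure mode can be reproduced without Lucene. In this stand-alone sketch, FakeOutput is a hypothetical stand-in for an IndexOutput whose close() throws the way a flush onto a full disk would, and closeSequentially has the same shape as the finally clause above; the names are made up for illustration.

```java
import java.io.Closeable;
import java.io.IOException;

class NaiveCloseDemo {

    /** Hypothetical stand-in for an IndexOutput; optionally fails in close(). */
    static final class FakeOutput implements Closeable {
        final String name;
        final boolean failOnClose;
        boolean closed;

        FakeOutput(String name, boolean failOnClose) {
            this.name = name;
            this.failOnClose = failOnClose;
        }

        @Override
        public void close() throws IOException {
            if (failOnClose) {
                // Simulates close() flushing its buffer onto a full disk;
                // the file is left open.
                throw new IOException("disk full while flushing " + name);
            }
            closed = true;
        }
    }

    /** Same shape as the mergeTerms finally clause: one failure skips the rest. */
    static void closeSequentially(FakeOutput freq, FakeOutput prox, FakeOutput tis) throws IOException {
        if (freq != null) freq.close();
        if (prox != null) prox.close();
        if (tis != null) tis.close();
    }

    public static void main(String[] args) {
        FakeOutput freq = new FakeOutput("freq", true);
        FakeOutput prox = new FakeOutput("prox", false);
        FakeOutput tis = new FakeOutput("tis", false);
        try {
            closeSequentially(freq, prox, tis);
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // caught: disk full while flushing freq
        }
        // prox and tis were never closed, so their handles would leak.
        System.out.println("prox closed? " + prox.closed + ", tis closed? " + tis.closed); // prox closed? false, tis closed? false
    }
}
```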

Mike
