You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2003/10/01 00:37:37 UTC

Re: Testcase failure on OSX

Hani Suleiman wrote:

> The count for FSDirectory on OSX is around 700-800. The problem is 
> that  the filesystem timestamps on OSX has abysmal resolution (nothing 
> finer  than 1 whole second!) 

Wow! Write once, debug everywhere...
Is this an OSX "feature" or is this a feature of a particular disk / 
filesystem / driver / whatnot that is in use in the current run?
If this is unavoidable on OSX, perhaps other platforms are also 
affected? Does anyone know the timestamp resolution on Wintel, Lintel, 
Solaris, etc?

Finally, what if we allocate a "long" in the beginning of the files that 
need to be touched (like the segments and other files) and use it to 
store the full System.currentTimeMillis timestamp? Does this method 
return a more granular number on OSX? This would be a big change, but it 
would work the same on all platforms and will not have a slowdown.

>
>
> So the touch method will spin its wheels for anything up to a second  
> before we've registered an actual 'touch'.
>
> This isn't so bad as files aren't touch that often, but it's a lot 
> more  problematic in the case of SegmentInfos.write(), where a new 
> file is  created, then copied over the old one. The  
> testDeleteReaderWriteConflict testcase fails due to this. The old 
> file  and the new have the same timestamp, and so the staleness check 
> fails.  How should this be fixed?
>
> Also, won't the fix for FSDirectory mean there's a window where other  
> threads could end up with bad info, if we're taking up to a second to  
> touch a file?

I don't know off hand, but I believe there is file locking in place to 
prevent other threads from getting into this code until the update 
operation is completed, and it will not be completed until the touch 
finishes. This still sucks with a 1-second blocking...

Dmitry.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
Hm, something about this feels a bit hacky/wrong, but I can't quite put 
my finger on it!

Having said this, this does seem like it'd fix stuff. The 'hack' value 
comes from basically using the modification time as a counter, not a 
timestamp. N in the case below should be something like 1000 (or 1500?)

On Tuesday, September 30, 2003, at 08:05 PM, Dmitry Serebrennikov wrote:

>
> Steve Rowe wrote:
>
>> Why not just add 1000ms to the current time when touching the files 
>> (instead of waiting for a second to pass). I tried in on my Linux box 
>> (e2fs) via the java File.setLastModified() method, and it allowed the 
>> file to have a timestamp in the future.
>>
>> Wouldn't this be a much smaller effort?
>
> Yes! This is a great idea!
>
> I think to make this work, the new timestamp should be set to the old 
> timestamp + N, where N is the corsest resolution of the timestamp that 
> Lucene is willing to support. For example, if the old segments file 
> has timestamp t1, and a new file is created at t2, the new timestamp 
> should be t1 + N, not t2 + N. I think this will help with cases when 
> t2 is less than t1 + N. Without it, if we have two updates in quick 
> succession, the second update may happen before the time set on the 
> segments file by the first update, and the new timestamp may be 
> rounded to the same value as the old. But I think if we always make 
> sure that the new timestamp is greater than the timestamp on the old 
> file (even if the old file's timestamp is still in the future), this 
> should work.
>
> Can anyone see a problem with this implementation?
> Dmitry
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Steve Rowe wrote:

> Why not just add 1000ms to the current time when touching the files 
> (instead of waiting for a second to pass). I tried in on my Linux box 
> (e2fs) via the java File.setLastModified() method, and it allowed the 
> file to have a timestamp in the future.
>
> Wouldn't this be a much smaller effort? 

Yes! This is a great idea!

I think to make this work, the new timestamp should be set to the old 
timestamp + N, where N is the corsest resolution of the timestamp that 
Lucene is willing to support. For example, if the old segments file has 
timestamp t1, and a new file is created at t2, the new timestamp should 
be t1 + N, not t2 + N. I think this will help with cases when t2 is less 
than t1 + N. Without it, if we have two updates in quick succession, the 
second update may happen before the time set on the segments file by the 
first update, and the new timestamp may be rounded to the same value as 
the old. But I think if we always make sure that the new timestamp is 
greater than the timestamp on the old file (even if the old file's 
timestamp is still in the future), this should work.

Can anyone see a problem with this implementation?
Dmitry



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Christoph Goller <go...@detego-software.de>.
I am not too concerned about the index bound checks. They are an additional
safety and I don´t think they matter in terms of efficiency. The only point
is that I forgot them in my patch and without them the tests fail. So if you
decide to apply my patch, the checks have to be added. If you give your ok,
I can do the commit too.

Christoph

Dmitry Serebrennikov schrieb:
> I put those in mostly to assure myself that I got things right. I think 
> the key question is whether it possible to read part of another file. If 
> not, I think that's fine. If yes, I think that's a problem.
> 
> Dmitry.
> 
> Christoph Goller wrote:
> 
>> Hi Dmitry,
>>
>> Now I tried all test cases. They all work except for Russian 
>> analyser/stemmer
>> and occational fails of TestIndexReader (the timestamp problem). So I 
>> think
>> it should be ok as far as CoumpoundFile is concerned. Off course we still
>> have to find a good solution for the timestamp problem.
>>
>> However,I stumbled over a problem that I had missed last time. 
>> TestCompoundFile
>> only succedds with your index bound tests in CSInputStream.seekInternal.
>> On Thursday I had deleted them after trying your test cases because 
>> the other
>> implementations don´t do these tests either. I did not go too deep 
>> into your
>> tests, but do you think the bahaviour of throwing an exception if the 
>> seek
>> index is out of bound is required? Its not part of the contract of the 
>> other
>> implementations of InputStream. Maybe I am missing something here.
>>
>> Dmitry Serebrennikov schrieb:
>>
>>> Dear Christoph,
>>>
>>> Sounds like an excellent enhancement. From a quick look, it appears 
>>> that you are right and everything should work just fine but use less 
>>> memory. One question: have you tried the other test cases also or 
>>> just the TestCompoundFile. There are quite a few conditions that 
>>> TestCompoundFile does not cover.
>>>
>>> At first I thought that the synchronization around readBytes would 
>>> cause too much performance degradation when a lot of concurrent 
>>> queries were executing. But after I looked at it some more, I 
>>> convinced myself that it should be ok. Have you ran any 
>>> multi-threaded tests / benchmarks? I think it might also be a good 
>>> idea before making this change.
>>>
>>> Christoph, do you think it is possible to just call readInternal on 
>>> the base stream instead of the readBytes? The main difference is that 
>>> we would bypass the buffering in the base stream. I think the 
>>> buffering done by the superclass of the CSInputStream would be quite 
>>> enough (which is your point to begin with, right)? Perhaps it would 
>>> be worthwhile to make InputStream.readInternal() public instead of 
>>> protected?
>>
>>
>>
>> In CSInputStream.readInternal I call:
>>
>> synchronized (base) {
>>   base.seek(fileOffset + getFilePointer());
>>   base.readBytes(b, offset, len);
>> }
>>
>> Calling base.seek does nothing more than setting the file pointer
>> (bufferStart + bufferPosition) of base correctly.
>>
>> base.readBytes(b, offset, len) in this case does not use the buffer of
>> base (at least in most cases). Look into InputStream.readBytes.
>> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
>> used directly.
>>
>> I think synchronized in our case does not much more than synchronizing
>> on the actual file in FSInputStream.readInternal.
>>
>> Christoph
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 

-- 
*************************************************************
* Dr. Christoph Goller     Tel. : +49 89 203 45734          *
* Managing Director        Email: goller@detego-software.de *
* Detego Software GmbH     Mail : Keuslinstr. 13,           *
*                                 80798 München, Germany    *
*************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
+1
Looks like a great optimization.
Dmitry.


Christoph Goller wrote:

> Dmitry Serebrennikov schrieb:
>
>> I put those in mostly to assure myself that I got things right. I 
>> think the key question is whether it possible to read part of another 
>> file. If not, I think that's fine. If yes, I think that's a problem.
>>
>> Dmitry.
>>
>
> Hi Dmitry,
>
> I have been using my patch for a couple of weaks now and I think it is 
> ok.
> I kept the index bound checks, so that the original tests go through.
> I would like to commit these changes. Did I answer all your questions
> in my last email? Do you have any objections against commiting the patch?
>
> Christoph
>
> PS: patch is attached again.
>
>------------------------------------------------------------------------
>
>Index: CompoundFileReader.java
>===================================================================
>RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v
>retrieving revision 1.2
>diff -u -r1.2 CompoundFileReader.java
>--- CompoundFileReader.java	13 Oct 2003 14:18:04 -0000	1.2
>+++ CompoundFileReader.java	19 Nov 2003 13:06:40 -0000
>@@ -240,10 +240,9 @@
>                       final long length)
>         throws IOException
>         {
>-            this.base = (InputStream) base.clone();
>+            this.base = base;
>             this.fileOffset = fileOffset;
>             this.length = length;   // variable in the superclass
>-            seekInternal(0);        // position to the adjusted 0th byte
>         }
> 
>         /** Expert: implements buffer refill.  Reads bytes from the current
>@@ -255,7 +254,10 @@
>         protected void readInternal(byte[] b, int offset, int len)
>         throws IOException
>         {
>-            base.readBytes(b, offset, len);
>+            synchronized (base) {
>+              base.seek(fileOffset + getFilePointer());
>+              base.readBytes(b, offset, len);
>+            }
>         }
> 
>         /** Expert: implements seek.  Sets current position in this file, where
>@@ -269,29 +271,11 @@
> 
>             if (pos < 0)
>                 throw new IOException("Seek to a negative offset");
>-
>-            base.seek(fileOffset + pos);
>         }
> 
>         /** Closes the stream to futher operations. */
>         public void close() throws IOException
>-        {
>-            base.close();
>-        }
>+        {}
> 
>-        /** Returns a clone of this stream.
>-         *
>-         * <p>Clones of a stream access the same data, and are positioned at the same
>-         * point as the stream they were cloned from.
>-         *
>-         * <p>Expert: Subclasses must ensure that clones may be positioned at
>-         * different points in the input from each other and from the stream they
>-         * were cloned from.
>-         */
>-        public Object clone() {
>-            CSInputStream other = (CSInputStream) super.clone();
>-            other.base = (InputStream) base.clone();
>-            return other;
>-        }
>     }
> }
>
>  
>
>------------------------------------------------------------------------
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Doug Cutting <cu...@lucene.com>.
Christoph Goller wrote:
> I would like to commit these changes.

+1

Thanks,

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Christoph Goller <go...@detego-software.de>.
Otis Gospodnetic schrieb:
> Hello Christoph,
> 
> I don't know enough to judge all changes you made with confidence, and
> don't have the code on this machine to apply the patch and look at it
> properly.  However, it looks like seekInternal(long) should now be left
> with only the conditionals that can cause IOException, and nothing
> else.  Why do we need to keep the body of the method then?

I discussed that already with Dmitry and he was not very passionate
regarding these conditionals. Finally I kept them since they were required
by Dmitry's test. But throwing an exception on seek is
not a behavior implemented by the other subclasses of InputStream.
I decided to remove the conditionals and slightly changed the unit test.

If more index checks are desirable, I would put them into InputStream.
Despite that I added a "read past EOF" check to CSInputStream.readInternal
to guarantee the same behavior as FSInputStream.

> 
> I also see getFilePointer() call, but can't tell where that method is. 
> I'll assume it's from the parent InputStream.

yes.






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello Christoph,

I don't know enough to judge all changes you made with confidence, and
don't have the code on this machine to apply the patch and look at it
properly.  However, it looks like seekInternal(long) should now be left
with only the conditionals that can cause IOException, and nothing
else.  Why do we need to keep the body of the method then?

I also see getFilePointer() call, but can't tell where that method is. 
I'll assume it's from the parent InputStream.

Otis


--- Christoph Goller <go...@detego-software.de> wrote:
> Dmitry Serebrennikov schrieb:
> > I put those in mostly to assure myself that I got things right. I
> think 
> > the key question is whether it possible to read part of another
> file. If 
> > not, I think that's fine. If yes, I think that's a problem.
> > 
> > Dmitry.
> > 
> 
> Hi Dmitry,
> 
> I have been using my patch for a couple of weaks now and I think it
> is ok.
> I kept the index bound checks, so that the original tests go through.
> I would like to commit these changes. Did I answer all your questions
> in my last email? Do you have any objections against commiting the
> patch?
> 
> Christoph
> 
> PS: patch is attached again.
> > Index: CompoundFileReader.java
> ===================================================================
> RCS file:
>
/home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v
> retrieving revision 1.2
> diff -u -r1.2 CompoundFileReader.java
> --- CompoundFileReader.java	13 Oct 2003 14:18:04 -0000	1.2
> +++ CompoundFileReader.java	19 Nov 2003 13:06:40 -0000
> @@ -240,10 +240,9 @@
>                        final long length)
>          throws IOException
>          {
> -            this.base = (InputStream) base.clone();
> +            this.base = base;
>              this.fileOffset = fileOffset;
>              this.length = length;   // variable in the superclass
> -            seekInternal(0);        // position to the adjusted 0th
> byte
>          }
>  
>          /** Expert: implements buffer refill.  Reads bytes from the
> current
> @@ -255,7 +254,10 @@
>          protected void readInternal(byte[] b, int offset, int len)
>          throws IOException
>          {
> -            base.readBytes(b, offset, len);
> +            synchronized (base) {
> +              base.seek(fileOffset + getFilePointer());
> +              base.readBytes(b, offset, len);
> +            }
>          }
>  
>          /** Expert: implements seek.  Sets current position in this
> file, where
> @@ -269,29 +271,11 @@
>  
>              if (pos < 0)
>                  throw new IOException("Seek to a negative offset");
> -
> -            base.seek(fileOffset + pos);
>          }
>  
>          /** Closes the stream to futher operations. */
>          public void close() throws IOException
> -        {
> -            base.close();
> -        }
> +        {}
>  
> -        /** Returns a clone of this stream.
> -         *
> -         * <p>Clones of a stream access the same data, and are
> positioned at the same
> -         * point as the stream they were cloned from.
> -         *
> -         * <p>Expert: Subclasses must ensure that clones may be
> positioned at
> -         * different points in the input from each other and from
> the stream they
> -         * were cloned from.
> -         */
> -        public Object clone() {
> -            CSInputStream other = (CSInputStream) super.clone();
> -            other.base = (InputStream) base.clone();
> -            return other;
> -        }
>      }
>  }
> 
> >
---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Christoph Goller <go...@detego-software.de>.
Dmitry Serebrennikov schrieb:
> I put those in mostly to assure myself that I got things right. I think 
> the key question is whether it possible to read part of another file. If 
> not, I think that's fine. If yes, I think that's a problem.
> 
> Dmitry.
> 

Hi Dmitry,

I have been using my patch for a couple of weaks now and I think it is ok.
I kept the index bound checks, so that the original tests go through.
I would like to commit these changes. Did I answer all your questions
in my last email? Do you have any objections against commiting the patch?

Christoph

PS: patch is attached again.

Re: CompoundFileReader

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
I put those in mostly to assure myself that I got things right. I think 
the key question is whether it possible to read part of another file. If 
not, I think that's fine. If yes, I think that's a problem.

Dmitry.

Christoph Goller wrote:

> Hi Dmitry,
>
> Now I tried all test cases. They all work except for Russian 
> analyser/stemmer
> and occational fails of TestIndexReader (the timestamp problem). So I 
> think
> it should be ok as far as CoumpoundFile is concerned. Off course we still
> have to find a good solution for the timestamp problem.
>
> However,I stumbled over a problem that I had missed last time. 
> TestCompoundFile
> only succedds with your index bound tests in CSInputStream.seekInternal.
> On Thursday I had deleted them after trying your test cases because 
> the other
> implementations don´t do these tests either. I did not go too deep 
> into your
> tests, but do you think the bahaviour of throwing an exception if the 
> seek
> index is out of bound is required? Its not part of the contract of the 
> other
> implementations of InputStream. Maybe I am missing something here.
>
> Dmitry Serebrennikov schrieb:
>
>> Dear Christoph,
>>
>> Sounds like an excellent enhancement. From a quick look, it appears 
>> that you are right and everything should work just fine but use less 
>> memory. One question: have you tried the other test cases also or 
>> just the TestCompoundFile. There are quite a few conditions that 
>> TestCompoundFile does not cover.
>>
>> At first I thought that the synchronization around readBytes would 
>> cause too much performance degradation when a lot of concurrent 
>> queries were executing. But after I looked at it some more, I 
>> convinced myself that it should be ok. Have you ran any 
>> multi-threaded tests / benchmarks? I think it might also be a good 
>> idea before making this change.
>>
>> Christoph, do you think it is possible to just call readInternal on 
>> the base stream instead of the readBytes? The main difference is that 
>> we would bypass the buffering in the base stream. I think the 
>> buffering done by the superclass of the CSInputStream would be quite 
>> enough (which is your point to begin with, right)? Perhaps it would 
>> be worthwhile to make InputStream.readInternal() public instead of 
>> protected?
>
>
> In CSInputStream.readInternal I call:
>
> synchronized (base) {
>   base.seek(fileOffset + getFilePointer());
>   base.readBytes(b, offset, len);
> }
>
> Calling base.seek does nothing more than setting the file pointer
> (bufferStart + bufferPosition) of base correctly.
>
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
>
> I think synchronized in our case does not much more than synchronizing
> on the actual file in FSInputStream.readInternal.
>
> Christoph
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Christoph Goller <go...@detego-software.de>.
Thank you Erik,

TestRussianAnalyzer now works from within Eclipse.
I made/committed similar changes to TestRussianStem
and it now works from within Eclipse too.

Erik Hatcher schrieb:
> 
> On Monday, October 20, 2003, at 01:10  PM, Christoph Goller wrote:
> 
>> Sorry fo all the mess. I found out it works for me from the command 
>> line, too.
>> My IDE fooled me. The file not found errors for TestRussianAnalyzer 
>> TestRussianStem
>> only occur if I call ant from within Eclipse, even if I explicitly set 
>> the base
>> directory????? Well, maybe I should look into the manual :-)
> 
> 
> I just made a change that will hopefully make this test work fine in 
> your IDE.  Set the system VM property "dataDir" to the absolute path to 
> your src/test directory and it should work.  There are other ways to 
> make JUnit test cases resilient to their environment, but this was the 
> quickest/easiest one.
> 
> Let me know if there are still issues with TestRussianAnalyzer.
> 
>     Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Monday, October 20, 2003, at 01:10  PM, Christoph Goller wrote:
> Sorry fo all the mess. I found out it works for me from the command 
> line, too.
> My IDE fooled me. The file not found errors for TestRussianAnalyzer 
> TestRussianStem
> only occur if I call ant from within Eclipse, even if I explicitly set 
> the base
> directory????? Well, maybe I should look into the manual :-)

I just made a change that will hopefully make this test work fine in 
your IDE.  Set the system VM property "dataDir" to the absolute path to 
your src/test directory and it should work.  There are other ways to 
make JUnit test cases resilient to their environment, but this was the 
quickest/easiest one.

Let me know if there are still issues with TestRussianAnalyzer.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Christoph Goller <go...@detego-software.de>.
Sorry fo all the mess. I found out it works for me from the command line, too.
My IDE fooled me. The file not found errors for TestRussianAnalyzer TestRussianStem
only occur if I call ant from within Eclipse, even if I explicitly set the base
directory????? Well, maybe I should look into the manual :-)

Christoph

Erik Hatcher schrieb:
> TestRussionAnalyzer works fine for me on Mac OS X running 'ant test' 
> from the command-line.
> 
> Could you make sure you have the latest from CVS and let me know the 
> details of the error you're seeing?
> 
> Thanks,
>     Erik
> 
> 
> 
> On Monday, October 20, 2003, at 12:02  PM, Christoph Goller wrote:
> 
>> Its not an encoding problem. I now checked that. What happens is that 
>> the ASCII
>> files for the tests are not found if I run the test from ant using 
>> build.xml.
>> Could you try if you can run the tests from ant using Erik´s latest 
>> build.xml?
>> I am not an ant expert. Either I somewhere have a wrong global 
>> base/root directory
>> setting, or some setting has to be changed in build.xml or within the 
>> tests.
>>
>> Christoph
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 

-- 
*************************************************************
* Dr. Christoph Goller     Tel. : +49 89 203 45734          *
* Managing Director        Email: goller@detego-software.de *
* Detego Software GmbH     Mail : Keuslinstr. 13,           *
*                                 80798 München, Germany    *
*************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
TestRussionAnalyzer works fine for me on Mac OS X running 'ant test' 
from the command-line.

Could you make sure you have the latest from CVS and let me know the 
details of the error you're seeing?

Thanks,
	Erik



On Monday, October 20, 2003, at 12:02  PM, Christoph Goller wrote:
> Its not an encoding problem. I now checked that. What happens is that 
> the ASCII
> files for the tests are not found if I run the test from ant using 
> build.xml.
> Could you try if you can run the tests from ant using Erik´s latest 
> build.xml?
> I am not an ant expert. Either I somewhere have a wrong global 
> base/root directory
> setting, or some setting has to be changed in build.xml or within the 
> tests.
>
> Christoph


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Christoph Goller <go...@detego-software.de>.
Its not an encoding problem. I now checked that. What happens is that the ASCII
files for the tests are not found if I run the test from ant using build.xml.
Could you try if you can run the tests from ant using Erik´s latest build.xml?
I am not an ant expert. Either I somewhere have a wrong global base/root directory
setting, or some setting has to be changed in build.xml or within the tests.

Christoph

Hani Suleiman schrieb:
> What platform are you running on? The russian stuff was failing 
> previously due to encoding issues, whereby the tests assumed the the 
> default file encoding is always iso-8859-1, which isn't the case on all 
> OS's.
> 
> The reader and writer tests both fail occasionally on linux without the 
> timestamp patch. Would be nice if you could test out my patch and see if 
> those errors go away ;)
> 
> Christoph Goller wrote:
> 
>>
>> Hani Suleiman schrieb:
>>
>>> What Russian stemmer problems? I've submitted a fix for that a few 
>>> weeks ago which was checked in, so there shouldn't be any more 
>>> problems with it.
>>>
>>
>> If I run these two tests separately from my eclipse, everything is fine.
>> If I run all tests with ant from the current build.xml these two tests
>> and TestIndexReader fail. I have checked and I am sure I have the latest
>> sources. I have not idea whats happening. Do you think failure of
>> TestRussianAnalyzer and TestRussianStem could be caused by the timestamp
>> problem? I am working with Linux and TestIndexReader occasionally fails
>> if FSDirectory is used.
>>
>> Christoph
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 

-- 
*************************************************************
* Dr. Christoph Goller     Tel. : +49 89 203 45734          *
* Managing Director        Email: goller@detego-software.de *
* Detego Software GmbH     Mail : Keuslinstr. 13,           *
*                                 80798 München, Germany    *
*************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: TestRussianAnalyzer TestRussianStem

Posted by Hani Suleiman <ha...@formicary.net>.
What platform are you running on? The russian stuff was failing 
previously due to encoding issues, whereby the tests assumed the the 
default file encoding is always iso-8859-1, which isn't the case on all 
OS's.

The reader and writer tests both fail occasionally on linux without the 
timestamp patch. Would be nice if you could test out my patch and see if 
those errors go away ;)

Christoph Goller wrote:

> 
> Hani Suleiman schrieb:
> 
>> What Russian stemmer problems? I've submitted a fix for that a few 
>> weeks ago which was checked in, so there shouldn't be any more 
>> problems with it.
>>
> 
> If I run these two tests separately from my eclipse, everything is fine.
> If I run all tests with ant from the current build.xml these two tests
> and TestIndexReader fail. I have checked and I am sure I have the latest
> sources. I have not idea whats happening. Do you think failure of
> TestRussianAnalyzer and TestRussianStem could be caused by the timestamp
> problem? I am working with Linux and TestIndexReader occasionally fails
> if FSDirectory is used.
> 
> Christoph
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


TestRussianAnalyzer TestRussianStem

Posted by Christoph Goller <go...@detego-software.de>.
Hani Suleiman schrieb:
> What Russian stemmer problems? I've submitted a fix for that a few weeks 
> ago which was checked in, so there shouldn't be any more problems with it.
> 

If I run these two tests separately from my eclipse, everything is fine.
If I run all tests with ant from the current build.xml these two tests
and TestIndexReader fail. I have checked and I am sure I have the latest
sources. I have not idea whats happening. Do you think failure of
TestRussianAnalyzer and TestRussianStem could be caused by the timestamp
problem? I am working with Linux and TestIndexReader occasionally fails
if FSDirectory is used.

Christoph


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Hani Suleiman <ha...@formicary.net>.
What Russian stemmer problems? I've submitted a fix for that a few 
weeks ago which was checked in, so there shouldn't be any more problems 
with it.

On Saturday, October 18, 2003, at 01:07 PM, Christoph Goller wrote:

> Hi Dmitry,
>
> Now I tried all test cases. They all work except for Russian 
> analyser/stemmer
> and occational fails of TestIndexReader (the timestamp problem). So I 
> think
> it should be ok as far as CoumpoundFile is concerned. Off course we 
> still
> have to find a good solution for the timestamp problem.
>
> However,I stumbled over a problem that I had missed last time. 
> TestCompoundFile
> only succedds with your index bound tests in 
> CSInputStream.seekInternal.
> On Thursday I had deleted them after trying your test cases because 
> the other
> implementations don´t do these tests either. I did not go too deep 
> into your
> tests, but do you think the bahaviour of throwing an exception if the 
> seek
> index is out of bound is required? Its not part of the contract of the 
> other
> implementations of InputStream. Maybe I am missing something here.
>
> Dmitry Serebrennikov schrieb:
>> Dear Christoph,
>> Sounds like an excellent enhancement. From a quick look, it appears 
>> that you are right and everything should work just fine but use less 
>> memory. One question: have you tried the other test cases also or 
>> just the TestCompoundFile. There are quite a few conditions that 
>> TestCompoundFile does not cover.
>> At first I thought that the synchronization around readBytes would 
>> cause too much performance degradation when a lot of concurrent 
>> queries were executing. But after I looked at it some more, I 
>> convinced myself that it should be ok. Have you ran any 
>> multi-threaded tests / benchmarks? I think it might also be a good 
>> idea before making this change.
>> Christoph, do you think it is possible to just call readInternal on 
>> the base stream instead of the readBytes? The main difference is that 
>> we would bypass the buffering in the base stream. I think the 
>> buffering done by the superclass of the CSInputStream would be quite 
>> enough (which is your point to begin with, right)? Perhaps it would 
>> be worthwhile to make InputStream.readInternal() public instead of 
>> protected?
>
> In CSInputStream.readInternal I call:
>
> synchronized (base) {
>   base.seek(fileOffset + getFilePointer());
>   base.readBytes(b, offset, len);
> }
>
> Calling base.seek does nothing more than setting the file pointer
> (bufferStart + bufferPosition) of base correctly.
>
> base.readBytes(b, offset, len) in this case does not use the buffer of
> base (at least in most cases). Look into InputStream.readBytes.
> If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
> used directly.
>
> I think synchronized in our case does not much more than synchronizing
> on the actual file in FSInputStream.readInternal.
>
> Christoph
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Christoph Goller <go...@detego-software.de>.
Hi Dmitry,

Now I tried all test cases. They all work except for Russian analyser/stemmer
and occational fails of TestIndexReader (the timestamp problem). So I think
it should be ok as far as CoumpoundFile is concerned. Off course we still
have to find a good solution for the timestamp problem.

However,I stumbled over a problem that I had missed last time. TestCompoundFile
only succedds with your index bound tests in CSInputStream.seekInternal.
On Thursday I had deleted them after trying your test cases because the other
implementations don´t do these tests either. I did not go too deep into your
tests, but do you think the bahaviour of throwing an exception if the seek
index is out of bound is required? Its not part of the contract of the other
implementations of InputStream. Maybe I am missing something here.

Dmitry Serebrennikov schrieb:
> Dear Christoph,
> 
> Sounds like an excellent enhancement. From a quick look, it appears that 
> you are right and everything should work just fine but use less memory. 
> One question: have you tried the other test cases also or just the 
> TestCompoundFile. There are quite a few conditions that TestCompoundFile 
> does not cover.
> 
> At first I thought that the synchronization around readBytes would cause 
> too much performance degradation when a lot of concurrent queries were 
> executing. But after I looked at it some more, I convinced myself that 
> it should be ok. Have you ran any multi-threaded tests / benchmarks? I 
> think it might also be a good idea before making this change.
> 
> Christoph, do you think it is possible to just call readInternal on the 
> base stream instead of the readBytes? The main difference is that we 
> would bypass the buffering in the base stream. I think the buffering 
> done by the superclass of the CSInputStream would be quite enough (which 
> is your point to begin with, right)? Perhaps it would be worthwhile to 
> make InputStream.readInternal() public instead of protected?

In CSInputStream.readInternal I call:

synchronized (base) {
   base.seek(fileOffset + getFilePointer());
   base.readBytes(b, offset, len);
}

Calling base.seek does nothing more than setting the file pointer
(bufferStart + bufferPosition) of base correctly.

base.readBytes(b, offset, len) in this case does not use the buffer of
base (at least in most cases). Look into InputStream.readBytes.
If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is
used directly.

I think synchronized in our case does not much more than synchronizing
on the actual file in FSInputStream.readInternal.

Christoph



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: CompoundFileReader

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Dear Christoph,

Sounds like an excellent enhancement. From a quick look, it appears that 
you are right and everything should work just fine but use less memory. 
One question: have you tried the other test cases also or just the 
TestCompoundFile. There are quite a few conditions that TestCompoundFile 
does not cover.

At first I thought that the synchronization around readBytes would cause 
too much performance degradation when a lot of concurrent queries were 
executing. But after I looked at it some more, I convinced myself that 
it should be ok. Have you ran any multi-threaded tests / benchmarks? I 
think it might also be a good idea before making this change.

Christoph, do you think it is possible to just call readInternal on the 
base stream instead of the readBytes? The main difference is that we 
would bypass the buffering in the base stream. I think the buffering 
done by the superclass of the CSInputStream would be quite enough (which 
is your point to begin with, right)? Perhaps it would be worthwhile to 
make InputStream.readInternal() public instead of protected?

Thanks.
Dmitry.


Christoph Goller wrote:

> Dear Dmitry,
>
> I finally found time to look into your compound file implementation and
> to try it out. It works and I like it. However, I wondered, why you are
> using clones of the base input stream in 
> CompoundFileReader.CSInputStream.
> The "problem" (actually it is not a real problem) I see is that you are
> double buffering your input. Note that the clone has a buffer of 1024 
> bytes and
> that CompoundFileReader.CSInputStream, which extends InputStream, also 
> has
> such a buffer. I think in the way you implemented it your input gets 
> copied
> into the base buffer and from there into the 
> CompoundFileReader.CSInputStream
> buffer before it is actually used. Please have a look at my patch. At 
> first
> one might think that my implementation is simpler but less efficient. 
> This
> is not the case. Actually it is even a little bit more efficient. 
> Points to
> think about:
>
> *) stream is private to CompoundFileReader, so nobody else can use it, 
> if not
> explicitly granted access. CompoundFileReader uses stream only in its
> constructor and when calling openFile.
>
> *) Therefore, CompoundFileReader.CSInputStream can share the same 
> instance of
> stream if they take care of synchronization and seek.
>
> *) An InputStream not necessarily has to implement seekInternal if 
> readInternal
> takes care of seeking on the real stream (see FSInputStream).
>
> *) Synchronization of CompoundFileReader.CSInputStream.readInternal on 
> base
> does no harm since it is called only when the real file has to be 
> accessed
> and thi is synchronized anyway. However, this synchronizytion is 
> necessary!
>
> I tested my patch with your TestCompoundFile and everything seems fine.
> I also tried it on one of my indices and searched on this index. It also
> works. My impression from my tests is that my patch is a little bit 
> faster
> than your original version but there seems to be not much difference in
> efficiency.
>
> Christoph
>
>------------------------------------------------------------------------
>
>Index: CompoundFileReader.java
>===================================================================
>RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v
>retrieving revision 1.2
>diff -u -r1.2 CompoundFileReader.java
>--- CompoundFileReader.java	13 Oct 2003 14:18:04 -0000	1.2
>+++ CompoundFileReader.java	16 Oct 2003 18:58:02 -0000
>@@ -240,10 +240,9 @@
>                       final long length)
>         throws IOException
>         {
>-            this.base = (InputStream) base.clone();
>+            this.base = base;
>             this.fileOffset = fileOffset;
>             this.length = length;   // variable in the superclass
>-            seekInternal(0);        // position to the adjusted 0th byte
>         }
> 
>         /** Expert: implements buffer refill.  Reads bytes from the current
>@@ -255,7 +254,10 @@
>         protected void readInternal(byte[] b, int offset, int len)
>         throws IOException
>         {
>-            base.readBytes(b, offset, len);
>+            synchronized (base) {
>+              base.seek(fileOffset + getFilePointer());
>+              base.readBytes(b, offset, len);
>+            }
>         }
> 
>         /** Expert: implements seek.  Sets current position in this file, where
>@@ -263,35 +265,11 @@
>          * @see #readInternal(byte[],int,int)
>          */
>         protected void seekInternal(long pos) throws IOException
>-        {
>-            if (pos > 0 && pos >= length)
>-                throw new IOException("Seek past the end of file");
>-
>-            if (pos < 0)
>-                throw new IOException("Seek to a negative offset");
>-
>-            base.seek(fileOffset + pos);
>-        }
>+        {}
> 
>         /** Closes the stream to futher operations. */
>         public void close() throws IOException
>-        {
>-            base.close();
>-        }
>+        {}
> 
>-        /** Returns a clone of this stream.
>-         *
>-         * <p>Clones of a stream access the same data, and are positioned at the same
>-         * point as the stream they were cloned from.
>-         *
>-         * <p>Expert: Subclasses must ensure that clones may be positioned at
>-         * different points in the input from each other and from the stream they
>-         * were cloned from.
>-         */
>-        public Object clone() {
>-            CSInputStream other = (CSInputStream) super.clone();
>-            other.base = (InputStream) base.clone();
>-            return other;
>-        }
>     }
> }
>
>  
>
>------------------------------------------------------------------------
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


CompoundFileReader

Posted by Christoph Goller <go...@detego-software.de>.
Dear Dmitry,

I finally found time to look into your compound file implementation and
to try it out. It works and I like it. However, I wondered, why you are
using clones of the base input stream in CompoundFileReader.CSInputStream.
The "problem" (actually it is not a real problem) I see is that you are
double buffering your input. Note that the clone has a buffer of 1024 bytes and
that CompoundFileReader.CSInputStream, which extends InputStream, also has
such a buffer. I think in the way you implemented it your input gets copied
into the base buffer and from there into the CompoundFileReader.CSInputStream
buffer before it is actually used. Please have a look at my patch. At first
one might think that my implementation is simpler but less efficient. This
is not the case. Actually it is even a little bit more efficient. Points to
think about:

*) stream is private to CompoundFileReader, so nobody else can use it, if not
explicitly granted access. CompoundFileReader uses stream only in its
constructor and when calling openFile.

*) Therefore, CompoundFileReader.CSInputStream can share the same instance of
stream if they take care of synchronization and seek.

*) An InputStream not necessarily has to implement seekInternal if readInternal
takes care of seeking on the real stream (see FSInputStream).

*) Synchronization of CompoundFileReader.CSInputStream.readInternal on base
does no harm since it is called only when the real file has to be accessed
and thi is synchronized anyway. However, this synchronizytion is necessary!

I tested my patch with your TestCompoundFile and everything seems fine.
I also tried it on one of my indices and searched on this index. It also
works. My impression from my tests is that my patch is a little bit faster
than your original version but there seems to be not much difference in
efficiency.

Christoph

Re: Testcase failure on OSX

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Hani Suleiman wrote:

> HFS is the standard FS on all OSX installs. You can find maybe 0.01% 
> of users who are unix savvy enough to use UFS, but everyone else will 
> be on HFS, with all its limitations. The same problem happens on linux 
> too (but more intermittently). 

I think this problem will also become more frequent as hardware gets 
faster but FS timestamp resolution remains the same.

- Dmitry.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
HFS is the standard FS on all OSX installs. You can find maybe 0.01% of 
users who are unix savvy enough to use UFS, but everyone else will be 
on HFS, with all its limitations. The same problem happens on linux too 
(but more intermittently).

On Wednesday, October 1, 2003, at 08:22 PM, Otis Gospodnetic wrote:

>
> --- Hani Suleiman <ha...@formicary.net> wrote:
>> On Wednesday, October 1, 2003, at 05:12 PM, Dmitry Serebrennikov
>> wrote:
>>
>>> I also think it feels wrong and also cannot find any problematic
>>> cases. Perhaps we can help lower the risk by doing a measurement at
>>
>>> the time FSDirectory is created and setting a value for the minimum
>>
>>> detectable time difference. Then on most systems the code will
>> behave
>>> as before, but on OSX with an old FS (and other things that might
>> come
>>> about) it will have this special behavior.
>
> I have not followed this thread very closely, and since I'm not an OSX
> user, I am not familiar with the particular FS type that has this
> problem.  Is this an old FS?
> Is hacking Lucene to work around this 'bug' in a particular old(?) FS
> the right thing to do, or would it be better to draw the line here and
> say: get a better FS?
> I'm not sure, I don't know anything about that FS, etc., but I'm
> wondering what others think.
>
> Otis
>
>
> __________________________________
> Do you Yahoo!?
> The New Yahoo! Shopping - with improved product search
> http://shopping.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Otis Gospodnetic <ot...@yahoo.com>.
--- Hani Suleiman <ha...@formicary.net> wrote:
> On Wednesday, October 1, 2003, at 05:12 PM, Dmitry Serebrennikov
> wrote:
> 
> > I also think it feels wrong and also cannot find any problematic 
> > cases. Perhaps we can help lower the risk by doing a measurement at
> 
> > the time FSDirectory is created and setting a value for the minimum
> 
> > detectable time difference. Then on most systems the code will
> behave 
> > as before, but on OSX with an old FS (and other things that might
> come 
> > about) it will have this special behavior.

I have not followed this thread very closely, and since I'm not an OSX
user, I am not familiar with the particular FS type that has this
problem.  Is this an old FS?
Is hacking Lucene to work around this 'bug' in a particular old(?) FS
the right thing to do, or would it be better to draw the line here and
say: get a better FS?
I'm not sure, I don't know anything about that FS, etc., but I'm
wondering what others think.

Otis


__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
On Wednesday, October 1, 2003, at 05:12 PM, Dmitry Serebrennikov wrote:

> I also think it feels wrong and also cannot find any problematic 
> cases. Perhaps we can help lower the risk by doing a measurement at 
> the time FSDirectory is created and setting a value for the minimum 
> detectable time difference. Then on most systems the code will behave 
> as before, but on OSX with an old FS (and other things that might come 
> about) it will have this special behavior.
>
Well you still end up with an operation (creating an FSDirectory) 
taking a second to complete, which IMHO is far too long.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Hani Suleiman wrote:

>It feels wrong, but I can't figure out any negative ramifications.
>
>There are two cases, either the file is modified in the past, in which
>case the current time should be used, or it's modified in the future, so
>the current mod time + X should be used.
>
>Basically you can cover all bases by just incrementing the higher value of
>current time/mod time.
>
I also think it feels wrong and also cannot find any problematic cases. 
Perhaps we can help lower the risk by doing a measurement at the time 
FSDirectory is created and setting a value for the minimum detectable 
time difference. Then on most systems the code will behave as before, 
but on OSX with an old FS (and other things that might come about) it 
will have this special behavior.

Dmitry.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
It feels wrong, but I can't figure out any negative ramifications.

There are two cases, either the file is modified in the past, in which
case the current time should be used, or it's modified in the future, so
the current mod time + X should be used.

Basically you can cover all bases by just incrementing the higher value of
current time/mod time.

Doug Cutting said:
> Maybe I'm being too conservative.  Can anyone imagine bad things that
> could happen if a 'segments' file's last modified date is set to
> something in the future?  Typically this is compared with a previous
> value of its last modified date, not to the current time.  I can't think
>  of why one would ever compare this to the current time.  So, unless
> someone can come up with a pitfall, I remove my objection to this
> solution.
>
> Doug
>
> Hani Suleiman wrote:
>> Well, the downside to doing that is that sleeping for 1 second really
>> is an awfully (unacceptably?) long time.
>>
>> Doug Cutting said:
>>
>>>Hani Suleiman wrote:
>>>
>>>>Still doesn't 'feel' right though.
>>>
>>>I agree.  It makes it no longer a date field, but a version field.
>>> Any application which uses the value as a date may fail.
>>>
>>>I think a better approach might be to define a sleep constant with
>>> different values on different OSes.  It could be 1ms by default, and
>>> 1000 on OSX.  It would be better yet if the sleep time could be
>>>determined based on the filesystem, and only set to 1000 for indexes
>>> that reside on HFS systems, but that may or may not be easy.
>>>
>>>Doug
>>>
>>>
>>>---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org For
>> additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>
>
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Doug Cutting <cu...@lucene.com>.
Maybe I'm being too conservative.  Can anyone imagine bad things that 
could happen if a 'segments' file's last modified date is set to 
something in the future?  Typically this is compared with a previous 
value of its last modified date, not to the current time.  I can't think 
of why one would ever compare this to the current time.  So, unless 
someone can come up with a pitfall, I remove my objection to this solution.

Doug

Hani Suleiman wrote:
> Well, the downside to doing that is that sleeping for 1 second really is
> an awfully (unacceptably?) long time.
> 
> Doug Cutting said:
> 
>>Hani Suleiman wrote:
>>
>>>Still doesn't 'feel' right though.
>>
>>I agree.  It makes it no longer a date field, but a version field.  Any
>>application which uses the value as a date may fail.
>>
>>I think a better approach might be to define a sleep constant with
>>different values on different OSes.  It could be 1ms by default, and
>>1000 on OSX.  It would be better yet if the sleep time could be
>>determined based on the filesystem, and only set to 1000 for indexes
>>that reside on HFS systems, but that may or may not be easy.
>>
>>Doug
>>
>>
>>--------------------------------------------------------------------- To
>>unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
Well, the downside to doing that is that sleeping for 1 second really is
an awfully (unacceptably?) long time.

Doug Cutting said:
> Hani Suleiman wrote:
>> Still doesn't 'feel' right though.
>
> I agree.  It makes it no longer a date field, but a version field.  Any
> application which uses the value as a date may fail.
>
> I think a better approach might be to define a sleep constant with
> different values on different OSes.  It could be 1ms by default, and
> 1000 on OSX.  It would be better yet if the sleep time could be
> determined based on the filesystem, and only set to 1000 for indexes
> that reside on HFS systems, but that may or may not be easy.
>
> Doug
>
>
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Doug Cutting <cu...@lucene.com>.
Hani Suleiman wrote:
> Still doesn't 'feel' right though.

I agree.  It makes it no longer a date field, but a version field.  Any 
application which uses the value as a date may fail.

I think a better approach might be to define a sleep constant with 
different values on different OSes.  It could be 1ms by default, and 
1000 on OSX.  It would be better yet if the sleep time could be 
determined based on the filesystem, and only set to 1000 for indexes 
that reside on HFS systems, but that may or may not be easy.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
Ok, changing the touchFile method in FSDirectory to the below makes all 
testcases pass (and to boot, is noticeably faster from the previous 
solution)

     File file = new File(directory, name);
     long oldModified = file.lastModified();
     long now = System.currentTimeMillis();
     long latest = oldModified > now ? oldModified : now;
     file.setLastModified(latest + 1500);

Still doesn't 'feel' right though.

On Tuesday, September 30, 2003, at 07:09 PM, Steve Rowe wrote:

> Why not just add 1000ms to the current time when touching the files 
> (instead of waiting for a second to pass).  I tried in on my Linux box 
> (e2fs) via the java File.setLastModified() method, and it allowed the 
> file to have a timestamp in the future.
>
> Wouldn't this be a much smaller effort?
>
> Steve
>
> Hani Suleiman wrote:
>
>> Aha, yeah, System.currentTimeMillis would definitely be of much 
>> higher resolution.
>> I know the problem also intermittently affects linux, as I've seen it 
>> happen on there too (but a lot more rarely).
>> I think it's a feature of the antiquated HFS mac filesystem that OSX 
>> yses by default, rather than the OS itself. Adding a long timestamp 
>> to the segments file would fix this everywhere I think, since the 
>> worst resolution timers out there are in the region of 10ms, which 
>> isn't so bad.
>> I can work on that (pretty big, I suspect) patch if everyone else is 
>> too busy, since I'm quite looking forward to all tests passing on all 
>> platforms. I can verify on windows, linux, solaris, and OSX. Of 
>> course it'd be better if someone a bit more experienced with the 
>> codebase were to handle it ;)
>> On Tuesday, September 30, 2003, at 06:37 PM, Dmitry Serebrennikov 
>> wrote:
>>> Hani Suleiman wrote:
>>>
>>>> The count for FSDirectory on OSX is around 700-800. The problem is 
>>>> that  the filesystem timestamps on OSX has abysmal resolution 
>>>> (nothing finer  than 1 whole second!)
>>>
>>>
>>> Wow! Write once, debug everywhere...
>>> Is this an OSX "feature" or is this a feature of a particular disk / 
>>> filesystem / driver / whatnot that is in use in the current run?
>>> If this is unavoidable on OSX, perhaps other platforms are also 
>>> affected? Does anyone know the timestamp resolution on Wintel, 
>>> Lintel, Solaris, etc?
>>>
>>> Finally, what if we allocate a "long" in the beginning of the files 
>>> that need to be touched (like the segments and other files) and use 
>>> it to store the full System.currentTimeMillis timestamp? Does this 
>>> method return a more granular number on OSX? This would be a big 
>>> change, but it would work the same on all platforms and will not 
>>> have a slowdown.
>>>
>>>>
>>>>
>>>> So the touch method will spin its wheels for anything up to a 
>>>> second  before we've registered an actual 'touch'.
>>>>
>>>> This isn't so bad as files aren't touch that often, but it's a lot 
>>>> more  problematic in the case of SegmentInfos.write(), where a new 
>>>> file is  created, then copied over the old one. The  
>>>> testDeleteReaderWriteConflict testcase fails due to this. The old 
>>>> file  and the new have the same timestamp, and so the staleness 
>>>> check fails.  How should this be fixed?
>>>>
>>>> Also, won't the fix for FSDirectory mean there's a window where 
>>>> other  threads could end up with bad info, if we're taking up to a 
>>>> second to  touch a file?
>>>
>>>
>>> I don't know off hand, but I believe there is file locking in place 
>>> to prevent other threads from getting into this code until the 
>>> update operation is completed, and it will not be completed until 
>>> the touch finishes. This still sucks with a 1-second blocking...
>>>
>>> Dmitry.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>>
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
> -- 
> Steve Rowe
> Software Engineer
> Center for Natural Language Processing
> School of Information Studies
> Syracuse University
> www.cnlp.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Steve Rowe <sa...@gwmail.syr.edu>.
Why not just add 1000ms to the current time when touching the files 
(instead of waiting for a second to pass).  I tried in on my Linux box 
(e2fs) via the java File.setLastModified() method, and it allowed the 
file to have a timestamp in the future.

Wouldn't this be a much smaller effort?

Steve

Hani Suleiman wrote:

> Aha, yeah, System.currentTimeMillis would definitely be of much higher 
> resolution.
> 
> I know the problem also intermittently affects linux, as I've seen it 
> happen on there too (but a lot more rarely).
> 
> I think it's a feature of the antiquated HFS mac filesystem that OSX 
> yses by default, rather than the OS itself. Adding a long timestamp to 
> the segments file would fix this everywhere I think, since the worst 
> resolution timers out there are in the region of 10ms, which isn't so bad.
> 
> I can work on that (pretty big, I suspect) patch if everyone else is too 
> busy, since I'm quite looking forward to all tests passing on all 
> platforms. I can verify on windows, linux, solaris, and OSX. Of course 
> it'd be better if someone a bit more experienced with the codebase were 
> to handle it ;)
> 
> On Tuesday, September 30, 2003, at 06:37 PM, Dmitry Serebrennikov wrote:
> 
>> Hani Suleiman wrote:
>>
>>> The count for FSDirectory on OSX is around 700-800. The problem is 
>>> that  the filesystem timestamps on OSX has abysmal resolution 
>>> (nothing finer  than 1 whole second!)
>>
>>
>> Wow! Write once, debug everywhere...
>> Is this an OSX "feature" or is this a feature of a particular disk / 
>> filesystem / driver / whatnot that is in use in the current run?
>> If this is unavoidable on OSX, perhaps other platforms are also 
>> affected? Does anyone know the timestamp resolution on Wintel, Lintel, 
>> Solaris, etc?
>>
>> Finally, what if we allocate a "long" in the beginning of the files 
>> that need to be touched (like the segments and other files) and use it 
>> to store the full System.currentTimeMillis timestamp? Does this method 
>> return a more granular number on OSX? This would be a big change, but 
>> it would work the same on all platforms and will not have a slowdown.
>>
>>>
>>>
>>> So the touch method will spin its wheels for anything up to a second  
>>> before we've registered an actual 'touch'.
>>>
>>> This isn't so bad as files aren't touch that often, but it's a lot 
>>> more  problematic in the case of SegmentInfos.write(), where a new 
>>> file is  created, then copied over the old one. The  
>>> testDeleteReaderWriteConflict testcase fails due to this. The old 
>>> file  and the new have the same timestamp, and so the staleness check 
>>> fails.  How should this be fixed?
>>>
>>> Also, won't the fix for FSDirectory mean there's a window where other 
>>>  threads could end up with bad info, if we're taking up to a second 
>>> to  touch a file?
>>
>>
>> I don't know off hand, but I believe there is file locking in place to 
>> prevent other threads from getting into this code until the update 
>> operation is completed, and it will not be completed until the touch 
>> finishes. This still sucks with a 1-second blocking...
>>
>> Dmitry.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

-- 
Steve Rowe
Software Engineer
Center for Natural Language Processing
School of Information Studies
Syracuse University
www.cnlp.org


Re: Testcase failure on OSX

Posted by Steve Rowe <sa...@gwmail.syr.edu>.
Why not just add 1000ms to the current time when touching the files 
(instead of waiting for a second to pass).  I tried in on my Linux box 
(e2fs) via the java File.setLastModified() method, and it allowed the 
file to have a timestamp in the future.

Wouldn't this be a much smaller effort?

Steve

Hani Suleiman wrote:

> Aha, yeah, System.currentTimeMillis would definitely be of much higher 
> resolution.
> 
> I know the problem also intermittently affects linux, as I've seen it 
> happen on there too (but a lot more rarely).
> 
> I think it's a feature of the antiquated HFS mac filesystem that OSX 
> yses by default, rather than the OS itself. Adding a long timestamp to 
> the segments file would fix this everywhere I think, since the worst 
> resolution timers out there are in the region of 10ms, which isn't so bad.
> 
> I can work on that (pretty big, I suspect) patch if everyone else is too 
> busy, since I'm quite looking forward to all tests passing on all 
> platforms. I can verify on windows, linux, solaris, and OSX. Of course 
> it'd be better if someone a bit more experienced with the codebase were 
> to handle it ;)
> 
> On Tuesday, September 30, 2003, at 06:37 PM, Dmitry Serebrennikov wrote:
> 
>> Hani Suleiman wrote:
>>
>>> The count for FSDirectory on OSX is around 700-800. The problem is 
>>> that  the filesystem timestamps on OSX has abysmal resolution 
>>> (nothing finer  than 1 whole second!)
>>
>>
>> Wow! Write once, debug everywhere...
>> Is this an OSX "feature" or is this a feature of a particular disk / 
>> filesystem / driver / whatnot that is in use in the current run?
>> If this is unavoidable on OSX, perhaps other platforms are also 
>> affected? Does anyone know the timestamp resolution on Wintel, Lintel, 
>> Solaris, etc?
>>
>> Finally, what if we allocate a "long" in the beginning of the files 
>> that need to be touched (like the segments and other files) and use it 
>> to store the full System.currentTimeMillis timestamp? Does this method 
>> return a more granular number on OSX? This would be a big change, but 
>> it would work the same on all platforms and will not have a slowdown.
>>
>>>
>>>
>>> So the touch method will spin its wheels for anything up to a second  
>>> before we've registered an actual 'touch'.
>>>
>>> This isn't so bad as files aren't touch that often, but it's a lot 
>>> more  problematic in the case of SegmentInfos.write(), where a new 
>>> file is  created, then copied over the old one. The  
>>> testDeleteReaderWriteConflict testcase fails due to this. The old 
>>> file  and the new have the same timestamp, and so the staleness check 
>>> fails.  How should this be fixed?
>>>
>>> Also, won't the fix for FSDirectory mean there's a window where other 
>>>  threads could end up with bad info, if we're taking up to a second 
>>> to  touch a file?
>>
>>
>> I don't know off hand, but I believe there is file locking in place to 
>> prevent other threads from getting into this code until the update 
>> operation is completed, and it will not be completed until the touch 
>> finishes. This still sucks with a 1-second blocking...
>>
>> Dmitry.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

-- 
Steve Rowe
Software Engineer
Center for Natural Language Processing
School of Information Studies
Syracuse University
www.cnlp.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
Aha, yeah, System.currentTimeMillis would definitely be of much higher 
resolution.

I know the problem also intermittently affects linux, as I've seen it 
happen on there too (but a lot more rarely).

I think it's a feature of the antiquated HFS mac filesystem that OSX 
yses by default, rather than the OS itself. Adding a long timestamp to 
the segments file would fix this everywhere I think, since the worst 
resolution timers out there are in the region of 10ms, which isn't so 
bad.

I can work on that (pretty big, I suspect) patch if everyone else is 
too busy, since I'm quite looking forward to all tests passing on all 
platforms. I can verify on windows, linux, solaris, and OSX. Of course 
it'd be better if someone a bit more experienced with the codebase were 
to handle it ;)

On Tuesday, September 30, 2003, at 06:37 PM, Dmitry Serebrennikov wrote:

> Hani Suleiman wrote:
>
>> The count for FSDirectory on OSX is around 700-800. The problem is 
>> that  the filesystem timestamps on OSX has abysmal resolution 
>> (nothing finer  than 1 whole second!)
>
> Wow! Write once, debug everywhere...
> Is this an OSX "feature" or is this a feature of a particular disk / 
> filesystem / driver / whatnot that is in use in the current run?
> If this is unavoidable on OSX, perhaps other platforms are also 
> affected? Does anyone know the timestamp resolution on Wintel, Lintel, 
> Solaris, etc?
>
> Finally, what if we allocate a "long" in the beginning of the files 
> that need to be touched (like the segments and other files) and use it 
> to store the full System.currentTimeMillis timestamp? Does this method 
> return a more granular number on OSX? This would be a big change, but 
> it would work the same on all platforms and will not have a slowdown.
>
>>
>>
>> So the touch method will spin its wheels for anything up to a second  
>> before we've registered an actual 'touch'.
>>
>> This isn't so bad as files aren't touch that often, but it's a lot 
>> more  problematic in the case of SegmentInfos.write(), where a new 
>> file is  created, then copied over the old one. The  
>> testDeleteReaderWriteConflict testcase fails due to this. The old 
>> file  and the new have the same timestamp, and so the staleness check 
>> fails.  How should this be fixed?
>>
>> Also, won't the fix for FSDirectory mean there's a window where other 
>>  threads could end up with bad info, if we're taking up to a second 
>> to  touch a file?
>
> I don't know off hand, but I believe there is file locking in place to 
> prevent other threads from getting into this code until the update 
> operation is completed, and it will not be completed until the touch 
> finishes. This still sucks with a 1-second blocking...
>
> Dmitry.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


Re: Testcase failure on OSX

Posted by Hani Suleiman <ha...@formicary.net>.
Aha, yeah, System.currentTimeMillis would definitely be of much higher 
resolution.

I know the problem also intermittently affects linux, as I've seen it 
happen on there too (but a lot more rarely).

I think it's a feature of the antiquated HFS mac filesystem that OSX 
yses by default, rather than the OS itself. Adding a long timestamp to 
the segments file would fix this everywhere I think, since the worst 
resolution timers out there are in the region of 10ms, which isn't so 
bad.

I can work on that (pretty big, I suspect) patch if everyone else is 
too busy, since I'm quite looking forward to all tests passing on all 
platforms. I can verify on windows, linux, solaris, and OSX. Of course 
it'd be better if someone a bit more experienced with the codebase were 
to handle it ;)

On Tuesday, September 30, 2003, at 06:37 PM, Dmitry Serebrennikov wrote:

> Hani Suleiman wrote:
>
>> The count for FSDirectory on OSX is around 700-800. The problem is 
>> that  the filesystem timestamps on OSX has abysmal resolution 
>> (nothing finer  than 1 whole second!)
>
> Wow! Write once, debug everywhere...
> Is this an OSX "feature" or is this a feature of a particular disk / 
> filesystem / driver / whatnot that is in use in the current run?
> If this is unavoidable on OSX, perhaps other platforms are also 
> affected? Does anyone know the timestamp resolution on Wintel, Lintel, 
> Solaris, etc?
>
> Finally, what if we allocate a "long" in the beginning of the files 
> that need to be touched (like the segments and other files) and use it 
> to store the full System.currentTimeMillis timestamp? Does this method 
> return a more granular number on OSX? This would be a big change, but 
> it would work the same on all platforms and will not have a slowdown.
>
>>
>>
>> So the touch method will spin its wheels for anything up to a second  
>> before we've registered an actual 'touch'.
>>
>> This isn't so bad as files aren't touch that often, but it's a lot 
>> more  problematic in the case of SegmentInfos.write(), where a new 
>> file is  created, then copied over the old one. The  
>> testDeleteReaderWriteConflict testcase fails due to this. The old 
>> file  and the new have the same timestamp, and so the staleness check 
>> fails.  How should this be fixed?
>>
>> Also, won't the fix for FSDirectory mean there's a window where other 
>>  threads could end up with bad info, if we're taking up to a second 
>> to  touch a file?
>
> I don't know off hand, but I believe there is file locking in place to 
> prevent other threads from getting into this code until the update 
> operation is completed, and it will not be completed until the touch 
> finishes. This still sucks with a 1-second blocking...
>
> Dmitry.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org