Posted to dev@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2010/03/27 02:55:09 UTC

Re: [Lucy] Snapshot: "segments" key

Marvin Humphrey wrote on 3/26/10 4:04 PM:

> After the change, it would look like this:
> 
>     {
>        "entries" : [ 
>           "schema_3.json",
>        ],
>        "segments" : [
>           "seg_3"
>        ],  
>        "format" : "1" 
>     }
> 

sounds good.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [Lucy] Snapshot: "segments" key

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Mar 27, 2010 at 06:21:19PM -0700, Marvin Humphrey wrote:
> On Sat, Mar 27, 2010 at 01:47:40PM -0700, Marvin Humphrey wrote:
> > We should probably make a 0.30_091 bugfix release.
> 
> Screw it, let's just move forward.  I'm just going to move ahead with the file
> format changes.  They shouldn't take me more than a few days, max, and then we
> can release 0.30_10 and 0.31.

I'm done with the non-forward-compatible file format changes and I'm going to
push out 0.30_10.

Marvin Humphrey


Re: [Lucy] Snapshot: "segments" key

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Mar 27, 2010 at 01:47:40PM -0700, Marvin Humphrey wrote:
> We should probably make a 0.30_091 bugfix release.

Screw it, let's just move forward.  I'm just going to move ahead with the file
format changes.  They shouldn't take me more than a few days, max, and then we
can release 0.30_10 and 0.31.

Marvin Humphrey


Re: [Lucy] Portability of KS 0.30_09 to various Unixen

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 28, 2010 at 10:10:56PM -0500, Peter Karman wrote:
> I can help in that way though. I'll try and set up some VMs for FreeBSD 8.0 and
> OpenSolaris to make it easier to test things.

That would be great.  :)

The non-forwards-compatible file format changes are going smoothly... I've got
three out of four done. 

There were originally some API changes I was thinking would go into 0.30_10,
but since people won't be able to downgrade without reindexing, I think it
might make sense to do a minimal release.

Marvin Humphrey


Re: [Lucy] Portability of KS 0.30_09 to various Unixen

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/28/10 10:31 AM:
> However, I'm trying to keep a lid on the amount of time I
> spend chasing portability issues.  The actual work on KS isn't hard or
> time-consuming, but setting up all those environments is a PITA.

sounds right to me that you would limit your time this way. Sysadmin-y stuff is
a PITA.

I can help in that way though. I'll try and set up some VMs for FreeBSD 8.0 and
OpenSolaris to make it easier to test things.

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Portability of KS 0.30_09 to various Unixen

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Mar 27, 2010 at 09:31:25PM -0500, Peter Karman wrote:
> Marvin Humphrey wrote on 3/27/10 6:54 PM:
> 
> > Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
> > here:
> > 
> >     http://www.cpantesters.org/cpan/report/7013414
> > 
> > It looks like a memory error somewhere in Charmonizer, but I can't reveal it
> > by running "CHARM_VALGRIND=1 ./Build charmony" on OS X the way I could other
> > charmonizer memory errors that were plaguing FreeBSD 8.x.
> 
> that error appears in several of the KS failures. for one:
> 
> http://www.cpantesters.org/cpan/report/6915815

Cygwin.  

I've put in my Windows portability time getting MSVC set up.  If we can
compile and pass tests under MSVC, we're 90% of the way to compiling under
Cygwin.  But it's too much effort for me to set up a Cygwin environment for
that last 10%.  Same with Strawberry Perl.  

Hopefully if we solve the problem that's troubling FreeBSD 8.x etc. while
still keeping MSVC happy, Cygwin will suddenly start passing.  That's how
things have worked before.

> notice too the solaris hate:
> http://matrix.cpantesters.org/?dist=KinoSearch+0.30_083

Yeah.  Amazon EC2 offers OpenSolaris, so I suppose I could try opening up an
instance there.  However, I'm trying to keep a lid on the amount of time I
spend chasing portability issues.  The actual work on KS isn't hard or
time-consuming, but setting up all those environments is a PITA.  There's a
big return on MSVC, so that one I think is worthwhile for me to maintain.  For
the others, it's diminishing returns.

FWIW, KinoSearch1 doesn't work on Solaris 2.9 because of memory alignment
issues.  Solaris 2.10+ is more tolerant, and those alignment issues no longer
exist in KS, but it wouldn't surprise me if Solaris 2.9 still doesn't work, as
I don't know how well it handles C99.

Marvin Humphrey


Re: [Lucy] Snapshot: "segments" key

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/27/10 6:54 PM:

> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
> here:
> 
>     http://www.cpantesters.org/cpan/report/7013414
> 
> It looks like a memory error somewhere in Charmonizer, but I can't reveal it
> by running "CHARM_VALGRIND=1 ./Build charmony" on OS X the way I could other
> charmonizer memory errors that were plaguing FreeBSD 8.x.

that error appears in several of the KS failures. for one:

http://www.cpantesters.org/cpan/report/6915815

notice too the solaris hate:
http://matrix.cpantesters.org/?dist=KinoSearch+0.30_083


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [Lucy] FreeBSD builds

Posted by Peter Karman <pe...@peknet.com>.
Peter Karman wrote on 4/2/10 9:36 PM:
> Marvin Humphrey wrote on 3/30/10 2:58 PM:
> 
>>> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
>>> here:
>>>
>>>     http://www.cpantesters.org/cpan/report/7013414
>>
>> Still broken, same problem:
>>
>>    http://www.cpantesters.org/cpan/report/7037224
>>
> 
> I just did a fresh install of FreeBSD 8.0 stable on a local VM, and tested both
> KS 0.30_10 from CPAN and from svn, against the following:
> 
>  * locally compiled Perl 5.8.9
>  * locally compiled Perl 5.10.1
>  * perl 5.8.9_3 from ports
> 
> All built fine and passed all tests.
> 
> Which leads me to believe that there is something wonky about that cpantesters
> environment.
> 

I should add that this was an i386 FreeBSD on a VirtualBox VM running on OS X 10.6
Intel 64-bit arch. It looks like the test referenced above was running the same
dist, but with compiler flags that differ from mine:

 -O2 -mtune=athlon64 -pipe

whereas I simply had:

 -O

I tried running with the same compiler flags and it still worked for me.


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [Lucy] FreeBSD builds

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/30/10 2:58 PM:

>> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
>> here:
>>
>>     http://www.cpantesters.org/cpan/report/7013414
> 
> Still broken, same problem:
> 
>    http://www.cpantesters.org/cpan/report/7037224
> 

I just did a fresh install of FreeBSD 8.0 stable on a local VM, and tested both
KS 0.30_10 from CPAN and from svn, against the following:

 * locally compiled Perl 5.8.9
 * locally compiled Perl 5.10.1
 * perl 5.8.9_3 from ports

All built fine and passed all tests.

Which leads me to believe that there is something wonky about that cpantesters
environment.

-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [Lucy] FreeBSD builds

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Apr 04, 2010 at 11:01:30PM -0500, Peter Karman wrote:
> >> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
> >> here:
> >>
> >>     http://www.cpantesters.org/cpan/report/7013414
> > 
> > Still broken, same problem:
> > 
> >    http://www.cpantesters.org/cpan/report/7037224
> 
> this should be fixed in svn trunk now as of r6046. The problem was not
> FreeBSD-specific, but instead the case where $Config{'cc'} value had whitespace,
> as with 'ccache cc'. Along the way I also fixed the case where 'cc' could not be
> overridden per the Module::Build docs, so now this works:
> 
>  perl Build.PL --config cc=something

Super!

Marvin Humphrey


Re: [Lucy] FreeBSD builds

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 3/30/10 2:58 PM:

>> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
>> here:
>>
>>     http://www.cpantesters.org/cpan/report/7013414
> 
> Still broken, same problem:
> 
>    http://www.cpantesters.org/cpan/report/7037224

this should be fixed in svn trunk now as of r6046. The problem was not
FreeBSD-specific, but instead the case where $Config{'cc'} value had whitespace,
as with 'ccache cc'. Along the way I also fixed the case where 'cc' could not be
overridden per the Module::Build docs, so now this works:

 perl Build.PL --config cc=something
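
To illustrate why whitespace in the compiler setting matters, here is a minimal
sketch in Python (not the actual Build.PL code, which is Perl). It assumes the
failure came from treating the whole 'ccache cc' string as a single program
name; `compiler_argv` and the file name `probe.c` are hypothetical, chosen only
for this example.

```python
import shlex


def compiler_argv(cc_setting, extra_args):
    """Build an argv list from a compiler setting that may contain whitespace."""
    # shlex.split turns "ccache cc" into ["ccache", "cc"], so the first word
    # is the program to run and the remaining words are its leading arguments.
    return shlex.split(cc_setting) + list(extra_args)


# Naive form: the whole setting becomes one (nonexistent) program name.
naive = ["ccache cc", "-c", "probe.c"]

# Split form: "ccache" is the program and "cc" is its first argument.
split = compiler_argv("ccache cc", ["-c", "probe.c"])

print(naive[0])
print(split)
```

Passing a list like `split` to a process launcher avoids handing the operating
system a program name with an embedded space.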


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

FreeBSD builds

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Mar 27, 2010 at 04:54:22PM -0700, Marvin Humphrey wrote:
> On Sat, Mar 27, 2010 at 01:47:40PM -0700, Marvin Humphrey wrote:
> > I'll probably get the alloca() problem sorted with a Charmonizer patch later
> > today.
> 
> Done.  That should solve this FreeBSD 6.x problem:
> 
>     http://www.cpantesters.org/cpan/report/7014815

Success:

   http://www.cpantesters.org/cpan/report/7041287

> Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
> here:
> 
>     http://www.cpantesters.org/cpan/report/7013414

Still broken, same problem:

   http://www.cpantesters.org/cpan/report/7037224

Marvin Humphrey


Re: [Lucy] Snapshot: "segments" key

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sat, Mar 27, 2010 at 01:47:40PM -0700, Marvin Humphrey wrote:
> I'll probably get the alloca() problem sorted with a Charmonizer patch later
> today.

Done.  That should solve this FreeBSD 6.x problem:

    http://www.cpantesters.org/cpan/report/7014815

Wish I had access to a FreeBSD 8.x box so I could figure out what went wrong
here:

    http://www.cpantesters.org/cpan/report/7013414

It looks like a memory error somewhere in Charmonizer, but I can't reveal it
by running "CHARM_VALGRIND=1 ./Build charmony" on OS X the way I could other
charmonizer memory errors that were plaguing FreeBSD 8.x.

Marvin Humphrey


Re: [Lucy] Snapshot: "segments" key

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 26, 2010 at 08:55:09PM -0500, Peter Karman wrote:
> > After the change, it would look like this:
> > 
> >     {
> >        "entries" : [ 
> >           "schema_3.json",
> >        ],
> >        "segments" : [
> >           "seg_3"
> >        ],  
> >        "format" : "1" 
> >     }
> > 
> 
> sounds good.

Thanks, and you're right, that would have been a fine solution.  However, I
discovered while trying to implement it that things were a little more
complicated than I'd thought, so I revisited the simplest of the alternates I'd
originally discarded -- and I believe I've arrived at an improvement.  

(It's always worthwhile to spend extra time and effort exploring simple
solutions when it comes to file formats, since file formats impose such a heavy
backwards compatibility burden.)

The new approach is to have directories listed in a snapshot file serve as
stand-ins for their contents -- so when the snapshot is in use, they protect
their contents recursively, and when the snapshot becomes obsolete, their
contents become subject to recursive deletion.

I originally thought to implement this by calling Folder_Delete_Tree() on each
entry in the snapshot and ruled it out because it seemed too draconian.  But
Delete_Tree() turns out not to have been the right way to go about things
anyway, and the final implementation is much more sane.

The advantage of this approach is simplicity: Snapshot files continue to be
only a single flat list, as there's no need to add a new "segments" key.  Even
better, snapshot files contain vastly fewer entries after this change, making
them easier to read.

   {
      "entries" : [ 
         "schema_3.json",
         "seg_3"
      ],  
      "format" : "1" 
   }

Yet we still achieve our desired result of avoiding the need to implement
Delete_Segment() for all DataWriters.  Writing plugins just got a little
easier.  :) 
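
The "directory as stand-in" rule can be sketched as follows. This is a minimal
Python illustration of the semantics only, not the FilePurger implementation;
`is_protected`, `purge_candidates`, and the file names under each segment are
hypothetical.

```python
from pathlib import PurePosixPath


def is_protected(path, snapshot_entries):
    """True if `path` is a snapshot entry or lives beneath one."""
    parts = PurePosixPath(path).parts
    for entry in snapshot_entries:
        entry_parts = PurePosixPath(entry).parts
        # Compare whole path components, so "seg_3" does not protect "seg_30".
        if entry_parts and parts[:len(entry_parts)] == entry_parts:
            return True
    return False


def purge_candidates(all_paths, live_snapshots):
    """Return the paths that no live snapshot protects."""
    entries = [e for snap in live_snapshots for e in snap["entries"]]
    return sorted(p for p in all_paths if not is_protected(p, entries))


snapshot = {"entries": ["schema_3.json", "seg_3"], "format": "1"}
files = [
    "schema_3.json",
    "seg_3/postings.dat",   # hypothetical file inside the live segment
    "seg_2/postings.dat",   # hypothetical file inside an obsolete segment
    "schema_2.json",
]
print(purge_candidates(files, [snapshot]))
```

Because "seg_3" is listed, everything beneath it is protected without being
enumerated, which is why the snapshot stays short.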

To make this approach safe security-wise, I had to take a couple steps.
First, I had to make sure that we don't follow symlinks when recursing so that
e.g. a maliciously placed symlink to "~" can't wreak havoc.  Second, I had to
ensure that we don't follow filepaths upwards out of the index directory, so
that a maliciously crafted snapshot file containing e.g. "../../../" can't do
any harm. 
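
The first of those two steps, recursing without following symlinks, might look
roughly like this in Python. It is a sketch under POSIX assumptions, not the
code that went into KinoSearch; `delete_tree_no_follow` is a hypothetical name.

```python
import os


def delete_tree_no_follow(path):
    """Recursively delete `path` without ever descending through a symlink."""
    if os.path.islink(path) or not os.path.isdir(path):
        # A symlink is removed as a link; whatever it points to is untouched.
        os.remove(path)
        return
    # Collect the children first so the directory handle is closed
    # before anything inside it is deleted.
    with os.scandir(path) as it:
        children = [entry.path for entry in it]
    for child in children:
        delete_tree_no_follow(child)
    os.rmdir(path)
```

The `islink` check comes before the `isdir` check on purpose: `isdir` follows
symlinks, so testing it first would treat a link to "~" as an ordinary
directory.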

I actually uncovered an existing minor security vulnerability during this
work: under the current release, a maliciously crafted snapshot file
containing variants on e.g. "../../../../etc/passwd" could cause problems if
all the stars align.  An attacker would first need to supply a malicious
index, which means that most KinoSearch users aren't vulnerable, since few if
any use untrusted indexes.  Then there would need to be a permissions mistake for an
attacker to do something like bring down a box.  However, if the attacker
guesses correctly on filepaths belonging to a user ("../../mail",
"../../../mail"), they could do some damage.  There's no limit on the number
of entries in a snapshot file, so an attacker would be free to make a very
large number of guesses.
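
A check that rejects such entries before anything is deleted could be sketched
like this in Python. It is purely lexical and assumes POSIX-style paths; the
function names are hypothetical and this is not the actual fix.

```python
import posixpath


def entry_stays_inside(entry):
    """True if a snapshot entry cannot resolve outside the index directory."""
    if not entry or entry.startswith("/"):
        return False  # empty or absolute paths are never acceptable
    normalized = posixpath.normpath(entry)
    # "." is the index directory itself; ".." or "../x" climbs out of it.
    if normalized in (".", ".."):
        return False
    return not normalized.startswith("../")


def vet_snapshot(snapshot):
    """Split snapshot entries into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for entry in snapshot.get("entries", []):
        (accepted if entry_stays_inside(entry) else rejected).append(entry)
    return accepted, rejected


suspect = {
    "entries": ["schema_3.json", "seg_3", "../../../../etc/passwd", "../../mail"],
    "format": "1",
}
print(vet_snapshot(suspect))
```

Since the check runs once per entry, the number of guesses an attacker packs
into the file makes no difference: every entry that climbs upward is refused.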

We should probably make a 0.30_091 bugfix release to get that vulnerability
taken care of, as well as the FreeBSD alloca() problem that's showing up in
CPAN testers (solution is to #include <stdlib.h> to get alloca() on that
platform).
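
A configuration probe for this kind of problem can be sketched as a loop over
candidate headers. The Python below is only an illustration of the idea, not
Charmonizer's code: the header list, the probe program, and the `try_compile`
callback are assumptions made for this example.

```python
# C source template for the probe; doubled braces survive str.format().
ALLOCA_PROBE = """\
{include}
int main(void) {{
    void *p = alloca(16);
    return p == 0;
}}
"""

# Assumed candidates; the thread only confirms <stdlib.h> for FreeBSD.
CANDIDATE_HEADERS = ["alloca.h", "stdlib.h", "malloc.h"]


def find_alloca_header(try_compile, headers=CANDIDATE_HEADERS):
    """Return the first header under which the probe builds, else None.

    `try_compile` is a caller-supplied function that takes C source text
    and returns True when the compiler accepts it.
    """
    for header in headers:
        source = ALLOCA_PROBE.format(include=f"#include <{header}>")
        if try_compile(source):
            return header
    return None


def no_alloca_h(source):
    # Stand-in for a real compiler run: pretends <alloca.h> does not exist.
    return "<alloca.h>" not in source


print(find_alloca_header(no_alloca_h))
```

In a real probe, `try_compile` would write the source to a temporary file and
invoke the configured compiler on it.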

The mods I've made since 0.30_09 dropped are all backwards compatible -- even
with the changes to FilePurger implementing the snapshot file semantics
change, we won't break compat until we empty out the Delete_Segment() routines
for all of the DataWriter subcomponents.

I'll probably get the alloca() problem sorted with a Charmonizer patch later
today.

Marvin Humphrey