Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2010/01/15 05:22:32 UTC

Re: [KinoSearch] Release strategies

On Thu, Jan 14, 2010 at 10:20:34AM -0600, Peter Karman wrote:
> Are the file format problems actually bugs in the current format, or features 
> you would like to see added?

Both.  With regard to features, there are two:

  * Make term dictionaries pluggable to work with non-text field types.
  * Make posting formats able to work with multiple streams.

We can probably handle those without hard compatibility breaks; it will just
be more of a pain. 

Then there's one long standing file format bug: 

  * Skipping on SegPostingList is disabled because the skip files are broken.

I'd like to get that one taken care of before we make a non-dev release, as it
will help performance for a variety of queries.

> CPAN's versioning is not ideal in that regard.

I'll be less kind.  It's grievously flawed.

But these problems are also just hard to avoid by nature when dealing with
dynamic dependencies and global namespaces.  We have the same problem with C.

> However, there are already checks in Build.PL for incompatible index formats, 

If we release into a new "KinoSearch2" namespace, those checks can be
discarded.  They're pretty unfriendly to the CPAN toolchain -- you can get
yourself on a CPAN Testers blacklist by hanging on manual user input.

> Rebuilding an index is not the end of the world. We (and by we I mean search 
> developers) do it all the time, even with big doc corpora.

If you have a lot of fast moving indexes, and an expectation of uninterrupted
up-time, it can be difficult to schedule swaps.  That's our situation here at
Eventful.  We could do it, but it would be a major PITA.

Ironically, this pushes in the direction of release, because managing a hard
compat break would probably cost us more than it costs us to have me write
bridge code.

> Small, stable, incremental and frequent releases to CPAN. I've been converted to 
> that idea.

I've also seen the benefits of date-driven release schedules, for instance as
now practiced by the Perl 5 Porters.  Jesse Vincent has managed to get a bunch
of good devs more highly involved by separating the roles of release manager
and pumpking.

In the abstract, I'd like to try that.  It would require changes to my personal
development routines, and Git is better suited to it than Subversion, but I
think we could make it work.  

> What about #3: stabilize svn trunk and release it as KS 0.30.
> 
> When there's another index compat change, release it as KS 0.40, etc.

I don't want to screw over the MojoMojo people, the Socialtext people, etc. By
releasing into a new namespace we give our users a lot more options for
transitioning.

I actually think we might be able to go a long way without hard compat breaks
in the file format.  Maybe even from here on out.  Now that all metadata is in
JSON, the sort cache issue is solved, and we have a provisional implementation
for pluggable index components, it should be a lot easier to keep compat
across a transitional release at least, and usually longer.  

> How can I help move toward a KS 0.30 release?

I'll draw up a todo list.

Marvin Humphrey


Re: Release strategies

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 01/21/2010 11:01 PM:

> So that's one item checked off the TODO list.
> 

\o/


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

Re: [KinoSearch] Release strategies

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Jan 14, 2010 at 08:22:32PM -0800, Marvin Humphrey wrote:

> Then there's one long standing file format bug: 
> 
>   * Skipping on SegPostingList is disabled because the skip files are broken.
> 
> I'd like to get that one taken care of before we make a non-dev release, as it
> will help performance for a variety of queries.

I believe I have finally squashed this bug with KS commit r5729, which
re-enables the skipping optimization on SegPostingList.  It turns out to have
been a search-time problem in SegPostingList after all, rather than something
hiding in PostingsWriter's main loop -- but that didn't become clear until
after I'd cleaned out most of the
PostingsWriter/PostingPool/PostingPoolQueue/SortExternal/SortExRun rat's nest.

So that's one item checked off the TODO list.

Marvin Humphrey


Re: Release strategies

Posted by Peter Karman <pe...@peknet.com>.
Marvin Humphrey wrote on 1/22/10 6:32 PM:
> On Thu, Jan 14, 2010 at 08:22:32PM -0800, Marvin Humphrey wrote:
>>> How can I help move toward a KS 0.30 release?
>> I'll draw up a todo list.
> 
> I've updated the TODO list on the KinoSearch wiki:
> 
>   http://www.rectangular.com/kinosearch/wiki/ToDoList
> 
> To summarize, I think we need to move in three stages:

I'm still not sure that it's really necessary to fork a KinoSearch2, and I have 
some worry that it might confuse more than help. But I'll defer to you, Marvin, 
on this since this is your baby.

Thanks for responding to my original plea in such a gracious way.


Re: [KinoSearch] Release strategies

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Jan 14, 2010 at 08:22:32PM -0800, Marvin Humphrey wrote:
> > How can I help move toward a KS 0.30 release?
> 
> I'll draw up a todo list.

I've updated the TODO list on the KinoSearch wiki:

  http://www.rectangular.com/kinosearch/wiki/ToDoList

To summarize, I think we need to move in three stages:

First, we need a KinoSearch 0.30_08 release, which must be index-compatible
with 0.30_072.  There has been a lot of churn in the code base since 0.30_07
came out, and it would be nice to give our users the option of downgrading
without needing to reindex if 0.30_08 turns out to have problems.

Next, we need a KinoSearch 0.30_09 release, where we make some changes to the
index format which earlier releases will not be able to read.

After that, we can release KinoSearch2.  The main thing I think we should do
between KinoSearch 0.30_09 and KinoSearch2 is finish the class
reorganization we discussed a little while back (moving Schema and FieldType
under KinoSearch2::Plan, Searcher to KinoSearch2::Search::IndexSearcher, etc.).

One thing that is omitted from the TODO list for KinoSearch2 is opening up the
APIs for classes under Store -- InStream, OutStream, FileHandle, DirHandle --
which have been substantially improved since 0.30_07 came out and are close to
ready for public use.  I had originally planned to make them available for
0.30_08, but they shouldn't block the release.

Marvin Humphrey