Posted to dev@lucy.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/03/27 13:28:50 UTC

threads

I know this was broached on one of the threads, here:

  http://mail-archives.apache.org/mod_mbox/lucene-lucy-dev/200903.mbox/%3C20090316172948.GA28202@rectangular.com%3E

But I think it merits more discussion.

Is more than one thread allowed inside the Lucy core at once?

Or are we "up-front" expecting one to always use separate processes to
gain concurrency?

Whichever it is, Lucy will need to do something when crossing the
bridge to "mate" to the Host language's thread model.

At some point, as described above, a single search will need to use
concurrency; it seems like Lucy should allow multiple threads into the
core for this reason.

Mike

Re: threads

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Mon, Mar 30, 2009 at 11:55 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> On Mon, Mar 30, 2009 at 06:08:28PM -0400, Michael McCandless wrote:
>
>> >  * The VTable_registry Hash which is used to associate VTables with class
>> >    names is stateful.
>>
>> What is this hash used for?  Does it map name -> VTable?
>
> Yes.  It is used to support dynamic subclassing, and also to support
>> > deserialization (when the appropriate deserialization function must be selected
> based on the class name).

The deserialization need makes sense, though I guess you can't
serialize/deserialize dynamic vtables?  (Since, as you showed, MyScorer
can be different at different times... maybe you could consider making
the VTable_registry write-once.)

Dynamic subclassing, I think, just comes down to how you gain a
reference to the vtable you want to subclass, which you could do by
looking it up by name, but I don't yet see why you must do it that way.

>> It seems like most references within core could be done by the
>> compiler/linker?
>
> I think you could say that most references within core rely on referring to
> the VTable structs generated at compile-time by Boilerplater.
>
>                     vvvv
>  if (!OBJ_IS_A(obj, HASH)) {
>      THROW("That's not a Hash, it's a %o", Obj_Get_Class_Name(obj));
>  }

Right, so if you make a dynamic vtable inside the Lucy core, it'd
presumably have a reference to the vtable via the compiler.  EG maybe
I have a "MakeCustomScorer" method, and inside there it's got a
reference to the global Scorer (statically created) vtable that the C
compiler/linker mapped.

>>  And then when host needs to refer to an object inside the core,
>> shouldn't the bindings be exposed however the host would do it (eg as
>> a Python module(s) with bindings), also compiled/linked statically.
>
> I don't think I follow that...  Are you saying that dynamic subclassing would
> never be needed under Python?

No, it would be needed.

I'm saying the Python module wrapping Lucy would expose such bindings.  EG:

  import lucy
  print 'The scorer vtable is %s' % lucy.Scorer

(or perhaps lucy has sub-modules).

And the C code for that Python lucy module would have referenced the
static global C Scorer vtable (just like the above example).  Dynamic
lookup by name isn't necessary because you'd be relying on Python's
module/dict capabilities to do it.

>> > The VTables themselves are stateful because they are refcounted.
>> > Furthermore, dynamically created VTables are reclaimed once the last object
>> > which needs them goes away -- so the "MyScorer" VTable at one point in the
>> > program might not be the same as the "MyScorer" VTable at another point in
>> > the program.
>>
>> This is neat: so you can make a new subclass at runtime, instantiate a
>> bunch of objects off of it, and once all objects are gone and nobody
>> directly refers to the subclass (vtable), it gets reclaimed.  Can you
>> define new vtables from the host language?
>
> Yes, transparently, in Perl at least.  Standard Perl subclassing techniques
> work.
>
>   package MyObj;
>   use base qw( Lucy::Obj );
>
>   package main;
>
>   my $obj = MyObj->new; # MyObj vtable created via inherited constructor.
>
>   undef $obj;           # Triggers DESTROY. Last ref to dynamic VTable for
>                         # MyObj disappears, so it gets reclaimed.
>
> At least that's how things work now.  To make things thread-safe, we'll create
> the VTable for "MyObj" dynamically, but once created it will never go away.

Once created it never goes away, because we've decided it's fine to
leak them?  (Which I agree with).

Can a vtable gain new methods dynamically?

>> > However, if we stop refcounting VTables (by making Inc_RefCount and
>> > Dec_RefCount no-ops) and accept that dynamically created VTables will leak,
>> > then those concerns go away.  I think this would be a reasonable approach.
>> > The leak ain't gonna matter unless somebody does something psycho like cycle
>> > through lots of unique class names.
>>
>> That seems tentatively OK.  What do you see dynamic vtables being used
>> for in Lucy?
>
> They are required for subclassing in Perl, at least.

Ahh, I see.  When one subclasses in Perl, it will make a corresponding
dynamic vtable in Lucy?  And then when one instantiates a bunch of
objects from such a class in Perl, these have corresponding bunches of
objects (not vtables) in Lucy's core?

>> So then we don't have to worry about individual objects being thread
>> safe as long as we can ensure threads never share objects.
>
> Yes. :)  [Pending resolution of issues around remaining shared globals.]

I like this approach but I fear it may be fragile, ie hitting a SEGV
instead of a RuntimeException when you mess something small up.  Not
gracefully degrading...

>> It might get weird if the host also doesn't maintain separate worlds,
>> eg the host can think it created object X in Lucy but then if it
>> checks back later and gets the wrong world, object X is gone.
>
> I don't understand how this could happen, so perhaps I'm not following.

Say from Python you start up Lucy, and make 4 worlds (since that's how
much concurrency you want to use), for searching.  Is there a single
Python interpreter (since Python has no trouble letting many threads
in)?  If so, from Python I must be aware of those separate Lucy
worlds... eg if I want to make a new dynamic class, I'd need to notify
all 4 worlds to create the corresponding vtable?

I think the multi-world approach is going to be problematic.

(Vs the single-world-but-threads-rarely-share-objects, which seems
best except for possible fragility).

>> > Then there's search-time, where the multi-world approach isn't always
>> > adequate.  Here, things get tricky because we have to worry about the
>> > statefulness of objects that could be operating in multiple search threads.
>> >
>> > I think we'll be generally OK if we do most of our scoring at the segment
>> > level.  Individual SegReaders don't share much within a larger PolyReader, so
>> > Scorers derived from those SegReaders won't either.
>>
>> OK even for concurrency within a single search, it sounds like?
>
> All of the core Scorers can be made thread-safe for single-search concurrency.
> We just need to avoid using stateful objects which are shared by multiple
> SegReaders.  The list of those is reasonably small.
>
> Individual DataReaders (PostingsReader, LexReader, DocReader, DeletionsReader,
> etc) might or might not be stateful, but not in ways that are shared (other
> than refcounting).
>
> For instance, consider a PostingsReader.  It holds open at least one InStream
> which is stateful on 32-bit systems because of its sliding window.  It would
> not be safe to share that InStream across multiple threads -- but there's no
> reason that InStream would ever be shared across threads if we are scoring
> per-segment, and for that matter, actually cloning the InStream whenever we
> spawn a PostingList object.
>
> That PostingsReader also holds a reference to a shared Schema instance, which
> may have Analyzers that aren't thread safe.  For that matter, Schema itself
> technically isn't thread safe, because you might call Schema_Spec_Field() at
> any moment.  However, just because you *can* use those shared elements in a
> non-thread-safe way doesn't mean there's a reason you would do so.  No core
> Scorer would, at least.

Right. This would be a "cooperative" approach to threading.  I
carefully don't share things with you, so we won't crash and burn.
Sort of like the each-kid-gets-their-own-cup-so-germs-don't-spread
rule that many parents adopt.

>> > Then there's refcounting.  For most objects operating in a per-segment scoring
>> > thread, it wouldn't be necessary to synchronize on Inc_RefCount and
>> > Dec_RefCount, but I don't see how we'd be able to pick and choose our sync
>> > points.
>>
>> It sounds like we either 1) try to guarantee threads won't share
>> objects, in which case you don't need to lock in the object nor in
>> incref/decref,
>
> ... good enough for indexing...
>
>> or 2) allow for the possibility of sharing.
>
> ... required for searching.
>
>> You could use atomic increment/decrement, though I'm unsure how costly
>> those really are at the CPU level.
>
> For C under pthreads, where the refcount will presumably be a plain old
> integer, that's probably what we want.

Be careful: I'm unsure how bad these instructions are at the CPU
level.  They feel wonderful in C, but I think they may use the LOCK
prefix (on Intel CPUs), and I think I've read bad things about what
that prefix requires of all the other running CPUs.  I'm not certain,
though, and that was a while ago.

I think a far better approach, if we can pull it off, is to not do any
locking because we are certain that some objects are never shared.
Perhaps there'd be "thread safe" incref/decref, and "private"
incref/decref.  But fragility is there...

Mike

Re: threads

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Mar 30, 2009 at 06:08:28PM -0400, Michael McCandless wrote:

> >  * The VTable_registry Hash which is used to associate VTables with class
> >    names is stateful.
> 
> What is this hash used for?  Does it map name -> VTable?  

Yes.  It is used to support dynamic subclassing, and also to support
deserialization (when the appropriate deserialization function must be selected
based on the class name).

> It seems like most references within core could be done by the
> compiler/linker?

I think you could say that most references within core rely on referring to
the VTable structs generated at compile-time by Boilerplater.

                     vvvv 
  if (!OBJ_IS_A(obj, HASH)) { 
      THROW("That's not a Hash, it's a %o", Obj_Get_Class_Name(obj));
  }

>  And then when host needs to refer to an object inside the core,
> shouldn't the bindings be exposed however the host would do it (eg as
> a Python module(s) with bindings), also compiled/linked statically.

I don't think I follow that...  Are you saying that dynamic subclassing would
never be needed under Python? 

> > The VTables themselves are stateful because they are refcounted.
> > Furthermore, dynamically created VTables are reclaimed once the last object
> > which needs them goes away -- so the "MyScorer" VTable at one point in the
> > program might not be the same as the "MyScorer" VTable at another point in
> > the program.
> 
> This is neat: so you can make a new subclass at runtime, instantiate a
> bunch of objects off of it, and once all objects are gone and nobody
> directly refers to the subclass (vtable), it gets reclaimed.  Can you
> define new vtables from the host language?

Yes, transparently, in Perl at least.  Standard Perl subclassing techniques
work.

   package MyObj;
   use base qw( Lucy::Obj );

   package main;

   my $obj = MyObj->new; # MyObj vtable created via inherited constructor.

   undef $obj;           # Triggers DESTROY. Last ref to dynamic VTable for 
                         # MyObj disappears, so it gets reclaimed.

At least that's how things work now.  To make things thread-safe, we'll create
the VTable for "MyObj" dynamically, but once created it will never go away.

> > However, if we stop refcounting VTables (by making Inc_RefCount and
> > Dec_RefCount no-ops) and accept that dynamically created VTables will leak,
> > then those concerns go away.  I think this would be a reasonable approach.
> > The leak ain't gonna matter unless somebody does something psycho like cycle
> > through lots of unique class names.
> 
> That seems tentatively OK.  What do you see dynamic vtables being used
> for in Lucy?

They are required for subclassing in Perl, at least.

> So then we don't have to worry about individual objects being thread
> safe as long as we can ensure threads never share objects.

Yes. :)  [Pending resolution of issues around remaining shared globals.]

> It might get weird if the host also doesn't maintain separate worlds,
> eg the host can think it created object X in Lucy but then if it
> checks back later and gets the wrong world, object X is gone.

I don't understand how this could happen, so perhaps I'm not following.

> > Then there's search-time, where the multi-world approach isn't always
> > adequate.  Here, things get tricky because we have to worry about the
> > statefulness of objects that could be operating in multiple search threads.
> >
> > I think we'll be generally OK if we do most of our scoring at the segment
> > level.  Individual SegReaders don't share much within a larger PolyReader, so
> > Scorers derived from those SegReaders won't either.
> 
> OK even for concurrency within a single search, it sounds like?

All of the core Scorers can be made thread-safe for single-search concurrency.
We just need to avoid using stateful objects which are shared by multiple
SegReaders.  The list of those is reasonably small.

Individual DataReaders (PostingsReader, LexReader, DocReader, DeletionsReader,
etc) might or might not be stateful, but not in ways that are shared (other
than refcounting).  

For instance, consider a PostingsReader.  It holds open at least one InStream
which is stateful on 32-bit systems because of its sliding window.  It would
not be safe to share that InStream across multiple threads -- but there's no
reason that InStream would ever be shared across threads if we are scoring
per-segment, and for that matter, actually cloning the InStream whenever we
spawn a PostingList object.

That PostingsReader also holds a reference to a shared Schema instance, which
may have Analyzers that aren't thread safe.  For that matter, Schema itself
technically isn't thread safe, because you might call Schema_Spec_Field() at
any moment.  However, just because you *can* use those shared elements in a
non-thread-safe way doesn't mean there's a reason you would do so.  No core
Scorer would, at least.

> > Then there's refcounting.  For most objects operating in a per-segment scoring
> > thread, it wouldn't be necessary to synchronize on Inc_RefCount and
> > Dec_RefCount, but I don't see how we'd be able to pick and choose our sync
> > points.
> 
> It sounds like we either 1) try to guarantee threads won't share
> objects, in which case you don't need to lock in the object nor in
> incref/decref, 

... good enough for indexing...

> or 2) allow for the possibility of sharing.  

... required for searching.

> You could use atomic increment/decrement, though I'm unsure how costly
> those really are at the CPU level.

For C under pthreads, where the refcount will presumably be a plain old
integer, that's probably what we want.

Marvin Humphrey


Re: threads

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sun, Mar 29, 2009 at 7:29 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Sun, Mar 29, 2009 at 09:21:58AM -0400, Michael McCandless wrote:
>
>> I think concurrency is so bad that either 1) we're (the "royal we",
>> our species) gonna have to punt on micro-concurrency (not use it, ie
>> use only macro/strongly-divorced concurrency, like multi-process,
>> one-thread-does-whole-search),
>
> Ensuring the bare minimum level of thread support in Lucy for
> "strongly-divorced concurrency" won't be insanely difficult.  We just have to
> ensure that all globals are safe to access -- and there aren't that many
> stateful globals in our KS prototype.

OK

>  * VTables are stateful.
>  * The VTable_registry Hash which is used to associate VTables with class
>    names is stateful.

What is this hash used for?  Does it map name -> VTable?  It seems
like most references within core could be done by the compiler/linker?
 And then when host needs to refer to an object inside the core,
shouldn't the bindings be exposed however the host would do it (eg as
a Python module(s) with bindings), also compiled/linked statically.

>  * The Hash class uses shared keys.
>  * Debug and IndexReader have some stateful test-only globals that don't
>    matter.
>
> All other globals in KS are stateless, including the UNDEF object, format
> constants, string literals, and most importantly, offsets into the VTables
> used by the "inside out vtable" dispatch algo, which are stored in a bazillion
> individual global size_t variables.  The method offsets are all thread-safe,
> because they are constant per-core-compile.
>
> A little footnote about those offsets, though.  When I was examining
> the assembler generated by GCC for the method dispatch static inline
> functions, I found that it ignored the "const" part of "extern const size_t"
> and performed an ordinary dynamic load.  When we compile under pthreads, we
> should see whether GCC skimps on the optimization because it's worried that
> those "extern const" variables might change.

Hmmm.

> That won't be an issue for core, because we will probably end up
> pound-defining the offset symbols as actual integer constants if LUCY_CORE is
> defined.  But it might matter for compiled extensions if they run tight loops.

OK.

> The VTables themselves are stateful because they are refcounted.
> Furthermore, dynamically created VTables are reclaimed once the last object
> which needs them goes away -- so the "MyScorer" VTable at one point in the
> program might not be the same as the "MyScorer" VTable at another point in
> the program.

This is neat: so you can make a new subclass at runtime, instantiate a
bunch of objects off of it, and once all objects are gone and nobody
directly refers to the subclass (vtable), it gets reclaimed.  Can you
define new vtables from the host language?

> However, if we stop refcounting VTables (by making Inc_RefCount and
> Dec_RefCount no-ops) and accept that dynamically created VTables will leak,
> then those concerns go away.  I think this would be a reasonable approach.
> The leak ain't gonna matter unless somebody does something psycho like cycle
> through lots of unique class names.

That seems tentatively OK.  What do you see dynamic vtables being used
for in Lucy?

> That leaves the VTable_registry Hash, which is the hard part.
>
> One option is to write a lock-free hash implementation.  Cliff Click of Azul
> did a presentation at Google on a Java version.  The crucial requirement is an
> atomic pointer-compare-and-swap.  It'd take some work to set up probes for
> that to achieve portability, but it's probably doable.
>
> Another option is to throw a mutex around every read and write op on
> VTable_registry.
>
> A third option is to lock writes only but disallow deletions or displacements.
> Rehashing (where we double the table size and redistribute entries) is
> problematic; we'd probably have to keep all old versions around.
>
> The shared-keys aspect of Hash was a recent change, and we can just junk it
> in favor of per-object keys.

OK it sounds like "macro threading" should work fine.

So then we don't have to worry about individual objects being thread
safe as long as we can ensure threads never share objects.

>> In any event, if Lucy will not allow more than one thread in the core
>> at once, it still must mate properly with the threads running in the
>> host.  EG you can't just up and call a Python API; you have to
>> register your thread with it (likewise for JNI), release/reaquire
>> Python's global lock when crossing the bridge, etc.
>
> Let's avoid that mess by allowing more than one thread in the Lucy core.

Good.

>> > For indexing, I think we should make it possible to support concurrency using
>> > multiple indexer *objects*.  Whether those multiple indexer objects get used
>> > within a single multi-threaded app, or within separate processes shouldn't be
>> > important.  However, I think it's very important that we not *require* threads
>> > to exploit concurrency at index-time.
>>
>> Interesting... so you could take the "strongly divorced" approach,
>> yet, allow multiple such "worlds" inside a single Lucy process?  This
>> seems like a nice middle ground.  And mmap is the basis for sharing.
>>
>> Python allows this too.  You can create multiple interpreter "worlds",
>> and each freely runs independent of the other (separate global lock).
>
> I agree -- this sounds like our best option.
>
> I think separate "worlds" should be good enough for indexing, provided that our
> multi-writer concurrency plans work out in practice as they have in theory.

It might get weird if the host also doesn't maintain separate worlds,
eg the host can think it created object X in Lucy but then if it
checks back later and gets the wrong world, object X is gone.

> Then there's search-time, where the multi-world approach isn't always
> adequate.  Here, things get tricky because we have to worry about the
> statefulness of objects that could be operating in multiple search threads.
>
> I think we'll be generally OK if we do most of our scoring at the segment
> level.  Individual SegReaders don't share much within a larger PolyReader, so
> Scorers derived from those SegReaders won't either.

OK even for concurrency within a single search, it sounds like?

> The main classes I'd be worried about right now in KS are Analyzers, which are
> cached per-FieldSpec in Schema.  It turns out that one of the main Analyzers
> in KS, Tokenizer, is stateful because it uses a Perl regex object which is
> stateful.  If for some reason a custom Scorer requests to use an Analyzer from
> a Schema instance (which is shared by all the SegReaders), it wouldn't be
> safe.

OK.

> Then there's refcounting.  For most objects operating in a per-segment scoring
> thread, it wouldn't be necessary to synchronize on Inc_RefCount and
> Dec_RefCount, but I don't see how we'd be able to pick and choose our sync
> points.

It sounds like we either 1) try to guarantee threads won't share
objects, in which case you don't need to lock in the object nor in
incref/decref, or 2) allow for the possibility of sharing.  (And in
either case we need known stateful globals to handle multiple
threads).

You could use atomic increment/decrement, though I'm unsure how costly
those really are at the CPU level.

>> Say lots of searches need to run concurrently (call this "macro"
>> concurrency, vs the "micro" concurrency below where more than one
>> thread works on a single search): will Lucy allow a host to send in N
>> threads doing their own searching, and allow those threads to run
>> concurrently (if the C platform supports real threads)?
>
> Yeah, that should work fine pending resolution of the issues surrounding the
> remaining stateful globals.

OK

Mike

Re: threads

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Sun, Mar 29, 2009 at 09:21:58AM -0400, Michael McCandless wrote:

> I think concurrency is so bad that either 1) we're (the "royal we",
> our species) gonna have to punt on micro-concurrency (not use it, ie
> use only macro/strongly-divorced concurrency, like multi-process,
> one-thread-does-whole-search), 

Ensuring the bare minimum level of thread support in Lucy for
"strongly-divorced concurrency" won't be insanely difficult.  We just have to
ensure that all globals are safe to access -- and there aren't that many
stateful globals in our KS prototype.

  * VTables are stateful.
  * The VTable_registry Hash which is used to associate VTables with class
    names is stateful.
  * The Hash class uses shared keys.
  * Debug and IndexReader have some stateful test-only globals that don't
    matter.

All other globals in KS are stateless, including the UNDEF object, format
constants, string literals, and most importantly, offsets into the VTables
used by the "inside out vtable" dispatch algo, which are stored in a bazillion
individual global size_t variables.  The method offsets are all thread-safe,
because they are constant per-core-compile.  

A little footnote about those offsets, though.  When I was examining
the assembler generated by GCC for the method dispatch static inline
functions, I found that it ignored the "const" part of "extern const size_t"
and performed an ordinary dynamic load.  When we compile under pthreads, we
should see whether GCC skimps on the optimization because it's worried that
those "extern const" variables might change.

That won't be an issue for core, because we will probably end up
pound-defining the offset symbols as actual integer constants if LUCY_CORE is
defined.  But it might matter for compiled extensions if they run tight loops.

The VTables themselves are stateful because they are refcounted.
Furthermore, dynamically created VTables are reclaimed once the last object
which needs them goes away -- so the "MyScorer" VTable at one point in the
program might not be the same as the "MyScorer" VTable at another point in
the program.

However, if we stop refcounting VTables (by making Inc_RefCount and
Dec_RefCount no-ops) and accept that dynamically created VTables will leak,
then those concerns go away.  I think this would be a reasonable approach.
The leak ain't gonna matter unless somebody does something psycho like cycle
through lots of unique class names.

That leaves the VTable_registry Hash, which is the hard part.

One option is to write a lock-free hash implementation.  Cliff Click of Azul
did a presentation at Google on a Java version.  The crucial requirement is an
atomic pointer-compare-and-swap.  It'd take some work to set up probes for
that to achieve portability, but it's probably doable.

Another option is to throw a mutex around every read and write op on
VTable_registry.

A third option is to lock writes only but disallow deletions or displacements.
Rehashing (where we double the table size and redistribute entries) is
problematic; we'd probably have to keep all old versions around.

The shared-keys aspect of Hash was a recent change, and we can just junk it
in favor of per-object keys.

> or 2) move to better dynamic languages
> that are largely declarative such that the runtime has the freedom to
> use micro-concurrency as it sees fit.

For now, an academically interesting point.  :)  

> In any event, if Lucy will not allow more than one thread in the core
> at once, it still must mate properly with the threads running in the
> host.  EG you can't just up and call a Python API; you have to
> register your thread with it (likewise for JNI), release/reacquire
> Python's global lock when crossing the bridge, etc.  

Let's avoid that mess by allowing more than one thread in the Lucy core.

> > For indexing, I think we should make it possible to support concurrency using
> > multiple indexer *objects*.  Whether those multiple indexer objects get used
> > within a single multi-threaded app, or within separate processes shouldn't be
> > important.  However, I think it's very important that we not *require* threads
> > to exploit concurrency at index-time.
> 
> Interesting... so you could take the "strongly divorced" approach,
> yet, allow multiple such "worlds" inside a single Lucy process?  This
> seems like a nice middle ground.  And mmap is the basis for sharing.
> 
> Python allows this too.  You can create multiple interpreter "worlds",
> and each freely runs independent of the other (separate global lock).

I agree -- this sounds like our best option.

I think separate "worlds" should be good enough for indexing, provided that our
multi-writer concurrency plans work out in practice as they have in theory.

Then there's search-time, where the multi-world approach isn't always
adequate.  Here, things get tricky because we have to worry about the
statefulness of objects that could be operating in multiple search threads.  

I think we'll be generally OK if we do most of our scoring at the segment
level.  Individual SegReaders don't share much within a larger PolyReader, so
Scorers derived from those SegReaders won't either.  

The main classes I'd be worried about right now in KS are Analyzers, which are
cached per-FieldSpec in Schema.  It turns out that one of the main Analyzers
in KS, Tokenizer, is stateful because it uses a Perl regex object which is
stateful.  If for some reason a custom Scorer requests to use an Analyzer from
a Schema instance (which is shared by all the SegReaders), it wouldn't be
safe.

Then there's refcounting.  For most objects operating in a per-segment scoring
thread, it wouldn't be necessary to synchronize on Inc_RefCount and
Dec_RefCount, but I don't see how we'd be able to pick and choose our sync
points.

> Say lots of searches need to run concurrently (call this "macro"
> concurrency, vs the "micro" concurrency below where more than one
> thread works on a single search): will Lucy allow a host to send in N
> threads doing their own searching, and allow those threads to run
> concurrently (if the C platform supports real threads)?

Yeah, that should work fine pending resolution of the issues surrounding the
remaining stateful globals.

Marvin Humphrey


Re: threads

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Mar 28, 2009 at 9:11 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

>> Is more than one thread allowed inside the Lucy core at once?
>
> I would like to.  However, I think it's important to make three points up
> front.
>
>  1) Concurrency is hard.  Even in languages with comparatively good support
>     for threads like Java, threads programming is a bug-spawning developer
>     timesuck.

I could not agree more (especially coming off of LUCENE-1516!).

I think concurrency is so bad that either 1) we're (the "royal we",
our species) gonna have to punt on micro-concurrency (not use it, ie
use only macro/strongly-divorced concurrency, like multi-process,
one-thread-does-whole-search), or 2) move to better dynamic languages
that are largely declarative such that the runtime has the freedom to
use micro-concurrency as it sees fit.

>  2) We will not be able to abstract multiple host threading models so that we
>     can make sophisticated use of them in the Lucy core.

But it seems like there is a common denominator here ("can more than
one thread be running inside Lucy's core"), regardless of whether
native/user threads, different thread packages, etc, are the actual
threads implementation.  Python does a good job building an abstract
API, that it consumes, on top of the many actual thread
implementations.

In any event, if Lucy will not allow more than one thread in the core
at once, it still must mate properly with the threads running in the
host.  EG you can't just up and call a Python API; you have to
register your thread with it (likewise for JNI), release/reacquire
Python's global lock when crossing the bridge, etc.  And if more than
one thread attempts to cross the bridge into Lucy, there must be a
troll to make the 2nd thread wait, etc.

Punting on threads still means you need to do something ;)

>  3) Multiple processes will always be available -- but threads won't.
>
> For those reasons, in my opinion we should keep our threading ambitions to a
> minimum.

OK.

> I think we should have two priorities:
>
>  1) Don't break the host's threading model.
>  2) Make it possible to exploit threads in a limited way during search.
>
>> Or are we "up-front" expecting one to always use separate processes to
>> gain concurrency?
>
> Fortunately, thanks to mmap, we are going to be able to make excellent use of
> multiple processes.  If we had no choice but to read index caches into process
> memory every time a la Java Lucene, we would have far more motivation to rely
> on threads within a single process as our primary concurrency model.

You are then forcing the host to make use of multiple processes, too
(or... to access Lucy over some remote "bridge", eg spawn subprocess
and talk over pipes).  I imagine Python & Perl are fairly fast to
start up.  Java in the past was not, though it was a sore point and
maybe is better now (the whole client vs server thing).

> For indexing, I think we should make it possible to support concurrency using
> multiple indexer *objects*.  Whether those multiple indexer objects get used
> within a single multi-threaded app, or within separate processes shouldn't be
> important.  However, I think it's very important that we not *require* threads
> to exploit concurrency at index-time.

Interesting... so you could take the "strongly divorced" approach,
yet, allow multiple such "worlds" inside a single Lucy process?  This
seems like a nice middle ground.  And mmap is the basis for sharing.

Python allows this too.  You can create multiple interpreter "worlds"
(sub-interpreters), each running largely independent of the others --
though in CPython they still share a single global lock.

> For searching, I think we have no choice.  There are certain things which
> cannot be achieved using a process-based concurrency model because portable
> IPC techniques are too crude -- e.g. HitCollector-based scoring routines.

Yes, though "you really should not do that" (put your HitCollector on
the other side of a looooong (IPC) bridge).  Surely you could have a
host wrapper embedding Lucy so the HitCollector runs locally, and the
long IPC bridge is crossed only to deliver results.

>> Whichever it is, Lucy will need to do something when crossing the
>> bridge to "mate" to the Host language's thread model.
>
> I think what we're going to have to do is issue a callback to the Host
> whenever multiple threads might be launched, and wait for that call to return
> after all threads have concluded their work.
>
> In a multi-threaded Host, several threads might run in parallel.  In a
> single-threaded Host, the threaded calls will run sequentially.

OK.
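
So the core's only contract is "this call returns when all the work has
concluded."  A hypothetical sketch of what that callback might look like,
shown with a single-threaded host's implementation (every name here is
invented for illustration):

```c
/* Sketch of the proposed host callback.  The core hands the host an
 * array of work units; the host runs them however it likes, and the
 * core only assumes nothing is still running when the call returns. */
#include <stddef.h>
#include <assert.h>

typedef void (*WorkFunc)(void *arg);

/* A single-threaded host's implementation: run the calls one after
 * another.  A threaded host would launch one thread per unit and join
 * them all before returning instead. */
void
Host_run_threaded(WorkFunc func, void **args, size_t num_units) {
    for (size_t i = 0; i < num_units; i++) {
        func(args[i]);
    }
}

/* Toy work unit plus a demo driver so the contract can be exercised. */
static int total = 0;

static void
add_unit(void *arg) {
    total += *(int*)arg;
}

int
demo(void) {
    int a = 1, b = 2, c = 3;
    void *args[] = { &a, &b, &c };
    Host_run_threaded(add_unit, args, 3);
    return total;
}
```

The point is that the core can't tell a parallel host from a sequential one:
both satisfy "all work units have finished when the call returns."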

Say lots of searches need to run concurrently (call this "macro"
concurrency, vs the "micro" concurrency below where more than one
thread works on a single search): will Lucy allow a host to send in N
threads doing their own searching, and allow those threads to run
concurrently (if the C platform supports real threads)?

>> At some point, as described above, a single search will need to use
>> concurrency; it seems like Lucy should allow multiple threads into the
>> core for this reason.
>
> I think we have no choice but to allow threads during search in order to
> exploit multiple processors and return the answer to a query as fast as
> possible.
>
> Mike, I know you would prefer not to tie the index format to our concurrency
> model, but I think a one-thread-per-segment scoring model makes a lot of
> sense.  Using skipping info could work with core classes, but there's tension
> between that and making it easy to write plugins:
>
> It's easy to tell a plugin "skip to the next segment".  (In fact, I think we
> might consider making all Scorers single-segment only.)  It's hard to require
> that all Scorer and DataReader subclasses implement intra-segment skipping
> support.
>
> In order to support multi-threaded search for custom index components, I think
> we should adopt a segment-based model and adjust our index optimization
> APIs and algorithms to fit that model.

OK I think that's a viable approach too.
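
Sketched with plain pthreads -- SegWorker and the toy per-segment hit
counts are invented stand-ins, not Lucy's real scoring structures --
one-thread-per-segment scoring might look like:

```c
/* Sketch of "one thread per segment" scoring: each worker scores a
 * single segment independently, and results are merged only after all
 * workers have joined.  Hypothetical names and toy data. */
#include <pthread.h>
#include <assert.h>

enum { NUM_SEGS = 4 };

typedef struct {
    int seg_num;    /* which segment this worker scores */
    int hit_count;  /* result: hits found in this segment */
} SegWorker;

static void*
score_segment(void *arg) {
    SegWorker *w = (SegWorker*)arg;
    /* Stand-in for real scoring: pretend segment N yields N + 1 hits. */
    w->hit_count = w->seg_num + 1;
    return NULL;
}

int
search_all_segments(void) {
    pthread_t threads[NUM_SEGS];
    SegWorker workers[NUM_SEGS];
    int       total = 0;
    for (int i = 0; i < NUM_SEGS; i++) {
        workers[i].seg_num = i;
        pthread_create(&threads[i], NULL, score_segment, &workers[i]);
    }
    for (int i = 0; i < NUM_SEGS; i++) {
        pthread_join(threads[i], NULL);  /* wait for every segment worker */
        total += workers[i].hit_count;   /* merge per-segment results */
    }
    return total;
}
```

Since the workers never share mutable state -- each writes only its own
SegWorker -- the per-segment scorers need no locking at all, which is a big
part of the model's appeal.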

> I should also note that my personal priority regarding threads has been and
> remains to avoid foreclosing on the option of using them.  However, I'm
> working in a single-threaded environment right now, and I don't have the means
> to test my code for thread-safety.

Not precluding future threads seems like a good overall goal.  Since
you use the host's ref counting, you inherit thread safety for that,
which is nice.

> The first module we'd have to work on to make Lucy safe for threads would be
> Lucy::Util::Hash, which is used to associate class names with VTable
> instances.  However, I'm not going to delay submitting that module for the
> sake of making it thread-safe first.

OK.
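
For the record, the smallest possible version of a thread-safe class
registry is just one lock around every fetch and store.  A sketch with
invented names, using a flat array in place of Lucy::Util::Hash:

```c
/* Sketch of a mutex-guarded class-name -> VTable registry.  The flat
 * linear table stands in for Lucy::Util::Hash; names are hypothetical. */
#include <pthread.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define MAX_CLASSES 64

typedef struct { const char *name; void *vtable; } Entry;

static Entry           registry[MAX_CLASSES];
static size_t          num_entries = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

void
VTable_register(const char *name, void *vtable) {
    pthread_mutex_lock(&registry_lock);
    if (num_entries < MAX_CLASSES) {
        registry[num_entries].name   = name;
        registry[num_entries].vtable = vtable;
        num_entries++;
    }
    pthread_mutex_unlock(&registry_lock);
}

void*
VTable_fetch(const char *name) {
    void *found = NULL;
    pthread_mutex_lock(&registry_lock);
    for (size_t i = 0; i < num_entries; i++) {
        if (strcmp(registry[i].name, name) == 0) {
            found = registry[i].vtable;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return found;
}
```

A real implementation would probably want a lock-free read path, since
fetching VTables by name sits close to the hot path of deserialization and
dynamic subclassing.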

Mike

Re: threads

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Mar 27, 2009 at 08:28:50AM -0400, Michael McCandless wrote:

> Are more than one thread allowed inside the Lucy core at once?

I would like to.  However, I think it's important to make three points up
front. 

  1) Concurrency is hard.  Even in languages with comparatively good support
     for threads like Java, threads programming is a bug-spawning developer
     timesuck.
  2) We will not be able to abstract multiple host threading models so that we
     can make sophisticated use of them in the Lucy core.
  3) Multiple processes will always be available -- but threads won't.

For those reasons, in my opinion we should keep our threading ambitions to a
minimum.

I think we should have two priorities:

  1) Don't break the host's threading model.
  2) Make it possible to exploit threads in a limited way during search.

> Or are we "up-front" expecting one to always use separate processes to
> gain concurrency?

Fortunately, thanks to mmap, we are going to be able to make excellent use of
multiple processes.  If we had no choice but to read index caches into process
memory every time a la Java Lucene, we would have far more motivation to rely
on threads within a single process as our primary concurrency model.  

For indexing, I think we should make it possible to support concurrency using
multiple indexer *objects*.  Whether those multiple indexer objects get used
within a single multi-threaded app, or within separate processes shouldn't be
important.  However, I think it's very important that we not *require* threads
to exploit concurrency at index-time.

For searching, I think we have no choice.  There are certain things which
cannot be achieved using a process-based concurrency model because portable
IPC techniques are too crude -- e.g. HitCollector-based scoring routines.

> Whichever it is, Lucy will need to do something when crossing the
> bridge to "mate" to the Host language's thread model.

I think what we're going to have to do is issue a callback to the Host
whenever multiple threads might be launched, and wait for that call to return
after all threads have concluded their work.

In a multi-threaded Host, several threads might run in parallel.  In a
single-threaded Host, the threaded calls will run sequentially.

> At some point, as described above, a single search will need to use
> concurrency; it seems like Lucy should allow multiple threads into the
> core for this reason.

I think we have no choice but to allow threads during search in order to
exploit multiple processors and return the answer to a query as fast as
possible.

Mike, I know you would prefer not to tie the index format to our concurrency
model, but I think a one-thread-per-segment scoring model makes a lot of
sense.  Using skipping info could work with core classes, but there's tension
between that and making it easy to write plugins:  

It's easy to tell a plugin "skip to the next segment".  (In fact, I think we
might consider making all Scorers single-segment only.)  It's hard to require
that all Scorer and DataReader subclasses implement intra-segment skipping
support.

In order to support multi-threaded search for custom index components, I think
we should adopt a segment-based model and adjust our index optimization
APIs and algorithms to fit that model.

---

I should also note that my personal priority regarding threads has been and
remains to avoid foreclosing on the option of using them.  However, I'm
working in a single-threaded environment right now, and I don't have the means
to test my code for thread-safety.

The first module we'd have to work on to make Lucy safe for threads would be
Lucy::Util::Hash, which is used to associate class names with VTable
instances.  However, I'm not going to delay submitting that module for the
sake of making it thread-safe first.

Marvin Humphrey