Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2016/01/29 12:49:40 UTC

Parallelism in query execution

Rob,

In dotNetRDF, there is parallel execution, isn't there?

I have been thinking about (toying with) the idea of parallel execution
and I wondered what the unit of work is for the parallelism in dotNetRDF.

What little thinking I've done suggests that tapping into the 
parallelism in Java streams is not the right way to do it (which is a 
shame as that's less work).  It needs more control and probably larger 
units of work. There is a danger that small/fast queries slow down due 
to too much thinking.

It needs more control as well to limit how much of the machine it will 
take over because, in Fuseki, it might lead to starvation of other 
requests.  As some usage is "many clients, many small requests",
parallelism can impact the system negatively as well as positively.
At some point, the limitation will be the connection of CPUs to RAM 
rather than cycles.
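
To illustrate the "more control" point: one possible shape is a
server-wide cap on helper threads, where a query runs a work unit on its
own thread whenever no helper is free, so small requests never wait
behind large ones. This is only a sketch under that assumption, not Jena
or Fuseki code; BoundedParallelism, maxHelpers and runAll are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not part of Jena or Fuseki): caps how many work
// units may run on helper threads at any one time across all queries.
class BoundedParallelism {
    private final ExecutorService pool;
    private final Semaphore permits;

    BoundedParallelism(int maxHelpers) {
        this.pool = Executors.newFixedThreadPool(maxHelpers);
        this.permits = new Semaphore(maxHelpers);
    }

    // Runs every task and returns the results in task order. A task is
    // handed to the pool only if a permit is free; otherwise it runs on
    // the caller's thread, so the machine is never over-committed.
    <T> List<T> runAll(List<Supplier<T>> tasks) {
        List<Future<T>> futures = new ArrayList<>();
        for (Supplier<T> task : tasks) {
            if (permits.tryAcquire()) {
                futures.add(pool.submit(() -> {
                    try {
                        return task.get();
                    } finally {
                        permits.release();
                    }
                }));
            } else {
                // No helper available: do the work inline.
                futures.add(CompletableFuture.completedFuture(task.get()));
            }
        }
        List<T> results = new ArrayList<>();
        try {
            for (Future<T> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The inline fallback is a design choice for the sketch: it trades some
speed-up on big queries for predictable latency on small ones.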

     Andy

Historical note: RDQL had true parallelism once-upon-a-time.  An RDQL 
query is a BGP+Filter and not more.  The filter ran on a separate thread 
to the BGP solver.  Timed gain ... just 10%.  This was on an early
generation 2 CPU, 2 processor machine so the cost of threading was huge.
Most users then did not have a multi-anything machine. It led to lots
of problems with thread management when Java wasn't what it is today.

Re: Parallelism in query execution

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Comments inline:

On 04/02/2016 14:54, "Andy Seaborne" <an...@apache.org> wrote:

>On 02/02/16 14:52, Rob Vesse wrote:
>> The other place huge UNIONs can appear is if you want to try and flatten
>> out property path queries involving +, - and * into approximated
>> queries that yield the same results
>>
>> I don't think Jena does this but we at Cray certainly provide users the
>> option to do this
>
>Do you formally support :p{N,M} in paths?  IIRC they have different
>cardinality from * and +

No, we don't

>
>What's "-"?

A typo, meant to be "|", i.e. alternative
>
>> I like the idea of maybe buffering slightly and then intelligently
>> deciding what kind of join to use.  However this probably needs to take
>> into account the overall query structure because if you have a small
>> LIMIT/OFFSET present then you want to do the minimum work possible so
>> don't want to compute an unnecessary hash join when an index join could
>> have yielded the one result you needed much faster.
>
>I agree that the technique needs some care.  It is in danger of
>multiplying up as well, because an index/substitution execution begets N
>follow up units.
>
>For parallelism, the index could be chunked into work units of, say, 25
>intermediates and run one parallel processor per 25 hits. This is trying
>to amortize the channel and synchronization costs.

I like that approach, it seems like a sensible balance

Rob

>
>	Andy

Re: Parallelism in query execution

Posted by Andy Seaborne <an...@apache.org>.
On 02/02/16 14:52, Rob Vesse wrote:
> The other place huge UNIONs can appear is if you want to try and flatten
> out property path queries involving +, - and * into approximated queries
> that yield the same results
>
> I don't think Jena does this but we at Cray certainly provide users the
> option to do this

Do you formally support :p{N,M} in paths?  IIRC they have different 
cardinality from * and +

What's "-"?

> I like the idea of maybe buffering slightly and then intelligently
> deciding what kind of join to use.  However this probably needs to take
> into account the overall query structure because if you have a small
> LIMIT/OFFSET present then you want to do the minimum work possible so
> don't want to compute an unnecessary hash join when an index join could
> have yielded the one result you needed much faster.

I agree that the technique needs some care.  It is in danger of 
multiplying up as well, because an index/substitution execution begets N 
follow up units.

For parallelism, the index could be chunked into work units of, say, 25 
intermediates and run one parallel processor per 25 hits. This is trying 
to amortize the channel and synchronization costs.
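
A rough sketch of that chunking idea, assuming the intermediates are
already in a list and that the follow-up index access can be modelled as
a function from one intermediate to its matches. ChunkedIndexJoin, chunk,
joinInChunks and the lookup parameter are hypothetical names, not ARQ
classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: one task per chunk of intermediates, so the cost
// of handing work to another thread is paid once per chunk, not per row.
class ChunkedIndexJoin {
    // Splits items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    // Runs the follow-up lookup for every intermediate, one task per chunk,
    // and concatenates the per-chunk outputs in chunk order.
    static <T, R> List<R> joinInChunks(List<T> left, int chunkSize, int threads,
                                       Function<T, List<R>> lookup) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (List<T> c : chunk(left, chunkSize)) {
                futures.add(pool.submit(() -> {
                    List<R> out = new ArrayList<>();
                    for (T item : c) {
                        out.addAll(lookup.apply(item));
                    }
                    return out;
                }));
            }
            List<R> results = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                results.addAll(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a chunk size of 25, 60 intermediates become three tasks (25, 25 and
10 rows); the right size in practice would need measuring.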

	Andy

Re: Parallelism in query execution

Posted by Rob Vesse <rv...@dotnetrdf.org>.
The other place huge UNIONs can appear is if you want to try and flatten
out property path queries involving +, - and * into approximated queries
that yield the same results.

I don't think Jena does this but we at Cray certainly provide users the
option to do this.
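
As an illustration of how such a flattening produces a large UNION, here
is a sketch that expands "?s <p>+ ?o" into a UNION of fixed-length
chains up to a chosen depth. It is not the Cray or Jena implementation;
PathFlattener, flattenOneOrMore and maxDepth are hypothetical, and the
result is only an approximation: paths longer than maxDepth are missed,
and duplicates can differ from the path form unless DISTINCT is applied.

```java
// Hypothetical sketch: approximates "?s <p>+ ?o" by a UNION of chains
// of length 1..maxDepth. Intermediate variables are named ?v<depth>_<hop>.
class PathFlattener {
    static String flattenOneOrMore(String predicate, int maxDepth) {
        StringBuilder sb = new StringBuilder();
        for (int depth = 1; depth <= maxDepth; depth++) {
            if (depth > 1) {
                sb.append(" UNION ");
            }
            sb.append("{ ");
            for (int hop = 1; hop <= depth; hop++) {
                // First hop starts at ?s, last hop ends at ?o.
                String from = (hop == 1) ? "?s" : "?v" + depth + "_" + (hop - 1);
                String to = (hop == depth) ? "?o" : "?v" + depth + "_" + hop;
                sb.append(from).append(' ').append(predicate)
                  .append(' ').append(to).append(" . ");
            }
            sb.append("}");
        }
        return sb.toString();
    }
}
```

For example, flattenOneOrMore(":p", 2) returns
"{ ?s :p ?o . } UNION { ?s :p ?v2_1 . ?v2_1 :p ?o . }". The number of
branches grows with the depth, and with alternatives inside the path it
multiplies, which is how the UNIONs get huge.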

I like the idea of maybe buffering slightly and then intelligently
deciding what kind of join to use.  However this probably needs to take
into account the overall query structure because if you have a small
LIMIT/OFFSET present then you want to do the minimum work possible, so
you don't want to compute an unnecessary hash join when an index join could
have yielded the one result you needed much faster.

Rob

On 02/02/2016 14:17, "Andy Seaborne" <an...@apache.org> wrote:

>Rob,
>
>Thanks for the details.  Lots of good stuff in there.
>
>I am thinking about both tweaking the current execution and a new
>(hypothetical) execution engine more targeted at analytics workloads, say.
>
>I have seen some truly monstrous UNIONs (1000s), and FILTER (NOT) IN
>leads to unions as well.
>
>The case of an IFP yielding one result, and using that as an index join,
>is somewhat important. A small amount of buffering probably would not hurt
>the execution and would make it adapt to large LHS matches. One thing that
>might be interesting is to switch between index and hash join by sniffing
>the input stream: if it is below some threshold use an index join, and do
>a block hash join on a few threads above the threshold.  Or a
>full-blown parallel hash join (takes twice the working state - it
>retains up to about twice the size of the smaller side of the join).
>
>
>	Andy

Re: Parallelism in query execution

Posted by Andy Seaborne <an...@apache.org>.
Rob,

Thanks for the details.  Lots of good stuff in there.

I am thinking about both tweaking the current execution and a new
(hypothetical) execution engine more targeted at analytics workloads, say.

I have seen some truly monstrous UNIONs (1000s), and FILTER (NOT) IN
leads to unions as well.

The case of an IFP yielding one result, and using that as an index join,
is somewhat important. A small amount of buffering probably would not hurt
the execution and would make it adapt to large LHS matches. One thing that
might be interesting is to switch between index and hash join by sniffing
the input stream: if it is below some threshold use an index join, and do
a block hash join on a few threads above the threshold.  Or a
full-blown parallel hash join (takes twice the working state - it
retains up to about twice the size of the smaller side of the join).
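
The sniffing step could look something like the following sketch. It
buffers at most threshold + 1 rows from the left-hand side and picks a
strategy from what it saw. JoinChooser, Strategy, Decision and sniff are
hypothetical names, and the threshold value is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of choosing a join strategy by peeking at the input.
class JoinChooser {
    enum Strategy { INDEX_JOIN, HASH_JOIN }

    // The chosen strategy plus the rows already consumed while sniffing.
    // The caller must process 'buffered' before the rest of the iterator.
    static final class Decision<T> {
        final Strategy strategy;
        final List<T> buffered;

        Decision(Strategy strategy, List<T> buffered) {
            this.strategy = strategy;
            this.buffered = buffered;
        }
    }

    // Reads at most threshold + 1 rows. If the input ends within
    // 'threshold' rows, an index join is chosen; otherwise a hash join.
    static <T> Decision<T> sniff(Iterator<T> input, int threshold) {
        List<T> buffer = new ArrayList<>();
        while (buffer.size() <= threshold && input.hasNext()) {
            buffer.add(input.next());
        }
        Strategy strategy = (buffer.size() <= threshold)
                ? Strategy.INDEX_JOIN
                : Strategy.HASH_JOIN;
        return new Decision<>(strategy, buffer);
    }
}
```

The buffering cost is bounded by the threshold, so a left-hand side that
yields a single row still goes straight to the index join.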


	Andy

On 29/01/16 14:06, Rob Vesse wrote:
> Andy
>
> Parallelism is used in a couple of main places, bear in mind that
> dotNetRDF uses a block based engine rather than a streaming engine (I.e.
> no OpSequence, OpConditional) although it does do a form of index join to
> avoid unnecessary work wherever possible.  The main areas in which it uses
> parallelism are joins and filters.  So the unit of work is over a
> multi-set of possible solutions in SPARQL spec parlance.
>
> For joins since we are joining (or left joining or minus-ing) the results
> of two operators we essentially do a parallelized hash join between the
> two sides.  The hash table is built in serial from the LHS results and
> then we parallelise over the RHS results doing look ups into the hash
> table and outputting the join results in parallel.  You have to be careful
> about the data structures used to avoid threads stomping on each other's
> results but this isn't too difficult.
>
> For filters we do something much closer to vectorization where essentially
> we parallelise the evaluation of the expression over all the possible
> solutions (again we're a blocking engine) and recombine the results
> afterwards.
>
> In terms of control in the .Net world we benefit from PLinq which are CLR
> supplied extensions to the basic Linq constructs (aka Streams in Java 8)
> that automatically parallelize according to the available resources on the
> machine.  I'm not sure how smart this is but at least in the .Net world
> you can constrain it if you want.
>
> For Jena which is a predominantly streaming engine I don't see either of
> these approaches providing great benefits.  They could be used in the
> cases where ARQ does have to do block evaluation to improve things subject
> to the points about how best to control the level of parallelism.  The
> other area where there may be some benefit is to consider the case of
> queries with UNION constructs, particularly for queries where we are doing
> lazy evaluation (there's a LIMIT and/or OFFSET, ASK etc.) where it may be
> possible to dispatch multiple branches of the UNION in parallel.
>
> Rob

Re: Parallelism in query execution

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Andy

Parallelism is used in a couple of main places.  Bear in mind that
dotNetRDF uses a block-based engine rather than a streaming engine (i.e.
no OpSequence, OpConditional), although it does do a form of index join to
avoid unnecessary work wherever possible.  The main areas in which it uses
parallelism are joins and filters.  So the unit of work is a
multiset of possible solutions, in SPARQL spec parlance.

For joins, since we are joining (or left joining or minus-ing) the results
of two operators, we essentially do a parallelized hash join between the
two sides.  The hash table is built in serial from the LHS results and
then we parallelise over the RHS results, doing lookups into the hash
table and outputting the join results in parallel.  You have to be careful
about the data structures used to avoid threads stomping on each other's
results but this isn't too difficult.
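
A minimal Java sketch of the join described above, using parallel
streams as a stand-in for PLINQ. It is not dotNetRDF code:
ParallelHashJoin, join, leftKey, rightKey and merge are hypothetical
names, and rows are plain objects rather than SPARQL solutions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: serial build over the LHS, parallel probe over the RHS.
class ParallelHashJoin {
    static <L, R, K, O> List<O> join(List<L> left, List<R> right,
                                     Function<L, K> leftKey,
                                     Function<R, K> rightKey,
                                     BiFunction<L, R, O> merge) {
        // Build phase: a single thread fills the hash table.
        Map<K, List<L>> table = new HashMap<>();
        for (L l : left) {
            table.computeIfAbsent(leftKey.apply(l), k -> new ArrayList<>()).add(l);
        }
        // Probe phase: the table is only read from here on, so concurrent
        // lookups are safe. Each worker produces its own partial output
        // and collect() merges them, so no shared output list is mutated.
        return right.parallelStream()
                .flatMap(r -> table
                        .getOrDefault(rightKey.apply(r), Collections.emptyList())
                        .stream()
                        .map(l -> merge.apply(l, r)))
                .collect(Collectors.toList());
    }
}
```

Keeping the table read-only during the probe, and letting the collector
combine per-thread results, is one way to avoid the threads interfering
with each other's output.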

For filters we do something much closer to vectorization, where essentially
we parallelise the evaluation of the expression over all the possible
solutions (again, we're a blocking engine) and recombine the results
afterwards.
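
In Java terms the filter case might be sketched as below, again with
parallel streams standing in for PLINQ. ParallelFilter and filter are
hypothetical names, and the predicate stands in for the SPARQL
expression.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: evaluate the filter over a whole block of
// solutions in parallel, then recombine the survivors into one list.
class ParallelFilter {
    static <S> List<S> filter(List<S> solutions, Predicate<S> expr) {
        // For a list source, collect(toList()) keeps the original order.
        return solutions.parallelStream()
                .filter(expr)
                .collect(Collectors.toList());
    }
}
```

This only works because the whole block is available up front; a
streaming engine would have to buffer first.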

In terms of control, in the .NET world we benefit from PLINQ, a set of
CLR-supplied extensions to the basic LINQ constructs (akin to Streams in
Java 8) that automatically parallelize according to the available
resources on the machine.  I'm not sure how smart this is but at least in
the .NET world you can constrain it if you want.

For Jena, which is a predominantly streaming engine, I don't see either of
these approaches providing great benefits.  They could be used in the
cases where ARQ does have to do block evaluation to improve things, subject
to the points about how best to control the level of parallelism.  The
other area where there may be some benefit is to consider the case of
queries with UNION constructs, particularly for queries where we are doing
lazy evaluation (there's a LIMIT and/or OFFSET, ASK etc.) where it may be
possible to dispatch multiple branches of the UNION in parallel.
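
To make the UNION idea concrete, here is a sketch in which each branch
is evaluated on its own task and the consumer stops as soon as the LIMIT
is met, cancelling whatever is still running. It is not ARQ code:
ParallelUnion and firstN are hypothetical names, each branch is modelled
as an Iterable of non-null rows, and the order of the results depends on
thread scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: dispatch UNION branches in parallel, stop at LIMIT.
class ParallelUnion {
    // Marker placed on the queue when a branch has no more rows.
    private static final Object DONE = new Object();

    static <T> List<T> firstN(List<Iterable<T>> branches, int limit, int threads) {
        List<T> results = new ArrayList<>();
        if (limit <= 0 || branches.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Bounded queue: producers cannot run far ahead of the consumer.
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(limit);
        for (Iterable<T> branch : branches) {
            pool.submit(() -> {
                try {
                    for (T row : branch) {
                        queue.put(row);
                    }
                    queue.put(DONE);
                } catch (InterruptedException e) {
                    // Cancelled because the limit was reached.
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            int finished = 0;
            while (results.size() < limit && finished < branches.size()) {
                Object item = queue.take();
                if (item == DONE) {
                    finished++;
                } else {
                    @SuppressWarnings("unchecked")
                    T row = (T) item;
                    results.add(row);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Interrupts producers still running or blocked on put().
            pool.shutdownNow();
        }
        return results;
    }
}
```

For an ASK-style query the limit would be 1, so the first branch to
produce any row ends the evaluation of all the others.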

Rob

On 29/01/2016 11:49, "Andy Seaborne" <an...@apache.org> wrote:

>Rob,
>
>In dotNetRDF, there is parallel execution, isn't there?
>
>I have been thinking (toying with) the idea of parallel execution and I
>wondered what unit of work is for the parallelism in dotNetRDF.
>
>What little thinking I've done suggests that tapping into the
>parallelism in java streams is not the right way to do it (which is a
>shame as that's less work).  It needs more control and probably larger
>units of work. There is a danger that small/fast queries slow down due
>to too much thinking.
>
>It needs more control as well to limit how much of the machine it will
>take over because, in Fuseki, it might lead to starvation of other
>requests.  As some usage is "many clients, many small requests",
>parallelism can impact the system negatively as well as positively.
>At some point, the limitation will be the connection of CPUs to RAM
>rather than cycles.
>
>     Andy
>
>Historical note: RDQL had true parallelism once-upon-a-time.  An RDQL
>query is a BGP+Filter and not more.  The filter ran on a separate thread
>to the BGP solver.  Timed gain ... just 10%.  This was on an early
>generation 2 CPU, 2 processor machine so the cost of threading was huge.
>Most users then did not have a multi-anything machine. It led to lots
>of problems with thread management when Java wasn't what it is today.