Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2011/12/23 01:24:07 UTC

[lucy-dev] Improving the ClusterSearcher

Henry C. nudged me to have a look at the ClusterSearcher and here are 
two quick improvements that I came up with:

Process ClusterSearcher RPCs in parallel
https://issues.apache.org/jira/browse/LUCY-204

Parallel processing for SearchServer
https://issues.apache.org/jira/browse/LUCY-205

The latter makes use of the excellent Perl module Net::Server.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 28/12/11 20:43, Marvin Humphrey wrote:
> On Tue, Dec 27, 2011 at 11:22:57PM +0100, Nick Wellnhofer wrote:
>>> I think this problem is solved if we move the code establishing the listening
>>> socket from SearchServer#new to the top of SearchServer#serve.
>>
>> Yes, that should work. It's not ideal but maybe it's the best
>> compromise. I'll give it a try.
>
> What aspects continue to trouble you?

Only things you just mentioned like the port argument.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Dec 27, 2011 at 11:22:57PM +0100, Nick Wellnhofer wrote:
>> I think this problem is solved if we move the code establishing the listening
>> socket from SearchServer#new to the top of SearchServer#serve.
>
> Yes, that should work. It's not ideal but maybe it's the best  
> compromise. I'll give it a try.

What aspects continue to trouble you?

I'm quite happy with the refinements we've made. :)  By exposing
handle_request(), we fix an important problem with SearchServer: decisions
about forking/pre-forking etc. really belong in the realm of the application,
not the library.  Now it will be possible for application developers to make
their own call without monkey-patching SearchServer.pm.

Even better, that improvement has been achieved without exposing any
implementation details of the protocol used by LucyX::Remote classes to encode
messages sent over socket connections.  We are free to evolve our
serialization techniques along the lines of what was suggested on this list a
few weeks ago without concern for breaking public APIs.

I can think of a couple minor tweaks I'd like to see.  It would be nice to
move the "port" argument from SearchServer#new to SearchServer#serve, and it
would be nice to do away with "password" entirely for all LucyX::Remote
classes.

It would also be nice if SearchServer#serve had better out-of-the-box
performance degradation characteristics under increasing load, but I don't
think that's a requirement.  We can offer a clear path forward by documenting
how to roll-your-own solution with handle_request().

There are lots of problems yet to be solved with regard to using Lucy in a
clustered environment -- we haven't tackled indexing or updates at all -- but
I think we're making excellent forward progress while avoiding painting
ourselves into a corner.  Thanks for taking this on!

Marvin Humphrey



Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 27/12/11 23:05, Marvin Humphrey wrote:
> On Tue, Dec 27, 2011 at 10:15:01PM +0100, Nick Wellnhofer wrote:
>> First, we can't use the SearchServer constructor because we don't want the
>> sockets to be created there.
>
> I missed this the first time around.
>
> I think this problem is solved if we move the code establishing the listening
> socket from SearchServer#new to the top of SearchServer#serve.

Yes, that should work. It's not ideal but maybe it's the best 
compromise. I'll give it a try.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Dec 27, 2011 at 10:15:01PM +0100, Nick Wellnhofer wrote:
> First, we can't use the SearchServer constructor because we don't want the
> sockets to be created there. 

I missed this the first time around.

I think this problem is solved if we move the code establishing the listening
socket from SearchServer#new to the top of SearchServer#serve.
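
A minimal sketch of that change, using only core modules. LazySearchServer
is a made-up stand-in, not the real SearchServer, and passing "port" to
serve() is an assumption borrowed from a suggestion made elsewhere in this
thread. The point is only that new() opens no sockets, so the object can be
created safely inside another server framework:

```perl
use strict;
use warnings;
use IO::Socket::INET;

{
    # Hypothetical class for illustration; not part of Lucy.
    package LazySearchServer;

    # The constructor only records state. No socket is opened here.
    sub new {
        my ($class) = @_;
        return bless { listener => undef }, $class;
    }

    # The listening socket is created at the top of serve().
    sub serve {
        my ($self, %args) = @_;
        $self->{listener} = IO::Socket::INET->new(
            LocalAddr => '127.0.0.1',
            LocalPort => $args{port},   # 0 lets the OS pick a free port
            Listen    => 5,
            Proto     => 'tcp',
            ReuseAddr => 1,
        ) or die "Can't create listening socket: $@";

        # ... the accept loop would follow here; omitted in this sketch ...
        return $self->{listener}->sockport;
    }
}
```

Code that supplies its own accept loop would simply never call serve().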

Marvin Humphrey


Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 27/12/11 22:31, Marvin Humphrey wrote:
> On Tue, Dec 27, 2011 at 10:15:01PM +0100, Nick Wellnhofer wrote:
>> On 27/12/11 21:33, Marvin Humphrey wrote:
>>> Ah, I see that you have a process_request() method in the LUCY-205 patch
>>> already -- and though it does not take a socket-handle/fileno as an argument,
>>> it can be modified easily to do so.
>>
>> That's an implementation detail of Net::Server. To use Net::Server, you
>> simply subclass it and implement a process_request method.
>
> You can write a Net::Server::PreFork subclass which has-a
> LucyX::Search::SearchServer and implements process_request() like so:
>
>    sub process_request {
>        my $self = shift;
>        my $client_sock = $self->get_property('client');
>        $self->{search_server}->handle_request($client_sock);
>    }

Yes, that's exactly what I want to do. But we still have to instantiate 
the SearchServer somewhere, and we can't use the current SearchServer 
constructor because it would create additional sockets.

>> Another solution would be to simply duplicate the SearchServer request
>> handling code in an external module. That's not very elegant but maybe
>> it's the easiest way to go, at least for now.
>
> Isn't that pretty much what we get if we implement handle_request($sock) as a
> method on SearchServer?

It's mainly the SearchServer constructor that doesn't play well with 
subclassing.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Dec 27, 2011 at 10:15:01PM +0100, Nick Wellnhofer wrote:
> On 27/12/11 21:33, Marvin Humphrey wrote:
>> Ah, I see that you have a process_request() method in the LUCY-205 patch
>> already -- and though it does not take a socket-handle/fileno as an argument,
>> it can be modified easily to do so.
>
> That's an implementation detail of Net::Server. To use Net::Server, you
> simply subclass it and implement a process_request method.

You can write a Net::Server::PreFork subclass which has-a
LucyX::Search::SearchServer and implements process_request() like so:

  sub process_request {
      my $self = shift;
      my $client_sock = $self->get_property('client');
      $self->{search_server}->handle_request($client_sock);
  }

>> I think that means that we cannot simply delete SearchServer#serve -- though
>> we can improve on things by making it possible to override serve() or
>> otherwise avoid it.
>
> IMO subclassing SearchServer as it is now wouldn't be the best solution  
> if it's possible at all. 

It's not possible to subclass it in the context of Net::Server, but you can
e.g. write a simple subclass which invokes fork() on each request.
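
A rough sketch of such a fork-per-request subclass, using only core
modules. StubSearchServer stands in for the real class, its one-line echo
protocol is a placeholder, and handle_request($sock) is the method proposed
in this thread rather than an existing API. The $max_requests argument is
there only so that the sketch terminates; a real server would loop forever:

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

{
    # Hypothetical stand-in for SearchServer; not part of Lucy.
    package StubSearchServer;

    sub new { my $class = shift; return bless {}, $class }

    # Placeholder protocol: read one line, echo it back.
    sub handle_request {
        my ($self, $sock) = @_;
        my $line = <$sock>;
        return unless defined $line;
        print {$sock} "echo: $line";
        return;
    }
}

{
    # Subclass whose serve() forks one child per accepted connection.
    package ForkingSearchServer;
    use parent -norequire, 'StubSearchServer';

    sub serve {
        my ($self, $listener, $max_requests) = @_;
        my @kids;
        for (1 .. $max_requests) {
            my $client = $listener->accept or next;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: handle exactly one request, then leave at once.
                close $listener;
                $self->handle_request($client);
                close $client;
                POSIX::_exit(0);
            }
            # Parent: drop its copy of the socket and keep accepting.
            close $client;
            push @kids, $pid;
        }
        # Reap the children once the loop is done.
        waitpid($_, 0) for @kids;
        return scalar @kids;
    }
}
```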

> First, we can't use the SearchServer  
> constructor because we don't want the sockets to be created there. Then,  
> AFAIU classes derived from Lucy::Object::Obj must be inside-out. Is it  
> safe to subclass additional Perl classes like Net::Server that use their  
> own hashref attributes in the traditional way?

No, that won't work.  (But hash-based classes are dangerous to subclass too
because of collisions within the hash's flat namespace.)

> Another solution would be to simply duplicate the SearchServer request  
> handling code in an external module. That's not very elegant but maybe  
> it's the easiest way to go, at least for now.

Isn't that pretty much what we get if we implement handle_request($sock) as a
method on SearchServer?

Marvin Humphrey


Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 27/12/11 21:33, Marvin Humphrey wrote:
> Ah, I see that you have a process_request() method in the LUCY-205 patch
> already -- and though it does not take a socket-handle/fileno as an argument,
> it can be modified easily to do so.

That's an implementation detail of Net::Server. To use Net::Server, you
simply subclass it and implement a process_request method.

> I think that means that we cannot simply delete SearchServer#serve -- though
> we can improve on things by making it possible to override serve() or
> otherwise avoid it.

IMO subclassing SearchServer as it is now wouldn't be the best solution 
if it's possible at all. First, we can't use the SearchServer 
constructor because we don't want the sockets to be created there. Then, 
AFAIU classes derived from Lucy::Object::Obj must be inside-out. Is it 
safe to subclass additional Perl classes like Net::Server that use their 
own hashref attributes in the traditional way?

Another solution would be to simply duplicate the SearchServer request 
handling code in an external module. That's not very elegant but maybe 
it's the easiest way to go, at least for now.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Dec 27, 2011 at 04:50:26PM +0100, Nick Wellnhofer wrote:
>> How about we expose a handle_request() method on SearchServer which takes a
>> socket handle as an argument, reads the incoming request and sends a response
>> back?
>>
>>    =head2 handle_request
>>
>>       $search_server->handle_request($sock);
>>
>>    Process a request from a socket handle which is ready for reading.
>
> That's what I had in mind.

Ah, I see that you have a process_request() method in the LUCY-205 patch
already -- and though it does not take a socket-handle/fileno as an argument,
it can be modified easily to do so.

If we make that method public and have it accept a socket-handle/fileno, then
it becomes possible to implement any number of server configurations.

> But if we want to use Net::Server, we can't use SearchServer's constructor
> in its current form.

Well, it's my understanding that it would violate Apache legal policy if we
were to distribute a subclass of Net::Server as part of Apache Lucy.  

So, instead, let's do something even better!  Let's give our users the freedom
to use not only Net::Server::PreFork, but lots of other options.

> So I would split SearchServer into two classes.

If you or somebody else wants to publish a CPAN distro that provides
SearchServer capabilities in conjunction with Net::Server::PreFork, that's
cool.  You get to make your own call on licensing that way.

We just have to make sure that we are in compliance with this guideline:

    http://www.apache.org/legal/resolved.html#optional

    Optional means that the component is not required for standard use of the
    product or for the product to achieve a desirable level of quality. The
    question to ask yourself in this situation is:

        "Will the majority of users want to use my product without adding the
        optional components?"

I think that means that we cannot simply delete SearchServer#serve -- though
we can improve on things by making it possible to override serve() or
otherwise avoid it.

Marvin Humphrey


Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 24/12/2011 05:34, Marvin Humphrey wrote:
> On Fri, Dec 23, 2011 at 08:36:02PM +0100, Nick Wellnhofer wrote:
>> On 23/12/11 03:52, Marvin Humphrey wrote:
>>> Can we somehow make SearchServer pluggable or subclassable so that the user
>>> can supply routines for forking/preforking/etc?
>>
>> I think the easiest solution would be to move the guts of the request
>> handling from SearchServer to another module. This could then be used
>> from different server modules.
>
> How about we expose a handle_request() method on SearchServer which takes a
> socket handle as an argument, reads the incoming request and sends a response
> back?
>
>    =head2 handle_request
>
>       $search_server->handle_request($sock);
>
>    Process a request from a socket handle which is ready for reading.

That's what I had in mind. But if we want to use Net::Server, we can't 
use SearchServer's constructor in its current form. So I would split 
SearchServer into two classes.

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Dec 23, 2011 at 08:36:02PM +0100, Nick Wellnhofer wrote:
> On 23/12/11 03:52, Marvin Humphrey wrote:
>> Unfortunately, legally I think we have to hold off on applying this patch, for
>> licensing reasons.  Here's the licensing block from Net::Server:
>>
>>      This package may be distributed under the terms of either the
>>
>>        GNU General Public License
>>          or the
>>        Perl Artistic License
>>
>> Apache products cannot have mandatory dependencies on GPL'd components, so we
>> cannot use Net::Server under the terms of the GPL.  Usage under the terms of
>> the Artistic license has not been approved by Apache Legal.  See the following:
>>
>>    http://www.apache.org/legal/resolved.html for a list of approved licenses.
>>
>>    https://issues.apache.org/jira/browse/LEGAL-86 for an attempt to get
>>    Artistic-1 added to the list of approved licenses, which resulted in a
>>    temporary variance.
>>
>> Can we somehow make SearchServer pluggable or subclassable so that the user
>> can supply routines for forking/preforking/etc?
>
> I think the easiest solution would be to move the guts of the request  
> handling from SearchServer to another module. This could then be used  
> from different server modules.

How about we expose a handle_request() method on SearchServer which takes a
socket handle as an argument, reads the incoming request and sends a response
back?

  =head2 handle_request

     $search_server->handle_request($sock);
  
  Process a request from a socket handle which is ready for reading.

(We might need a timeout to deal with stuck handles.)

Users would be free to write their own serve() loop then.
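
As an illustration, a hand-written loop might look roughly like the sketch
below. It assumes the handle_request($sock) method proposed above;
StubSearchServer, its echo protocol, serve_loop() and its option names are
all invented for this example. IO::Select supplies the timeout for stuck
handles:

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

{
    # Hypothetical stand-in for SearchServer; not part of Lucy.
    package StubSearchServer;

    sub new { my $class = shift; return bless {}, $class }

    # Placeholder protocol: read one line, echo it back.
    sub handle_request {
        my ($self, $sock) = @_;
        my $line = <$sock>;
        return unless defined $line;
        print {$sock} "echo: $line";
        return;
    }
}

# Accept connections one at a time. A client that sends nothing within
# $timeout seconds is dropped without calling handle_request().
# Returns the number of requests actually handled.
sub serve_loop {
    my ($server, $listener, %opt) = @_;
    my $timeout = $opt{timeout}      // 5;
    my $max     = $opt{max_requests} // 1;   # bounded so the sketch ends
    my $handled = 0;
    for (1 .. $max) {
        my $client = $listener->accept or next;
        my @ready = IO::Select->new($client)->can_read($timeout);
        if (@ready) {
            $server->handle_request($client);
            $handled++;
        }
        close $client;
    }
    return $handled;
}
```

The same loop body could sit inside a forking or pre-forking wrapper, which
is the kind of decision this design leaves to the application.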

It would be acceptable to provide sample code in the docs which illustrates
how to use SearchServer in conjunction with Net::Server, though it would be
better if we were to either provide a superior solution on our own or to
promote a liberally licensed alternative instead of Net::Server.

> Do I understand correctly that if we keep the current SearchServer, we can
> also ship another module based on Net::Server because then it's not a
> mandatory dependency?

My reply had a bug in it.  To clarify: I don't believe that an Apache product
can have a dependency on a GPL'd component, period.

The FSF's position on derivative works is that if you reference a component,
your software derives from it, and since the GPL applies at the boundary of
derivative works, if you use a GPL'd component, the GPL's copyleft provisions
kick in[1].  Some people consider this position legal nonsense, but the ASF
chooses to respect the FSF's wishes nonetheless, so that the licensing of our
products is not contentious and downstream consumers of ASF products can rest
easy.

It's possible, even likely, that the authors of Net::Server do not feel as
strongly about copyleft as the FSF.  The Artistic License -- a weak copyleft
license which has reciprocity provisions regarding the central work but allows
unmodified usage within proprietary software -- better reflects Larry Wall's
wishes, and perhaps the wishes of many within the Perl community.  But the
Artistic License 1.0 is a murky, muddled piece of crap, and very few projects
are actually licensed under the Artistic License 2.0.  And the GPL is what it
is.

Under certain strict conditions, Apache projects can have dependencies
licensed under the LGPL or other prohibited licenses, but not GPL:

  http://www.apache.org/legal/resolved.html#prohibited

  Can Apache projects rely on components under prohibited licenses?

    Apache projects cannot distribute any such components. As with the
    previous question on platforms, the component can be relied on if the
    component's licence terms do not affect the Apache product's licensing.
    For example, using a GPL'ed tool during the build is OK.

  Can Apache projects rely on components whose licensing affects the Apache
  product?

    Apache projects cannot distribute any such components. However, if the
    component is only needed for optional features, a project can provide
    the user with instructions on how to obtain and install the
    non-included work.  Optional means that the component is not required
    for standard use of the product or for the product to achieve a
    desirable level of quality. The question to ask yourself in this
    situation is:

      * "Will the majority of users want to use my product without adding the
        optional components?"

In conclusion:

  * It would be nice if most of CPAN wasn't licensed under
    legally-lousy-Artistic/draconian-copyleft-GPL, but we're stuck with the
    consequences of that dubious decision from a long time ago.  (I'm glad
    that a lot of Ruby gems are licensed under MIT.)
  * Dealing with copyleft licenses is a PITA, and so if we can solve a
    problem any other way we should.

Marvin Humphrey

[1] http://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySystem


Re: [lucy-dev] Improving the ClusterSearcher

Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/12/11 03:52, Marvin Humphrey wrote:
> Unfortunately, legally I think we have to hold off on applying this patch, for
> licensing reasons.  Here's the licensing block from Net::Server:
>
>      This package may be distributed under the terms of either the
>
>        GNU General Public License
>          or the
>        Perl Artistic License
>
> Apache products cannot have mandatory dependencies on GPL'd components, so we
> cannot use Net::Server under the terms of the GPL.  Usage under the terms of
> the Artistic license has not been approved by Apache Legal.  See the following:
>
>    http://www.apache.org/legal/resolved.html for a list of approved licenses.
>
>    https://issues.apache.org/jira/browse/LEGAL-86 for an attempt to get
>    Artistic-1 added to the list of approved licenses, which resulted in a
>    temporary variance.
>
> Can we somehow make SearchServer pluggable or subclassable so that the user
> can supply routines for forking/preforking/etc?

I think the easiest solution would be to move the guts of the request 
handling from SearchServer to another module. This could then be used 
from different server modules. Do I understand correctly that if we keep 
the current SearchServer, we can also ship another module based on 
Net::Server because then it's not a mandatory dependency?

Nick

Re: [lucy-dev] Improving the ClusterSearcher

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Dec 23, 2011 at 01:24:07AM +0100, Nick Wellnhofer wrote:
>
> Henry C. nudged me to have a look at the ClusterSearcher and here are  
> two quick improvements that I came up with:
>
> Process ClusterSearcher RPCs in parallel
> https://issues.apache.org/jira/browse/LUCY-204

+1, great stuff!  I've left some feedback in the JIRA comments, but it's all
minor implementation tweaks, so no need to discuss at length here.

> Parallel processing for SearchServer
> https://issues.apache.org/jira/browse/LUCY-205
>
> The latter makes use of the excellent Perl module Net::Server.

Hmm, this is a little harder.  

Technically, I'm +0.5 in favor of this change.  Maybe I would have liked to
see things done a little differently, and without the dependency -- but it's a
sorely needed step forward.

Unfortunately, legally I think we have to hold off on applying this patch, for
licensing reasons.  Here's the licensing block from Net::Server:

    This package may be distributed under the terms of either the

      GNU General Public License
        or the
      Perl Artistic License

Apache products cannot have mandatory dependencies on GPL'd components, so we
cannot use Net::Server under the terms of the GPL.  Usage under the terms of
the Artistic license has not been approved by Apache Legal.  See the following:

  http://www.apache.org/legal/resolved.html for a list of approved licenses.

  https://issues.apache.org/jira/browse/LEGAL-86 for an attempt to get
  Artistic-1 added to the list of approved licenses, which resulted in a
  temporary variance.

Can we somehow make SearchServer pluggable or subclassable so that the user
can supply routines for forking/preforking/etc?

Marvin Humphrey