Posted to user@lucy.apache.org by goran kent <go...@gmail.com> on 2011/11/14 11:59:30 UTC

[lucy-user] Concurrent searching

Hi,

...at the risk of being swatted down for beating this poor horse again...

I know it's not the done thing for a user (especially an
early-adopter) to ask this of a FOSS project, but I don't suppose it
would be possible to get an indication, even if rough, of when the
concurrent search issue will be addressed?  /me does self-deprecating
bow with hand-flourish  :)

Sadly, for us, to use bugzilla parlance, it's a major blocker.  We
*could* try and hack away with ignorant gusto and slap something
together, but it will be an ugly pulsing tumour, and never work as
well as a Lucy Progenitor could make it work, if at all.

If these two could be addressed:

-  LucyX::Remote::SearchClient to perform remote searches in parallel
as opposed to serially, and

-  LucyX::Remote::SearchServer to fork on each new client or otherwise
allow multiple search clients at once (ie, typical TCP/IP
client/server behaviour)

...it would allow us to move forward with renewed momentum.  The
proposed ClusterSearcher sounds like the ideal long-term solution, but
just having the existing remote search work as expected would be
über-fantastic.

Otherwise, we're a little bit buggered.

cheers

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Mon, Nov 14, 2011 at 3:15 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> As of this moment (r1201554), ClusterSearcher's interface, documentation, and
> result aggregation logic are done....

dayamn!

> -- you can start test-driving ClusterSearcher now if you like.

Thank you so much.  I'll get onto it asap.

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Fri, Nov 18, 2011 at 4:14 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> Then fails with:
>>  at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
>> line 104
>>       LucyX::Remote::SearchServer::serve('LucyX::Remote::SearchServer=SCALAR(0xb8d5190)')
>> called at ./lucy_remote_search_server line 212
>>
>> The client fails with:
>> Use of uninitialized value in numeric eq (==) at
>> /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/ClusterSearcher.pm
>> line 158.
>
> This is almost certainly happening because we have enabled non-blocking i/o
> but not yet taken all the necessary precautions to detect and retry when
> reads/writes do not succeed.  I expect to work on this soon.  In the meantime,
> I suggest commenting out one line in ClusterSearcher.pm (only needed on the
> client node):
>
>    +++ b/perl/lib/LucyX/Remote/ClusterSearcher.pm
>    @@ -53,7 +53,7 @@ sub new {
>             my $sock = IO::Socket::INET->new(
>                 PeerAddr => $shard,
>                 Proto    => 'tcp',
>    -            Blocking => 0,
>    +            #Blocking => 0,
>             );

OK, unfortunately it's failing with exactly the same error despite the change.

I'll monitor the commit list and try not to bug you ;-)

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Fri, Nov 18, 2011 at 4:14 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> The internal application protocol changed incompatibly.  Sorry, this is part
> of living on trunk.  It would be ideal if we could support rolling updates
> during development, but particularly at this stage, imposing that constraint
> would slow down innovation, wouldn't be 100% reliable, and wouldn't always be
> practical in any case.

No worries at all - this is to be expected.

> This is almost certainly happening because we have enabled non-blocking i/o
> but not yet taken all the necessary precautions to detect and retry when
> reads/writes do not succeed.  I expect to work on this soon.  In the meantime,
> I suggest commenting out one line in ClusterSearcher.pm (only needed on the
> client node):
>
>    +++ b/perl/lib/LucyX/Remote/ClusterSearcher.pm
>    @@ -53,7 +53,7 @@ sub new {
>             my $sock = IO::Socket::INET->new(
>                 PeerAddr => $shard,
>                 Proto    => 'tcp',
>    -            Blocking => 0,
>    +            #Blocking => 0,
>             );
>
> I'll let you know when I think it's safe to make the i/o non-blocking once again.


cool.

Thanks for the feedback!

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 1:25 PM, goran kent <go...@gmail.com> wrote:
> Suggested patch:
>
> - confess $! unless $check_val == $len;
> + confess "packet length mismatch: $!" unless $check_val == $len;


Another suggested patch:

- confess $! unless $check_val == 4;
+ confess "mangled packet header: $!" unless $check_val == 4;

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 2:31 PM, goran kent <go...@gmail.com> wrote:
> my $ret;
> do {
>     $ret = $client_sock->sysread( $buf, $len );
>      last if not $ret;
>      $check_val += $ret;
> } while $ret;

Using $offset and checks to make sure sysread doesn't stray into
trailing transactions, of course... my Perlish is failing me here.

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 2:06 PM, goran kent <go...@gmail.com> wrote:
> What the hell is causing $client_sock->sysread( $buf, 6959) to only
> return 2892 bytes!?

perlfunc(1) for sysread says "*Attempts* to read LENGTH bytes of
data..." [emphasis mine], which means there might be unread chunks of
bytes.  It looks like the 6k client request is being sent as two or
more chunks, so sysread needs a loop to get it all:

my $ret;
do {
     $ret = $client_sock->sysread( $buf, $len );
      last if not $ret;
      $check_val += $ret;
} while $ret;

or some such, instead of just

$check_val = $client_sock->sysread( $buf, $len );

in SearchServer::serve.
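For what it's worth, here is one way to write that loop with sysread's
OFFSET argument, so each read appends to the buffer instead of
clobbering it.  This is a standalone sketch of mine (read_exact and the
socketpair demo are not Lucy code):

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# Loop sysread() until exactly $len bytes arrive.  The 4th (OFFSET)
# argument makes each read append to $buf rather than overwrite it.
sub read_exact {
    my ( $sock, $len ) = @_;
    my $buf = '';
    my $got = 0;
    while ( $got < $len ) {
        my $ret = sysread( $sock, $buf, $len - $got, $got );
        die "sysread: $!" unless defined $ret;
        return undef if $ret == 0;    # EOF before $len bytes
        $got += $ret;
    }
    return $buf;
}

# Demo over a socketpair; the payload arrives in two pieces.
socketpair( my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
syswrite( $wr, 'hello ' );
syswrite( $wr, 'world' );
print read_exact( $rd, 11 ), "\n";    # hello world
```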

Re: [lucy-user] Concurrent searching

Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/11/2011 13:06, goran kent wrote:
> What the hell is causing $client_sock->sysread( $buf, 6959) to only
> return 2892 bytes!?

That's completely normal. From the recv(2) man page: "The receive calls 
normally return any data available, up to the requested amount, rather 
than waiting for receipt of the full amount requested."

Nick

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 1:50 PM, goran kent <go...@gmail.com> wrote:
> could be autoflush or some such buffering bugger-up.

Nope, tried $sock->autoflush(1)  (even though docs say this is active
already as of v1.18), and $sock->flush.

...and further reading reveals that Perl's syswrite/sysread don't use
buffering anyway, making them slower but more reliable.... :|

What the hell is causing $client_sock->sysread( $buf, 6959) to only
return 2892 bytes!?

This is bog standard syswrite/read stuff, nothing wrong with the code AFAICS.

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
syswrite() in _send_request_to_shard returns 6963 bytes written, so it
seems syswrite() believes the packet is being sent to the server ok.

$check_val = $client_sock->sysread( $buf, 4 );
...always returns 4 in serve(), so at least the start of the packet
looks good, so it passes the two tests on $check_val.

$len = unpack( 'N', $buf );
...returns 6959 (6963-4 for the check_val) which matches what the
client is sending, so we know unpack is brimming with goodness.
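As an aside, that 4-byte header is just a big-endian length prefix; a
toy round trip of the framing (my own example, sharing nothing with
Lucy's wire format beyond the pack('N') detail above):

```perl
use strict;
use warnings;

# Frame a payload with a 4-byte big-endian ('N') length prefix,
# then parse the header back out, as serve() does on receipt.
my $payload = 'top_docs request';
my $packet  = pack( 'N', length $payload ) . $payload;

my $len  = unpack( 'N', substr( $packet, 0, 4 ) );
my $body = substr( $packet, 4, $len );
print "$len: $body\n";    # 16: top_docs request
```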


could be autoflush or some such buffering bugger-up.

/me starts googling.

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Nov 24, 2011 at 01:58:08PM +0200, goran kent wrote:
> On Wed, Nov 23, 2011 at 10:35 PM, Marvin Humphrey
> <ma...@rectangular.com> wrote:
> > Something like this:
> >
> >    my @responses;
> >    for (my $i = 0; $i < $num_shards; $i++) {
> >        my $response  = $self->_retrieve_response_from_shard($i);
> >        $responses[$i] = $response->{retval};
> >    }
> 
> Thanks Marvin - that works quite well actually.  My famous 20s search
> time is now subsecond.

OK, now we're getting somewhere. :)

I've updated trunk to apply this technique and use blocking i/o.  I don't plan
further experimentation with non-blocking i/o and select() loops until after
we release 0.3.0.

Marvin Humphrey



Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 10:35 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> Something like this:
>
>    my @responses;
>    for (my $i = 0; $i < $num_shards; $i++) {
>        my $response  = $self->_retrieve_response_from_shard($i);
>        $responses[$i] = $response->{retval};
>    }

Thanks Marvin - that works quite well actually.  My famous 20s search
time is now subsecond.

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Thu, Nov 24, 2011 at 12:17 PM, Nick Wellnhofer <we...@aevum.de> wrote:
>> bloody $sock->write doesn't return number of bytes written (like C), but
>> 1...
>
> $sock->write uses print internally, which should take care of partial
> writes. I think it's safe to assume that all bytes are written to the
> socket.

Thanks, yes, I eventually found out about that little detail.  I've
disabled the post-write() confess checks on both ends, and it looks
more promising.

Re: [lucy-user] Concurrent searching

Posted by Nick Wellnhofer <we...@aevum.de>.
On 24/11/2011 09:50, goran kent wrote:
> On Wed, Nov 23, 2011 at 10:35 PM, Marvin Humphrey
> <ma...@rectangular.com>  wrote:
>> Something like this:
>>
>>     my @responses;
>>     for (my $i = 0; $i<  $num_shards; $i++) {
>>         my $response  = $self->_retrieve_response_from_shard($i);
>>         $responses[$i] = $response->{retval};
>>     }
>
> bloody $sock->write doesn't return number of bytes written (like C), but 1...

$sock->write uses print internally, which should take care of partial 
writes. I think it's safe to assume that all bytes are written to the 
socket.

Nick
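To spell that out: with print you can assume complete writes, but
anyone sticking with syswrite needs the mirror image of the read loop.
A standalone sketch of mine, not code from the Lucy tree:

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# The syswrite() counterpart of a read loop: keep writing from an
# offset until the kernel has accepted the whole buffer.
sub write_all {
    my ( $sock, $data ) = @_;
    my $sent = 0;
    while ( $sent < length $data ) {
        my $ret = syswrite( $sock, $data, length($data) - $sent, $sent );
        die "syswrite: $!" unless defined $ret;
        $sent += $ret;
    }
    return $sent;
}

socketpair( my $rd, my $wr, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
print write_all( $wr, 'ping' ), "\n";    # 4
```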


Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 10:35 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:
> Something like this:
>
>    my @responses;
>    for (my $i = 0; $i < $num_shards; $i++) {
>        my $response  = $self->_retrieve_response_from_shard($i);
>        $responses[$i] = $response->{retval};
>    }

bloody $sock->write doesn't return number of bytes written (like C), but 1...

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, Nov 23, 2011 at 03:56:05PM +0200, goran kent wrote:
> Presumably you're referring to this section in _multi_rpc() which
> needs to change?:
> 
>     my @responses;
>     my $remaining = $num_shards;
>     my $select    = $select{$$self};
>     my $sock_map  = $sock_map{$$self};
>     while ($remaining) {
>         my @ready = $select->can_read;
>         for my $sock ( @{ $ready[0] } ) {
>             my $shard_num = $sock_map->{"$sock"};
>             my $response  = $self->_retrieve_response_from_shard($shard_num);
>             $responses[$shard_num] = $response->{retval};
>             $remaining--;
>         }
>     }
> 

Something like this: 

    my @responses;
    for (my $i = 0; $i < $num_shards; $i++) {
        my $response  = $self->_retrieve_response_from_shard($i);
        $responses[$i] = $response->{retval};
    }

Marvin Humphrey


Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Wed, Nov 23, 2011 at 2:41 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
> Those confess() calls are placeholders
understood

>   * Change every sysread() to read(), and every syswrite() to write().
done

>   * Set $socket->autoflush(1);
done

>   * Make sure 'Blocking => 0' is commented out.
done

>   * Replace the select() loop with a "for" loop, because select() and
>     blocking i/o don't mix.
Not sure about this one - my head is still spinning with all those
hashes yer usin' - they are slowly starting to make sense though.

Presumably you're referring to this section in _multi_rpc() which
needs to change?:

    my @responses;
    my $remaining = $num_shards;
    my $select    = $select{$$self};
    my $sock_map  = $sock_map{$$self};
    while ($remaining) {
        my @ready = $select->can_read;
        for my $sock ( @{ $ready[0] } ) {
            my $shard_num = $sock_map->{"$sock"};
            my $response  = $self->_retrieve_response_from_shard($shard_num);
            $responses[$shard_num] = $response->{retval};
            $remaining--;
        }
    }

I'll need your help here, purty please :)
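One detail that may help untangle it: IO::Select's can_read() returns
the ready handles as a flat list (the three-arrayrefs shape belongs to
the class method IO::Select->select() instead).  A self-contained toy,
not the ClusterSearcher code:

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# can_read() blocks until a registered handle is readable (or the
# timeout expires) and returns those handles as a flat list.
socketpair( my $near, my $far, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair: $!";
syswrite( $far, "ready\n" );

my $select = IO::Select->new($near);
for my $sock ( $select->can_read(1) ) {    # 1-second timeout
    sysread( $sock, my $buf, 1024 );
    print $buf;                            # ready
}
```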

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, Nov 23, 2011 at 01:25:09PM +0200, goran kent wrote:
> Something is weird with the length for the top_docs packet.
> 
> In SearchServer::serve, ~line 106, the confess is chucking a null
> error because $check_val != $len, hence the meaningless error:
> 
> " at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
> line 106"
> 
> In ClusterSearcher::_serialize_request for top_docs
> length($serialized)==6959, but SearchServer::serve is receiving
> length==2892.
> 
> So, that's why SearchServer is failing.  What's causing the short send
> (or receive, or pack/unpack not co-operating across machines) will
> hopefully soon be revealed.

As we move away from blocking i/o, we need to manage buffers manually and be
prepared for partial success.  (Eventually we need to deal with timeouts and
failovers, because otherwise the system remains vulnerable to its weakest
link and hangs when a single node goes down -- but that's for later.)
 
> Suggested patch:
> 
> - confess $! unless $check_val == $len;
> + confess "packet length mismatch: $!" unless $check_val == $len;

Those confess() calls are placeholders, to be swapped out at some future
time with a less aggressive error reporting mechanism that does not take down
the server process.  The idea was to use confess() during early rapid
prototyping to flag each place a system call return value needs to be checked.

In some cases, including here, the code also needs to be refactored around
non-blocking i/o.  What we ultimately need to do is accept a partial read,
store the incomplete buffer, and return to waiting for the next ready socket.
The code will become more complicated because we'll have to keep multiple
buffers alive, but that's concurrency for ya.

For now though, try this:

   * Change every sysread() to read(), and every syswrite() to write().
   * Set $socket->autoflush(1);
   * Make sure 'Blocking => 0' is commented out.
   * Replace the select() loop with a "for" loop, because select() and
     blocking i/o don't mix.

What I'm hoping to do with those changes is return to forcing every socket
communication to block, restoring predictable program execution order.

Marvin Humphrey


Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
Something is weird with the length for the top_docs packet.

In SearchServer::serve, ~line 106, the confess is chucking a null
error because $check_val != $len, hence the meaningless error:

" at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
line 106"

In ClusterSearcher::_serialize_request for top_docs
length($serialized)==6959, but SearchServer::serve is receiving
length==2892.

So, that's why SearchServer is failing.  What's causing the short send
(or receive, or pack/unpack not co-operating across machines) will
hopefully soon be revealed.

Suggested patch:

- confess $! unless $check_val == $len;
+ confess "packet length mismatch: $!" unless $check_val == $len;

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Fri, Nov 18, 2011 at 4:14 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> Then fails with:
>>  at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
>> line 104
>>       LucyX::Remote::SearchServer::serve('LucyX::Remote::SearchServer=SCALAR(0xb8d5190)')
>> called at ./lucy_remote_search_server line 212
>>
>> The client fails with:
>> Use of uninitialized value in numeric eq (==) at
>> /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/ClusterSearcher.pm
>> line 158.
>
> This is almost certainly happening because we have enabled non-blocking i/o
> but not yet taken all the necessary precautions to detect and retry when
> reads/writes do not succeed.  I expect to work on this soon.  In the meantime,
> I suggest commenting out one line in ClusterSearcher.pm (only needed on the
> client node):
>
>    +++ b/perl/lib/LucyX/Remote/ClusterSearcher.pm
>    @@ -53,7 +53,7 @@ sub new {
>             my $sock = IO::Socket::INET->new(
>                 PeerAddr => $shard,
>                 Proto    => 'tcp',
>    -            Blocking => 0,
>    +            #Blocking => 0,
>             );

I've got some time to kill and would like to help debug what's
occurring between the client and server.

Any pointers on what I should look for?  The locus is obviously around
SearchServer::serve and ClusterSearcher::_multi_rpc and surrounding
brush.

I think I've narrowed it down to the server not handling the top_docs
method (handshake, doc_max, doc_freq all seem ok), or more likely, the
client not waiting for a response wrt top_docs - it also looks like
the server never actually *gets* the top_docs method request.

...more to follow.

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Fri, Nov 18, 2011 at 11:19:34AM +0200, goran kent wrote:
> More accurately, the remote servers are still using the older
> pre-ClusterSearcher code, which expects a cleartext pw.  New version
> expects the pw in a serialised packet, and possibly other important
> stuff as well...
 
The internal application protocol changed incompatibly.  Sorry, this is part
of living on trunk.  It would be ideal if we could support rolling updates
during development, but particularly at this stage, imposing that constraint
would slow down innovation, wouldn't be 100% reliable, and wouldn't always be
practical in any case.

I'll try to give you a heads-up when you can cheat and only update the node
running ClusterSearcher without updating the remotes.

> Have now upgraded to latest trunk on the remote machines, and here's
> the latest (testing with a single shard):
> 
> Then fails with:
>  at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
> line 104
> 	LucyX::Remote::SearchServer::serve('LucyX::Remote::SearchServer=SCALAR(0xb8d5190)')
> called at ./lucy_remote_search_server line 212
> 
> The client fails with:
> Use of uninitialized value in numeric eq (==) at
> /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/ClusterSearcher.pm
> line 158.

This is almost certainly happening because we have enabled non-blocking i/o
but not yet taken all the necessary precautions to detect and retry when
reads/writes do not succeed.  I expect to work on this soon.  In the meantime,
I suggest commenting out one line in ClusterSearcher.pm (only needed on the
client node):

    +++ b/perl/lib/LucyX/Remote/ClusterSearcher.pm
    @@ -53,7 +53,7 @@ sub new {
             my $sock = IO::Socket::INET->new(
                 PeerAddr => $shard,
                 Proto    => 'tcp',
    -            Blocking => 0,
    +            #Blocking => 0,
             );

I'll let you know when I think it's safe to make the i/o non-blocking once again.

Marvin Humphrey


Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Fri, Nov 18, 2011 at 9:50 AM, goran kent <go...@gmail.com> wrote:
> /then, stomps off for coffee and ginkgo biloba
>

Ginkgo biloba has kicked in, and the cause of the grief is the
password.  It's mangled on receipt.

More accurately, the remote servers are still using the older
pre-ClusterSearcher code, which expects a cleartext pw.  New version
expects the pw in a serialised packet, and possibly other important
stuff as well...

Have now upgraded to latest trunk on the remote machines, and here's
the latest (testing with a single shard):

The server now emits my usual debug lines (from the serve loop):
DEBUG: method:[handshake]
DEBUG: method:[doc_max]
DEBUG: method:[doc_freq]    x31
...(no other methods reached)...

Then fails with:
 at /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi/LucyX/Remote/SearchServer.pm
line 104
	LucyX::Remote::SearchServer::serve('LucyX::Remote::SearchServer=SCALAR(0xb8d5190)')
called at ./lucy_remote_search_server line 212

The client fails with:
Use of uninitialized value in numeric eq (==) at
/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/ClusterSearcher.pm
line 158.

I'll keep sniffing around, but let me know if you need any other
details or tests.

cheers

Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Fri, Nov 18, 2011 at 1:24 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> What is in $@ after this eval?

Sorry, I neglected to check the basics.

ClusterSearcher->new failed: (No socket:  at
/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/ClusterSearcher.pm
line 58

However, something has broken somewhere, and I'm pulling my hair
out...  I tried the previously working remote client script
(LucyX::Remote::SearchClient), and that too now fails:

Failed to read 1885434739 bytes at
/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/LucyX/Remote/SearchClient.pm
line 88

I even tried rolling back to revision 1203082 since that's the last
time things worked fine for me.  No luck, that does the same (same
error as above).

Tried rebooting the client machine (yes, I know, very MS of me, but
desperation is a corrupter), to no avail.

I'm now sitting on my head and blowing bubbles.

/then, stomps off for coffee and ginkgo biloba

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Nov 17, 2011 at 10:37:58AM +0200, goran kent wrote:
> Missing required param searcher
> 	S_extract_from_sv at xs/XSBind.c line 467

It would seem that $searcher is undef.

> Here's the sequence, for completeness:
> 
> my $searcher = eval {LucyX::Remote::ClusterSearcher->new(schema =>
> $schema,shards => qw(...

What is in $@ after this eval?

Marvin Humphrey


Re: [lucy-user] Concurrent searching

Posted by goran kent <go...@gmail.com>.
On Mon, Nov 14, 2011 at 3:15 PM, Marvin Humphrey <ma...@rectangular.com> wrote:
>> -  LucyX::Remote::SearchClient to perform remote searches in parallel
>> as opposed to serially, and
>
> As of this moment (r1201554), ClusterSearcher's interface, documentation, and
> result aggregation logic are done.  The internals are not yet complete, but it
> should be an improvement over the PolySearcher/SearchClient combo.  At present
> it sends requests to all remote nodes before trying to retrieve the response from
> any, allowing the remotes to do their work in parallel -- that's better than
> PolySearcher, which had to wait on each remote's response before sending the
> next request.

using r1203082

I'm using QueryParser/parse/make_compiler, iirc, for the highlighting
issue, and when I make the changes for ClusterSearcher, I'm getting
this burp:

Missing required param searcher
	S_extract_from_sv at xs/XSBind.c line 467
	cfish_XSBind_allot_params at xs/XSBind.c line 536
	XS_Lucy_Search_ORQuery__make_compiler at lib/Lucy.xs line 15381
	at /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/Lucy.pm line 331
	Lucy::Search::Query::make_compiler('Lucy::Search::ORQuery=SCALAR(0x8f42d60)',
'searcher', 'undef') called at ./search line 288

Line 288 is

my $query_compiler =
      $parsed_query->make_compiler( searcher => $searcher );
      #$parsed_query->make_compiler( searcher => $poly_searcher );

Should I roll back the dependency on $query_compiler (in
$searcher->hits() and Highlighter) since the highlighter bug is
pending (iirc), and you did say there's some backend stuff for
ClusterSearcher which is pending, or stick with it?

Here's the sequence, for completeness:

my $searcher = eval {LucyX::Remote::ClusterSearcher->new(schema =>
$schema,shards => qw(...
...
my $query_parser = Lucy::Search::QueryParser->new(...
...
my $parsed_query = $query_parser->parse($query);
my $query_compiler =
      $parsed_query->make_compiler( searcher => $searcher );
...
$hits = eval { $searcher->hits(query => $query_compiler,...
...
my $body_highlighter = Lucy::Highlight::Highlighter->new(
    searcher => $searcher, query    => $query_compiler,...

...et cetera


thanks

Re: [lucy-user] Concurrent searching

Posted by Nathan Kurz <na...@verse.com>.
On Mon, Nov 14, 2011 at 5:15 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
> On Mon, Nov 14, 2011 at 12:59:30PM +0200, goran kent wrote:
>> If these two could be addressed:
>>
>> -  LucyX::Remote::SearchClient to perform remote searches in parallel
>> as opposed to serially, and
>
> As of this moment (r1201554), ClusterSearcher's interface, documentation, and
> result aggregation logic are done.

Wow Marvin, that's fabulous!

>> -  LucyX::Remote::SearchServer to fork on each new client or otherwise
>> allow multiple search clients at once (ie, typical TCP/IP
>> client/server behaviour)
>
> Hacking fork() into the SearchServer#serve loop is probably the logical choice
> to solve your immediate problem -- we aren't giving you much to work with,
> after all!  But hiding a fork() call in a library function can't be our final
> solution for Lucy -- that takes SearchServer from dubiously architected to
> laughable. :)

Laughable, and as you point out not right for Core, but it might
actually work just fine for 10-or-so requests per second.  Fork is
fast on Linux, and we don't actually want too many concurrent requests
anyway.  We're presuming that these searches are processor bound, so
it would hurt us to have many more than we have cores.

But rather than forking in the handler, I think it might be easier to
write a wrapper that does the fork.  You might even be able to just
configure "inetd" to launch a new server every time you get a search
request.  If that doesn't handle it, POE (as Goran mentioned earlier)
wouldn't be a bad choice.  Or perhaps
http://search.cpan.org/~rhandom/Net-Server-0.94/lib/Net/Server.pm.

If you use one of those two, you might as well prefork.  The idea is
that you fire off a bunch of servers, all listening on the same port,
and they fight to see who accepts first.  You need a manager process
that makes sure you have the right number of children and restarts
them if they die.  It's very smooth, and should work out of the
box with Marvin's code.

Good luck (he calls from the sidelines while Marvin does the real work),

--nate
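To make the wrapper idea concrete, here is a bare-bones
fork-per-connection sketch.  The line-echo handler and the whole
script are my own toy (plain sockets, nothing of SearchServer's
protocol); a real wrapper would loop on accept forever, where this
demo handles one connection and exits:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listen on an ephemeral localhost port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    Listen    => 5,
    Proto     => 'tcp',
) or die "listen: $!";
my $port = $listener->sockport;

my $server_pid = fork();
die "fork: $!" unless defined $server_pid;

if ( $server_pid == 0 ) {    # server process: accept, then fork per client
    my $client = $listener->accept or die "accept: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {       # child: handle one request, then exit
        my $line = <$client>;
        print $client "echo: $line";
        close $client;
        exit 0;
    }
    close $client;           # server closes its copy and reaps the child
    waitpid( $pid, 0 );
    exit 0;
}

# Client side: connect, send a line, read the echoed reply.
close $listener;
my $sock = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
print $sock "hello\n";
my $reply = <$sock>;
waitpid( $server_pid, 0 );
print $reply;    # echo: hello
```

A preforking variant would fork N such servers up front, all blocked
in accept() on the shared listener, which is the manager-process setup
described above.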

Re: [lucy-user] Concurrent searching

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Nov 14, 2011 at 12:59:30PM +0200, goran kent wrote:
> If these two could be addressed:
> 
> -  LucyX::Remote::SearchClient to perform remote searches in parallel
> as opposed to serially, and

As of this moment (r1201554), ClusterSearcher's interface, documentation, and
result aggregation logic are done.  The internals are not yet complete, but it
should be an improvement over the PolySearcher/SearchClient combo.  At present
it sends requests to all remote nodes before trying to retrieve the response from
any, allowing the remotes to do their work in parallel -- that's better than
PolySearcher, which had to wait on each remote's response before sending the
next request.

In my local checkout, I've got a ClusterSearcher select() loop and
non-blocking socket i/o working, which should theoretically allow the boss node
to do more while waiting for responses from the remotes, and potentially to
deal with timeouts and failovers at some point in the future.  I'm not done
tuning that yet, but you don't have to wait on it -- you can start
test-driving ClusterSearcher now if you like.
    
> -  LucyX::Remote::SearchServer to fork on each new client or otherwise
> allow multiple search clients at once (ie, typical TCP/IP
> client/server behaviour)

Hacking fork() into the SearchServer#serve loop is probably the logical choice
to solve your immediate problem -- we aren't giving you much to work with,
after all!  But hiding a fork() call in a library function can't be our final
solution for Lucy -- that takes SearchServer from dubiously architected to
laughable. :) 

I'm trying to figure out how to refactor SearchServer so that it continues to
encapsulate the protocol used for communicating with ClusterSearcher and
SearchClient, and continues to handle the socket communication and the
select() logic, but moves other logic to userland allowing forking servers,
preforked servers, etc.  Perhaps split up serve() into incoming() and
handle_request(), allowing userland code like this?

    my $searcher = Lucy::Search::IndexSearcher->new(index => $path);
    my $search_server = LucyX::Remote::SearchServer->new(
        port     => 7890,
        searcher => $searcher,
    );

    while ($search_server->incoming) {
        my $pid = fork();
        ...
        $search_server->handle_request;
        ...
    }

Marvin Humphrey