Posted to user@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2017/02/23 03:52:40 UTC

[lucy-user] Custom Analyzer [was Chinese support?]

Marvin, Nick, others with more Lucy-fu than I possess:

The example below is failing. I have written a similar test case for a PR
but am having trouble running tests within my local git checkout at the moment.

----------------------------------------------------------
#!/usr/bin/env perl

use strict;
use warnings;
use v5.14;    # the `package NAME { ... }` block syntax below needs Perl 5.14
use Lucy;

package MyAnalyzer {
     use base qw( Lucy::Analysis::Analyzer );
     sub transform { $_[1] }
}

package main;

use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Index::Indexer;

my $path_to_index = shift(@ARGV) or die "usage: $0 path/to/index";

for my $try ( ( 1 .. 3 ) ) {
     my $schema = Lucy::Plan::Schema->new;

     my $my_analyzer = MyAnalyzer->new();

     my $raw_type = Lucy::Plan::FullTextType->new( analyzer => $my_analyzer, );

     $schema->spec_field( name => 'body', type => $raw_type );

     my $indexer = Lucy::Index::Indexer->new(
         index  => $path_to_index,
         schema => $schema,
         create => 1,
     );

     my $doc = { body => 'test' };
     $indexer->add_doc($doc);

     $indexer->commit;

     say "finished $try";
}
--------------------------------------------------------

Example above ^^ based on the gist below.

Hao Wu wrote on 2/20/17 11:40 PM:
> Hi Peter,
>
> Thanks for spending time on the script.
>
> I cleaned it up a bit, so it has no dependencies now.
>
> https://gist.github.com/swuecho/1b960ae17a1f47466be006fd14e3b7ff
>
> It still does not work.
>




-- 
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman

Re: [lucy-user] Custom Analyzer [was Chinese support?]

Posted by Hao Wu <ec...@gmail.com>.
Hi Peter,

It works great. The documentation is significantly better now.

Thanks, everyone, for taking care of this issue.

Best,

Hao

On Thu, Feb 23, 2017 at 8:23 AM, Peter Karman <pe...@peknet.com> wrote:

> Nick Wellnhofer wrote on 2/23/17 6:49 AM:
>
>> On 23/02/2017 04:52, Peter Karman wrote:
>>
>>> package MyAnalyzer {
>>>     use base qw( Lucy::Analysis::Analyzer );
>>>     sub transform { $_[1] }
>>> }
>>>
>>
>> Every Analyzer needs an `equals` method. For simple Analyzers, it can simply
>> check whether the class of the other object matches:
>>
>>     package MyAnalyzer {
>>         use base qw( Lucy::Analysis::Analyzer );
>>         sub transform { $_[1] }
>>         sub equals { $_[1]->isa(__PACKAGE__) }
>>     }
>>
>> If the Analyzer uses (inside-out) member variables, you'll also need dump and
>> load methods. Unfortunately, we don't have good documentation for writing
>> custom analyzers yet.
>>
>>
>
> Thanks for the quick response and accurate diagnosis, Nick. I see you've
> already committed changes to the POD so that will be very helpful in future.
>
> Hao, if you add the `sub equals` method to your ChineseAnalyzer, I think
> that should fix your problem. I have confirmed that locally with my own
> tests.
>
>
>
> --
> Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman
>

Re: [lucy-user] Custom Analyzer [was Chinese support?]

Posted by Peter Karman <pe...@peknet.com>.
Nick Wellnhofer wrote on 2/23/17 6:49 AM:
> On 23/02/2017 04:52, Peter Karman wrote:
>> package MyAnalyzer {
>>     use base qw( Lucy::Analysis::Analyzer );
>>     sub transform { $_[1] }
>> }
>
> Every Analyzer needs an `equals` method. For simple Analyzers, it can simply
> check whether the class of the other object matches:
>
>     package MyAnalyzer {
>         use base qw( Lucy::Analysis::Analyzer );
>         sub transform { $_[1] }
>         sub equals { $_[1]->isa(__PACKAGE__) }
>     }
>
> If the Analyzer uses (inside-out) member variables, you'll also need dump and
> load methods. Unfortunately, we don't have good documentation for writing custom
> analyzers yet.
>


Thanks for the quick response and accurate diagnosis, Nick. I see you've already 
committed changes to the POD so that will be very helpful in future.

Hao, if you add the `sub equals` method to your ChineseAnalyzer, I think that 
should fix your problem. I have confirmed that locally with my own tests.


-- 
Peter Karman  .  https://peknet.com/  .  https://keybase.io/peterkarman

Re: [lucy-user] Custom Analyzer [was Chinese support?]

Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/02/2017 04:52, Peter Karman wrote:
> package MyAnalyzer {
>     use base qw( Lucy::Analysis::Analyzer );
>     sub transform { $_[1] }
> }

Every Analyzer needs an `equals` method. For simple Analyzers, it can simply 
check whether the class of the other object matches:

     package MyAnalyzer {
         use base qw( Lucy::Analysis::Analyzer );
         sub transform { $_[1] }
         sub equals { $_[1]->isa(__PACKAGE__) }
     }

If the Analyzer uses (inside-out) member variables, you'll also need dump and 
load methods. Unfortunately, we don't have good documentation for writing 
custom analyzers yet.
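
The class check can be tried in isolation. The following is only a sketch: `StubAnalyzer` and `OtherAnalyzer` are hypothetical stand-ins so that it runs without Lucy installed, and the `blessed` guard is an addition of this sketch rather than part of the fix above. In real code the base class would be Lucy::Analysis::Analyzer.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-in for Lucy::Analysis::Analyzer, used only so this
# sketch runs without Lucy. Real analyzers inherit from the Lucy class.
package StubAnalyzer;
sub new { return bless {}, shift }

package MyAnalyzer;
our @ISA = ('StubAnalyzer');
sub transform { $_[1] }

# Same class check as in the fix above, plus a guard (an assumption of
# this sketch) so a non-object argument yields false instead of dying.
sub equals {
    my ( $self, $other ) = @_;
    return 0 unless Scalar::Util::blessed($other);
    return $other->isa(__PACKAGE__) ? 1 : 0;
}

# A second, unrelated analyzer class (also hypothetical).
package OtherAnalyzer;
our @ISA = ('StubAnalyzer');
sub transform { $_[1] }

package main;

my $first  = MyAnalyzer->new;
my $second = MyAnalyzer->new;
my $other  = OtherAnalyzer->new;

print "same class:      ", $first->equals($second), "\n";
print "different class: ", $first->equals($other),  "\n";
```

Two instances of the same analyzer class compare as equal, while an analyzer of an unrelated class does not. Because `isa` is used, an instance of a subclass of MyAnalyzer would also count as equal.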

> *** Error in `perl': corrupted double-linked list: 0x00000000021113a0 ***

This is something that should be fixed. I think that the following happens:

- The exception thrown in the Lucy code causes a refcount leak.
- Because of the leak, the object still exists in Perl's global destruction
   phase where the DESTROY method is invoked on the remaining objects in
   random order.
- So it can happen that Clownfish object A is destroyed with object B still
   referencing it. When B is destroyed, it tries to decrease the refcount of
   A, causing memory corruption.

We'll need a custom DESTROY implementation for Perl that ignores objects with 
a non-zero refcount or checks whether we're in the global destruction phase.
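
The second option, checking for the global destruction phase, can be illustrated in plain Perl. This is only a sketch of the idea and not Clownfish code: the `Guarded` class and its `$cleanups` counter are hypothetical, and it assumes Perl 5.14 or later, where `${^GLOBAL_PHASE}` is available. The actual fix may look quite different.

```perl
use strict;
use warnings;
use v5.14;    # ${^GLOBAL_PHASE} was added in Perl 5.14

package Guarded;

our $cleanups = 0;    # counts how often the real cleanup path ran

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

sub DESTROY {
    my ($self) = @_;

    # During global destruction the remaining objects are destroyed in no
    # guaranteed order, so anything this object references may already be
    # gone. Skip the cleanup in that phase.
    return if ${^GLOBAL_PHASE} eq 'DESTRUCT';

    $cleanups++;    # stand-in for releasing references to other objects
}

package main;

{
    my $obj = Guarded->new;
}    # $obj goes out of scope here, while the program is still running

say "cleanups after scope exit: $Guarded::cleanups";

# This object is still alive when the program ends, so its DESTROY runs
# during global destruction and takes the early return.
our $leftover = Guarded->new;
```

Objects that go out of scope during normal execution are cleaned up as usual. Only objects that survive until interpreter shutdown skip the cleanup, which is acceptable because the process is about to exit anyway.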

Nick