Posted to user@lucy.apache.org by Peter Karman <pe...@peknet.com> on 2017/02/23 03:52:40 UTC
[lucy-user] Custom Analyzer [was Chinese support?]
Marvin, Nick, others with more Lucy-fu than I possess:
The example below is failing. I have written a similar test case for a PR, but
am having trouble running tests within my local git checkout at the moment.
----------------------------------------------------------
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Lucy;
package MyAnalyzer {
    use base qw( Lucy::Analysis::Analyzer );
    sub transform { $_[1] }
}
package main;
use Lucy::Plan::Schema;
use Lucy::Plan::FullTextType;
use Lucy::Index::Indexer;
my $path_to_index = shift(@ARGV) or die "usage: $0 path/to/index";
for my $try ( 1 .. 3 ) {
    my $schema      = Lucy::Plan::Schema->new;
    my $my_analyzer = MyAnalyzer->new();
    my $raw_type    = Lucy::Plan::FullTextType->new( analyzer => $my_analyzer, );
    $schema->spec_field( name => 'body', type => $raw_type );
    my $indexer = Lucy::Index::Indexer->new(
        index  => $path_to_index,
        schema => $schema,
        create => 1,
    );
    my $doc = { body => 'test' };
    $indexer->add_doc($doc);
    $indexer->commit;
    say "finished $try";
}
--------------------------------------------------------
Example above ^^ based on the gist below.
Hao Wu wrote on 2/20/17 11:40 PM:
> Hi Peter,
>
> Thanks for spending time on the script.
>
> I cleaned it up a bit, so there are no dependencies now.
>
> https://gist.github.com/swuecho/1b960ae17a1f47466be006fd14e3b7ff
>
> It still does not work.
>
--
Peter Karman . https://peknet.com/ . https://keybase.io/peterkarman
Re: [lucy-user] Custom Analyzer [was Chinese support?]
Posted by Hao Wu <ec...@gmail.com>.
Hi Peter,
Works great. The documentation is significantly better now.
Thanks, everyone, for taking care of this issue.
Best,
Hao
On Thu, Feb 23, 2017 at 8:23 AM, Peter Karman <pe...@peknet.com> wrote:
> Nick Wellnhofer wrote on 2/23/17 6:49 AM:
>
>> On 23/02/2017 04:52, Peter Karman wrote:
>>
>>> package MyAnalyzer {
>>>     use base qw( Lucy::Analysis::Analyzer );
>>>     sub transform { $_[1] }
>>> }
>>>
>>
>> Every Analyzer needs an `equals` method. For simple Analyzers, it can simply
>> check whether the class of the other object matches:
>>
>> package MyAnalyzer {
>>     use base qw( Lucy::Analysis::Analyzer );
>>     sub transform { $_[1] }
>>     sub equals { $_[1]->isa(__PACKAGE__) }
>> }
>>
>> If the Analyzer uses (inside-out) member variables, you'll also need dump and
>> load methods. Unfortunately, we don't have good documentation for writing
>> custom analyzers yet.
>>
>>
>
> Thanks for the quick response and accurate diagnosis, Nick. I see you've
> already committed changes to the POD so that will be very helpful in future.
>
> Hao, if you add the `sub equals` method to your ChineseAnalyzer, I think
> that should fix your problem. I have confirmed that locally with my own
> tests.
>
>
>
> --
> Peter Karman . https://peknet.com/ . https://keybase.io/peterkarman
>
Re: [lucy-user] Custom Analyzer [was Chinese support?]
Posted by Peter Karman <pe...@peknet.com>.
Nick Wellnhofer wrote on 2/23/17 6:49 AM:
> On 23/02/2017 04:52, Peter Karman wrote:
>> package MyAnalyzer {
>>     use base qw( Lucy::Analysis::Analyzer );
>>     sub transform { $_[1] }
>> }
>
> Every Analyzer needs an `equals` method. For simple Analyzers, it can simply
> check whether the class of the other object matches:
>
> package MyAnalyzer {
>     use base qw( Lucy::Analysis::Analyzer );
>     sub transform { $_[1] }
>     sub equals { $_[1]->isa(__PACKAGE__) }
> }
>
> If the Analyzer uses (inside-out) member variables, you'll also need dump and
> load methods. Unfortunately, we don't have good documentation for writing custom
> analyzers yet.
>
Thanks for the quick response and accurate diagnosis, Nick. I see you've already
committed changes to the POD so that will be very helpful in future.
Hao, if you add the `sub equals` method to your ChineseAnalyzer, I think that
should fix your problem. I have confirmed that locally with my own tests.
--
Peter Karman . https://peknet.com/ . https://keybase.io/peterkarman
Re: [lucy-user] Custom Analyzer [was Chinese support?]
Posted by Nick Wellnhofer <we...@aevum.de>.
On 23/02/2017 04:52, Peter Karman wrote:
> package MyAnalyzer {
>     use base qw( Lucy::Analysis::Analyzer );
>     sub transform { $_[1] }
> }
Every Analyzer needs an `equals` method. For simple Analyzers, it can simply
check whether the class of the other object matches:
package MyAnalyzer {
    use base qw( Lucy::Analysis::Analyzer );
    sub transform { $_[1] }
    sub equals { $_[1]->isa(__PACKAGE__) }
}
If the Analyzer uses (inside-out) member variables, you'll also need dump and
load methods. Unfortunately, we don't have good documentation for writing
custom analyzers yet.
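Spelled out with named arguments, the stateless analyzer above might look like the following sketch. It assumes Lucy is installed and that the analyzer keeps no per-object state, which is why a class check is enough for `equals`. The names `$inversion` and `$other` are only illustrative labels for the positional argument `$_[1]`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use v5.14;    # needed for the package BLOCK syntax used in this thread
use Lucy;

package MyAnalyzer {
    use base qw( Lucy::Analysis::Analyzer );

    # No-op analysis: hand the input back unchanged.
    sub transform {
        my ( $self, $inversion ) = @_;
        return $inversion;
    }

    # This class holds no state, so any other MyAnalyzer counts as equal.
    sub equals {
        my ( $self, $other ) = @_;
        return $other->isa(__PACKAGE__);
    }
}

package main;

# Two independently constructed analyzers should compare as equal.
my $first  = MyAnalyzer->new;
my $second = MyAnalyzer->new;
say $first->equals($second) ? "equal" : "not equal";
```

An analyzer that does carry state would need a stricter comparison than a class check, plus the dump and load methods mentioned above.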
> *** Error in `perl': corrupted double-linked list: 0x00000000021113a0 ***
This is something that should be fixed. I think that the following happens:
- The exception thrown in the Lucy code causes a refcount leak.
- Because of the leak, the object still exists in Perl's global destruction
  phase where the DESTROY method is invoked on the remaining objects in
  random order.
- So it can happen that Clownfish object A is destroyed with object B still
  referencing it. When B is destroyed, it tries to decrease the refcount of
  A, causing memory corruption.
We'll need a custom DESTROY implementation for Perl that ignores objects with
a non-zero refcount or checks whether we're in the global destruction phase.
Nick
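The second option Nick mentions, checking for the global destruction phase, can be illustrated in pure Perl with the `${^GLOBAL_PHASE}` variable (available from Perl 5.14). This is only a sketch of the guard, not the Clownfish fix: `GuardedObject` and its `@released` log are hypothetical stand-ins, and the real change would presumably live in the Clownfish Perl bindings rather than in a Perl-level class.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use v5.14;    # ${^GLOBAL_PHASE} and package BLOCK syntax need Perl 5.14

# Hypothetical stand-in for an object that wraps native, refcounted memory.
package GuardedObject {
    our @released;    # names of objects whose cleanup actually ran

    sub new {
        my ( $class, $name ) = @_;
        return bless { name => $name }, $class;
    }

    sub DESTROY {
        my $self = shift;

        # During global destruction the remaining objects are destroyed in
        # no guaranteed order, so anything this object points at may already
        # be gone. Skip cleanup; the process is exiting anyway.
        return if ${^GLOBAL_PHASE} eq 'DESTRUCT';

        # Normal case: stands in for decrementing refcounts / freeing memory.
        push @released, $self->{name};
    }
}

package main;

{
    my $scoped = GuardedObject->new('scoped');
}    # $scoped goes out of scope here, so DESTROY runs its normal cleanup

# A package variable survives until global destruction, like a leaked object.
our $leaked = GuardedObject->new('leaked');

say "released so far: @GuardedObject::released";
```

The trade-off of such a guard is that objects still alive at exit are never cleaned up explicitly, which is usually acceptable because the operating system reclaims the memory when the process ends.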