Posted to user@lucy.apache.org by Saurabh Vasekar <sv...@listenlogic.com> on 2012/06/13 21:29:56 UTC

[lucy-user] How to add more languages in an analyzer and change path to store indexed documents

Hello,

I am a beginner with Lucy; this is the first time I have used a search
library. I went through the tutorial at lucy.apache.org, and I am confused
about the following points.

The tutorial mentions that we can specify the language the documents are
written in. While indexing, how can I specify multiple languages in the
analyzer if my documents are in different languages?

my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
       language => 'en',
);

How can I specify multiple languages, such as Danish, German, Finnish,
etc.?

Secondly, the path_to_index given in the tutorial is '/store/lucy_test'.
This path came with the sample tutorial code when I downloaded the
apache-lucy-0.3.1 library. If I want to store my index at a different
location, how can I do that? The location /store/lucy_test contains the
entries locks, schema_26.json, seg_26 and snapshot_26.json.

The code in indexer.pl is:

my $path_to_index = '/store/lucy_test';

use Lucy::Index::Indexer;

# $schema is the Lucy::Plan::Schema built earlier in indexer.pl
my $indexer = Lucy::Index::Indexer->new(
      index    => $path_to_index,
      schema   => $schema,
      create   => 1,
      truncate => 1,
);

Also, what do the 'create' and 'truncate' parameters do in this case?

I changed the path in the tutorial from '/store/lucy_test' to
'/store_test', and indexer.pl ran without errors. Then I made the same
change to the $path_to_index variable in cgi-search.pl, and it gave the
following error:

Index doesn't seem to contain any data
    lucy_IxReader_do_open at /root/apache-lucy-0.3.1/perl/../core/Lucy/Index/IndexReader.c

After I changed $path_to_index in indexer.pl and ran the script, the
following folders were created in the new path: locks and seg_1.

I am terribly stuck and cannot move forward. Please bear with these
questions, and thank you for your patience.

Thank you.

RE: [lucy-user] How to add more languages in an analyzer and change path to store indexed documents

Posted by "Zebrowski, Zak" <za...@mitre.org>.
#my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
#       language => 'en',
#       )
#
#How can I specify multiple languages such as Danish, German, Finnish etc
#etc.

Use the appropriate two-letter language code from the PolyAnalyzer perldoc, e.g.:
    en => English,
    da => Danish,
    de => German,
    es => Spanish,
    fi => Finnish,
    fr => French,
    hu => Hungarian,
    it => Italian,
    nl => Dutch,
    no => Norwegian,
    pt => Portuguese,
    ro => Romanian,
    ru => Russian,
    sv => Swedish,
    tr => Turkish,

Or you can build your own PolyAnalyzer, which is what I do.
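For the build-your-own route, here is a minimal Perl sketch, assuming Lucy 0.3's CaseFolder, RegexTokenizer and SnowballStemmer classes. The chain is meant to resemble what the language option sets up, but that equivalence is an assumption, not something stated in this thread. The helper names is_supported_language and make_analyzer are hypothetical. Lucy is only loaded inside make_analyzer, so the language check works without it.

```perl
use strict;
use warnings;

# Codes from the list above; each one names a Snowball stemmer.
my %STEMMER_LANGS = map { $_ => 1 } qw(en da de es fi fr hu it nl no pt ro ru sv tr);

# True if $code (case-insensitive) is one of the codes listed above.
sub is_supported_language {
    my ($code) = @_;
    return defined $code && exists $STEMMER_LANGS{ lc $code };
}

# Build a custom analysis chain for one language:
# lowercase -> split into words -> stem with that language's rules.
# Lucy is loaded here, at runtime, rather than at compile time.
sub make_analyzer {
    my ($lang) = @_;
    die "Unsupported language: " . ( $lang // 'undef' ) . "\n"
        unless is_supported_language($lang);

    require Lucy::Analysis::CaseFolder;
    require Lucy::Analysis::RegexTokenizer;
    require Lucy::Analysis::SnowballStemmer;
    require Lucy::Analysis::PolyAnalyzer;

    return Lucy::Analysis::PolyAnalyzer->new(
        analyzers => [
            Lucy::Analysis::CaseFolder->new,
            Lucy::Analysis::RegexTokenizer->new,
            Lucy::Analysis::SnowballStemmer->new( language => lc $lang ),
        ],
    );
}

# Usage in a schema (Lucy::Plan::FullTextType must be loaded as well):
#   my $type = Lucy::Plan::FullTextType->new( analyzer => make_analyzer('de') );
```

Building the chain by hand makes it easy to drop the stemmer or insert another stage later, at the cost of keeping it in sync with whatever the stock language option does.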

#Secondly the path_to_index given in the tutorial is '/store/lucy_test'.
#This path was given in the sample tutorial when I downloaded the
#apache-lucy-0.3.1 library. Now if I want to change this path meaning I want
#to store my indexed documents at a different location how can I do that?

Just set it: change $path_to_index to whatever location you want to use. There's nothing special about the tutorial's path. Just make sure the directory structure exists before you start creating your index.
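A minimal sketch of that step in core Perl, using File::Path's make_path. The helper name ensure_dir_tree and the example path in the comment are hypothetical. Creating the whole tree up front is simply a cautious choice, so the result does not depend on how the Indexer's create option handles missing parent directories.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Create $path and any missing parent directories before the Indexer runs.
# Returns the path on success; dies with the first error otherwise.
sub ensure_dir_tree {
    my ($path) = @_;
    return $path if -d $path;
    make_path( $path, { error => \my $errors } );
    if ( $errors && @$errors ) {
        my ( $file, $message ) = %{ $errors->[0] };
        die "Could not create '$file': $message\n";
    }
    return $path;
}

# In indexer.pl (and use the same value in cgi-search.pl):
#   my $path_to_index = ensure_dir_tree('/srv/search/lucy_test');   # hypothetical path
```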

#Also what do the 'create'  and 'truncate' parameters specify in this case?

Create simply creates the index if it doesn't already exist. For truncate, the Indexer docs say: "If true, proceed with the intention of discarding all previous indexing data. The old data will remain intact and visible until commit() succeeds." (See: http://search.cpan.org/~logie/Lucy-0.3.1/lib/Lucy/Index/Indexer.pod )
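To make the two options concrete, here is a small sketch of the two usual ways of opening an Indexer: a full rebuild and an incremental add. The helper names indexer_args and open_indexer and the rebuild flag are hypothetical; only index, schema, create and truncate are actual Indexer parameters.

```perl
use strict;
use warnings;

# Hypothetical helper: arguments for Lucy::Index::Indexer->new.
#   rebuild => 1  create if missing, then discard all previous data; the old
#                 data stays visible to searchers until commit() succeeds.
#   rebuild => 0  create if missing, otherwise add to the existing index.
sub indexer_args {
    my (%opt) = @_;
    die "path and schema are required\n" unless $opt{path} && $opt{schema};
    return (
        index    => $opt{path},
        schema   => $opt{schema},
        create   => 1,
        truncate => $opt{rebuild} ? 1 : 0,
    );
}

# Lucy is loaded only when an Indexer is actually opened.
sub open_indexer {
    my (%opt) = @_;
    require Lucy::Index::Indexer;
    return Lucy::Index::Indexer->new( indexer_args(%opt) );
}

# Full rebuild, as in the tutorial's indexer.pl:
#   my $indexer = open_indexer( path => $path_to_index, schema => $schema, rebuild => 1 );
#   $indexer->add_doc(...) for each document, then
#   $indexer->commit;
```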

When you have questions about a particular Perl module (for an object you're using), look at that module's perldoc; it covers more than the walk-through does.

#Index doesn't seem to contain any data

After you generate your index, be sure to call $indexer->commit(); otherwise no documents will be found. Then make sure your CGI script has permission to read the index directory structure. Without read permission, it will appear to the CGI that no index was created.
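A rough diagnostic you could run from the CGI itself, so the permission checks apply to the web server's user. It rests on an assumption read off this thread rather than on Lucy documentation: the working index at /store/lucy_test contains snapshot_26.json, while the new path only showed locks and seg_1. The helper name diagnose_index_dir is hypothetical.

```perl
use strict;
use warnings;

# Heuristic checks for "Index doesn't seem to contain any data".
# Assumption: a committed index directory contains at least one
# snapshot_*.json file. Returns a description of the first problem found,
# or undef if nothing obvious is wrong.
sub diagnose_index_dir {
    my ($path) = @_;
    return "no such directory: $path" unless -d $path;
    return "cannot read or traverse $path (check permissions for the CGI user)"
        unless -r $path && -x $path;
    opendir( my $dh, $path ) or return "cannot open $path: $!";
    my @snapshots = grep { /^snapshot_\w+\.json$/ } readdir $dh;
    closedir $dh;
    return "no snapshot file in $path; did indexer.pl call \$indexer->commit?"
        unless @snapshots;
    return;
}

# In cgi-search.pl, before building the searcher:
#   if ( my $problem = diagnose_index_dir($path_to_index) ) {
#       die "$problem\n";
#   }
# and in indexer.pl, after the last add_doc():
#   $indexer->commit;
```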

Good luck.
Zak


Re: [lucy-user] How to add more languages in an analyzer and change path to store indexed documents

Posted by Saurabh Vasekar <sv...@listenlogic.com>.
Thanks a lot for your help!

On Wed, Jun 13, 2012 at 12:55 PM, Peter Karman <pe...@peknet.com> wrote:

> Note that you likely don't want to specify multiple languages for a single
> index, because the stemming rules (for example) that get applied will be
> confused/confusing. That is, Lucy doesn't do language *detection*; it just
> performs language-specific analysis based on the kind of documents you
> hand to the analyzer.
>
>
> --
> Peter Karman  .  http://peknet.com/  .  peter@peknet.com
>

Re: [lucy-user] How to add more languages in an analyzer and change path to store indexed documents

Posted by Peter Karman <pe...@peknet.com>.
Saurabh Vasekar wrote on 6/13/12 2:29 PM:
> Hello,
> 
> I am a beginner to Lucy. This is the first time I am using a Search
> library. I went through the tutorial at lucy.apache.org. I am confused over
> the following things mentioned in the tutorial.
> 
> The tutorial mentions that we can specify the language in which the
> documents are. Hence while indexing how can I specify multiple languages in
> the analyzers if my documents are in different languages.
> 
> my $polyanalyzer = Lucy::Analysis::PolyAnalyzer->new(
>        language => 'en',
>        )
> 

Note that you likely don't want to specify multiple languages for a single
index, because the stemming rules (for example) that get applied will be
confused/confusing. That is, Lucy doesn't do language *detection*; it just
performs language-specific analysis based on the kind of documents you hand to
the analyzer.
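One way to follow this advice with mixed-language documents is to keep a separate index per language, each with an analyzer for that language. A minimal sketch, assuming your documents already carry a language code under a hypothetical lang key and use tutorial-style title and content fields; index_path_for, group_by_language and index_by_language are hypothetical names. At search time you would then presumably query the index that matches the user's language.

```perl
use strict;
use warnings;

# Hypothetical layout: one index per language under a common base
# directory, e.g. /srv/search/en and /srv/search/de.
sub index_path_for {
    my ( $base, $lang ) = @_;
    $base =~ s{/+\z}{};
    return "$base/" . lc $lang;
}

# Lucy does no language detection, so each document must already carry its
# language. Assumption: your own code stores it under a 'lang' key.
sub group_by_language {
    my (@docs) = @_;
    my %by_lang;
    for my $doc (@docs) {
        my $lang = lc( $doc->{lang} // 'en' );    # default 'en' is an assumption
        push @{ $by_lang{$lang} }, $doc;
    }
    return \%by_lang;
}

# Rebuild one index per language, each with a matching PolyAnalyzer.
sub index_by_language {
    my ( $base, @docs ) = @_;
    require Lucy::Plan::Schema;
    require Lucy::Plan::FullTextType;
    require Lucy::Analysis::PolyAnalyzer;
    require Lucy::Index::Indexer;

    my $groups = group_by_language(@docs);
    for my $lang ( sort keys %$groups ) {
        my $type = Lucy::Plan::FullTextType->new(
            analyzer => Lucy::Analysis::PolyAnalyzer->new( language => $lang ),
        );
        my $schema = Lucy::Plan::Schema->new;
        $schema->spec_field( name => $_, type => $type ) for qw(title content);

        my $indexer = Lucy::Index::Indexer->new(
            index    => index_path_for( $base, $lang ),
            schema   => $schema,
            create   => 1,
            truncate => 1,
        );
        for my $doc ( @{ $groups->{$lang} } ) {
            my %fields = map { $_ => ( $doc->{$_} // '' ) } qw(title content);
            $indexer->add_doc( \%fields );
        }
        $indexer->commit;
    }
}
```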


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com