Posted to user@lucenenet.apache.org by Bogdan Litescu <bo...@gmail.com> on 2011/11/13 16:36:31 UTC

[Lucene.Net] ngram filter

Hello,

I have a requirement where I need to also be able to search parts of words.
Doing some research I found that there are some n-gram filters capable of
extracting parts of the words and indexing those as well. But I couldn't find
any reference to this in the Lucene.Net source code. Is this part of a separate
package or has it not been ported to .NET yet?

Thanks,
Bogdan
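
For readers unfamiliar with the term: an n-gram filter emits every substring
of a token within a chosen length range, so that a query for a fragment such
as "cen" can match the indexed word "lucene". The sketch below illustrates
only that idea in plain C#. The names NGramSketch and NGrams are hypothetical
and are not the Lucene.Net API; the contrib analyzers mentioned later in this
thread may order or filter grams differently.

```csharp
using System;
using System.Collections.Generic;

public static class NGramSketch
{
    // Returns every substring of `term` whose length lies between minGram and
    // maxGram (inclusive): shorter grams first, left to right within a length.
    // Illustration only; this is not how Lucene.Net exposes n-grams.
    public static List<string> NGrams(string term, int minGram, int maxGram)
    {
        if (term == null) throw new ArgumentNullException("term");
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentException("Require 1 <= minGram <= maxGram.");

        var grams = new List<string>();
        for (int size = minGram; size <= maxGram; size++)
        {
            // Slide a window of `size` characters across the term.
            for (int start = 0; start + size <= term.Length; start++)
            {
                grams.Add(term.Substring(start, size));
            }
        }
        return grams;
    }
}
```

Under these assumptions, NGramSketch.NGrams("lucene", 2, 3) yields
lu, uc, ce, en, ne, luc, uce, cen, ene. Indexing all of these grams makes the
index larger, which is the usual price of substring search.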

Re: [Lucene.Net] inheritance in lucene .net

Posted by Anders Lybecker <an...@lybecker.com>.
Hi Christian,

This is NHibernate-specific, so I guess you have a better chance in their
forum.

:-)
Anders Lybecker

On Mon, Nov 14, 2011 at 5:24 PM, Christian Setzkorn <ch...@setzkorn.eu> wrote:

> I have a class Publication which has a child PublicationX. I am trying to
> index all instances of Publication like this:
>
> public void BuildSearchIndex()
> {
>    FSDirectory entityDirectory = null;
>    IndexWriter writer = null;
>
>    var entityType = typeof(Publication);
>
>    var indexDirectory = new DirectoryInfo(GetIndexDirectory());
>
>    if (indexDirectory.Exists)
>    {
>    indexDirectory.Delete(true);
>    }
>
>    try
>    {
>    var dir = new DirectoryInfo(Path.Combine(indexDirectory.FullName,
> entityType.Name));
>    entityDirectory = FSDirectory.Open(dir);
>    writer = new IndexWriter(entityDirectory, new
> StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true,
> IndexWriter.MaxFieldLength.UNLIMITED);
>    }
>    finally
>    {
>    if (entityDirectory != null)
>    {
>        entityDirectory.Close();
>    }
>
>    if (writer != null)
>    {
>        writer.Close();
>    }
>    }
>
>    var fullTextSession =
> Search.CreateFullTextSession(NHibernateSession.Current);
>
>    int totalCount =
>
> NHibernateSession.Current.CreateCriteria(typeof(Publication)).SetProjection(
> Projections.RowCount()).FutureValue<Int32>().Value;
>
>    int pageSize = 500;
>    int totalPages = totalCount / pageSize + 1;
>    int currentPage = 0;
>
>    do
>    {
>    IList<Publication> list =
>
> NHibernateSession.Current.CreateCriteria(typeof(Publication)).SetFirstResult
> (pageSize * (currentPage - 1)).SetMaxResults(pageSize *
> currentPage).Future<Publication>().ToList();
>
>    foreach (Publication p in list)
>    {
>        fullTextSession.Index(p);
>    }
>    currentPage++;
>    }
>    while (currentPage < totalPages);
> }
>
> The idea is to do the indexing in batches (improvement suggestions welcome)
> as there are quite a lot of publications already in the relational
> database.
> Unfortunately, I am getting this exception:
>
> NHibernate.Search.Impl.SearchException was unhandled by user code
>  Message=Unable to open IndexReader for EID2.Domain.PublicationX
>  Source=NHibernate.Search
>
> I have marked both classes with [Indexed].
>
> Thanks.
>
> Christian
>
>

[Lucene.Net] inheritance in lucene .net

Posted by Christian Setzkorn <ch...@setzkorn.eu>.
I have a class Publication which has a child PublicationX. I am trying to
index all instances of Publication like this:

public void BuildSearchIndex()
{
    FSDirectory entityDirectory = null;
    IndexWriter writer = null;

    var entityType = typeof(Publication);

    var indexDirectory = new DirectoryInfo(GetIndexDirectory());

    // Start from an empty index directory.
    if (indexDirectory.Exists)
    {
        indexDirectory.Delete(true);
    }

    try
    {
        // Create an empty index for the entity type (create = true).
        var dir = new DirectoryInfo(Path.Combine(indexDirectory.FullName, entityType.Name));
        entityDirectory = FSDirectory.Open(dir);
        writer = new IndexWriter(entityDirectory,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    }
    finally
    {
        // Close the writer before the directory it writes to.
        if (writer != null)
        {
            writer.Close();
        }

        if (entityDirectory != null)
        {
            entityDirectory.Close();
        }
    }

    var fullTextSession = Search.CreateFullTextSession(NHibernateSession.Current);

    int totalCount = NHibernateSession.Current.CreateCriteria(typeof(Publication))
        .SetProjection(Projections.RowCount())
        .FutureValue<Int32>().Value;

    int pageSize = 500;
    // Round up so a partial last page is included and no empty extra page is fetched.
    int totalPages = (totalCount + pageSize - 1) / pageSize;

    for (int currentPage = 0; currentPage < totalPages; currentPage++)
    {
        // SetFirstResult is a zero-based row offset; SetMaxResults is the page size.
        IList<Publication> list = NHibernateSession.Current.CreateCriteria(typeof(Publication))
            .SetFirstResult(pageSize * currentPage)
            .SetMaxResults(pageSize)
            .Future<Publication>().ToList();

        foreach (Publication p in list)
        {
            fullTextSession.Index(p);
        }
    }
}

The idea is to do the indexing in batches (improvement suggestions welcome)
as there are quite a lot of publications already in the relational database.
Unfortunately, I am getting this exception:

NHibernate.Search.Impl.SearchException was unhandled by user code
  Message=Unable to open IndexReader for EID2.Domain.PublicationX
  Source=NHibernate.Search

I have marked both classes with [Indexed].

Thanks.

Christian
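
On the batching arithmetic: in NHibernate's criteria API, SetFirstResult takes
a zero-based row offset and SetMaxResults takes the maximum number of rows to
return, so each batch needs a growing offset and a fixed count. The helper
below is a minimal sketch of that calculation only. PagingSketch and Batches
are hypothetical names, the code does not touch NHibernate, and it assumes the
row count does not change while the batches are being read.

```csharp
using System;
using System.Collections.Generic;

public static class PagingSketch
{
    // Returns one (firstResult, maxResults) pair per batch so that rows
    // 0 .. totalCount - 1 are each covered exactly once.
    public static IEnumerable<Tuple<int, int>> Batches(int totalCount, int pageSize)
    {
        if (totalCount < 0) throw new ArgumentOutOfRangeException("totalCount");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return Iterate(totalCount, pageSize);
    }

    private static IEnumerable<Tuple<int, int>> Iterate(int totalCount, int pageSize)
    {
        for (int first = 0; first < totalCount; first += pageSize)
        {
            // The last batch may be shorter than pageSize.
            yield return Tuple.Create(first, Math.Min(pageSize, totalCount - first));
        }
    }
}
```

For example, Batches(1200, 500) gives (0, 500), (500, 500), (1000, 200), and
Batches(0, 500) gives no batches at all. Each pair maps onto
SetFirstResult(first) and SetMaxResults(count); a stable sort order on the
query is also needed for the pages not to overlap.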


RE: [Lucene.Net] ngram filter

Posted by Digy <di...@gmail.com>.
I don't know how the others (GitHub, NuGet, etc.) are maintained.
The only official location for Lucene.Net source is
https://svn.apache.org/repos/asf/incubator/lucene.net/

DIGY

-----Original Message-----
From: Bogdan Litescu [mailto:bogdan.litescu@avatar-soft.ro] 
Sent: Sunday, November 13, 2011 8:47 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] ngram filter

Thanks, I found it. I guess I had an older version of the source code.
By the way, I submitted a pull request on GitHub some months ago. Is that the
correct way to contribute?
https://github.com/apache/lucene.net/pulls

Bogdan

On Sun, Nov 13, 2011 at 8:14 PM, Digy <di...@gmail.com> wrote:

> It is in contrib, not in the core. You can download the source (using a svn
> client) from
>
> http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/contrib/Analyzers/
> or
>
> http://svn.apache.org/viewvc/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/Analyzers/
>
>
> DIGY
>
> -----Original Message-----
> From: Bogdan Litescu [mailto:bogdan.litescu@gmail.com]
> Sent: Sunday, November 13, 2011 5:37 PM
> To: lucene-net-user@lucene.apache.org
> Subject: [Lucene.Net] ngram filter
>
> Hello,
>
> I have a requirement where I need to also be able to search parts of words.
> Doing some research I found that there are some n-gram filters capable of
> extracting parts of the words and indexing those as well. But I couldn't find
> any reference to this in the Lucene.Net source code. Is this part of a separate
> package or has it not been ported to .NET yet?
>
> Thanks,
> Bogdan
>


-- 
Bogdan Litescu
Avatar Software <http://www.avatar-soft.ro>
twitter.com/AvtSoft



Re: [Lucene.Net] ngram filter

Posted by Bogdan Litescu <bo...@avatar-soft.ro>.
Thanks, I found it. I guess I had an older version of the source code.
By the way, I submitted a pull request on GitHub some months ago. Is that the
correct way to contribute?
https://github.com/apache/lucene.net/pulls

Bogdan

On Sun, Nov 13, 2011 at 8:14 PM, Digy <di...@gmail.com> wrote:

> It is in contrib, not in the core. You can download the source (using a svn
> client) from
>
>
> http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/contrib/Analyzers/
> or
>
> http://svn.apache.org/viewvc/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/Analyzers/
>
>
> DIGY
>
> -----Original Message-----
> From: Bogdan Litescu [mailto:bogdan.litescu@gmail.com]
> Sent: Sunday, November 13, 2011 5:37 PM
> To: lucene-net-user@lucene.apache.org
> Subject: [Lucene.Net] ngram filter
>
> Hello,
>
> I have a requirement where I need to also be able to search parts of words.
> Doing some research I found that there are some n-gram filters capable of
> extracting parts of the words and indexing those as well. But I couldn't find
> any reference to this in the Lucene.Net source code. Is this part of a separate
> package or has it not been ported to .NET yet?
>
> Thanks,
> Bogdan
>


-- 
Bogdan Litescu
Avatar Software <http://www.avatar-soft.ro>
twitter.com/AvtSoft

RE: [Lucene.Net] ngram filter

Posted by Digy <di...@gmail.com>.
It is in contrib, not in the core. You can download the source (using a svn
client) from

http://svn.apache.org/viewvc/incubator/lucene.net/trunk/src/contrib/Analyzers/
or
http://svn.apache.org/viewvc/incubator/lucene.net/branches/Lucene.Net_2_9_4g/src/contrib/Analyzers/


DIGY

-----Original Message-----
From: Bogdan Litescu [mailto:bogdan.litescu@gmail.com] 
Sent: Sunday, November 13, 2011 5:37 PM
To: lucene-net-user@lucene.apache.org
Subject: [Lucene.Net] ngram filter

Hello,

I have a requirement where I need to also be able to search parts of words.
Doing some research I found that there are some n-gram filters capable of
extracting parts of the words and indexing those as well. But I couldn't find
any reference to this in the Lucene.Net source code. Is this part of a separate
package or has it not been ported to .NET yet?

Thanks,
Bogdan
