Posted to dev@lucenenet.apache.org by "Sérgio Araújo (JIRA)" <ji...@apache.org> on 2009/10/12 18:51:31 UTC

[jira] Created: (LUCENENET-195) IndexWriter.Optimize(); return an exception

IndexWriter.Optimize(); return an exception
-------------------------------------------

                 Key: LUCENENET-195
                 URL: https://issues.apache.org/jira/browse/LUCENENET-195
             Project: Lucene.Net
          Issue Type: Bug
         Environment: Framework 1.1 .NET
            Reporter: Sérgio Araújo


We have been using the Lucene search engine for a couple of months, and at first impression it seems a very good, high-performance engine.

We are using your "Lucene.net.dll" API, version 2.0.0.4.

We have an index of approximately 20 GB. News documents are added to the index every hour, and the index is optimized once a day, at 9 pm.

For a couple of days everything ran fine, until one day the optimization call "writer.Optimize();" threw the following exception:

"Source array was not long enough. Check srcIndex and length, and the array's lower bounds."

Here are some parts of my code:

Document doc = null;
IndexWriter writer = null;
writer = new IndexWriter(strArticleIndexFolder, new StandardAnalyzer(), isNew);

writer.SetMergeFactor(1000);
writer.SetMaxMergeDocs(10000);

foreach (ArticleIndexFull objArticleIndex in lstArticleIndexFull)
{
    // Every field is tokenized and stores term vectors.
    doc = new Document();
    doc.Add(new Field(O4kFreeSearchTag.ArticleLuceneId, objArticleIndex.ArticleIndexFullId.ToString(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
    doc.Add(new Field(O4kFreeSearchTag.ArticleId, objArticleIndex.ArticleId.ToString(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
    doc.Add(new Field(O4kFreeSearchTag.ProductionDate, FactoryBLL.ArticleIndex.ClearCharStream(AlphaNumeric.ConvertToString(objArticleIndex.ProductionDate.ToString("yyyyMMdd", System.Globalization.CultureInfo.GetCultureInfo("en-US")), String.Empty)), Field.Store.NO, Field.Index.TOKENIZED, Field.TermVector.YES));
    ....

    writer.AddDocument(doc);
}

// Optimize only when this code runs during the 9 pm hour.
if (System.DateTime.Now.Hour == 21)
{
    writer.Optimize();
}
writer.Close();


If we migrate to the latest available version, in this case 2.4.3, will my problem be fixed?
Is there any problem with my code?

We will appreciate your help.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Created: (LUCENENET-195) IndexWriter.Optimize(); return an exception

Posted by Michael Garski <mg...@myspace-inc.com>.
Sérgio,

What is the stack trace on that exception?  That will help point to where in the optimize process the issue is occurring.
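
One way to collect that stack trace is sketched below. This is a minimal illustration under stated assumptions, not the reporter's code: CaptureFailure and FailingCopy are hypothetical names, and FailingCopy only stands in for writer.Optimize() by making an Array.Copy call that fails with an ArgumentException, which, as far as I know, is the kind of call that produces the "Source array was not long enough" message in .NET.

```csharp
using System;

public static class OptimizeDiagnostics
{
    // Hypothetical helper: runs an operation and returns the full exception
    // text, or null if the operation succeeded.
    public static string CaptureFailure(Action operation)
    {
        try
        {
            operation();
            return null;
        }
        catch (Exception ex)
        {
            // ToString() includes the exception type, the message, and the
            // stack trace, which is what is needed to locate the failure.
            return ex.ToString();
        }
    }

    // Stand-in for writer.Optimize(): asks Array.Copy for more elements
    // than the source array holds, so the copy throws.
    public static void FailingCopy()
    {
        int[] source = new int[2];
        int[] destination = new int[8];
        Array.Copy(source, 0, destination, 0, 4);
    }

    public static void Main()
    {
        string report = CaptureFailure(FailingCopy);
        Console.WriteLine(report);
    }
}
```

In the real service the delegate would wrap the Optimize call, and the returned text would go to the service's log instead of the console.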

I noticed similar behavior during an optimization on a large index on 1.9 & 2.0 only when term vectors were enabled.  As I didn't really need term vectors I disabled them and then everything was fine.  With version 2.3 and beyond I have not encountered any issues during an optimize when term vectors were enabled (we use them for faceting and a few other things).  I'd suggest going with a newer version of Lucene.Net in a test environment to see if it is reproducible there.

Michael


[jira] Commented: (LUCENENET-195) IndexWriter.Optimize(); return an exception

Posted by "Digy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENENET-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764761#action_12764761 ] 

Digy commented on LUCENENET-195:
--------------------------------

>>  if (System.DateTime.Now.Hour == 21)

Are you sure that your code does not call Optimize again (for example, at 21:00:01, 21:00:02, etc.) while another optimization is in progress?
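
A guard along the following lines would rule that out within a single process. It is only a sketch under assumptions: DailyOptimizeGuard is a hypothetical class, not part of Lucene.Net, and it does nothing to coordinate separate processes working on the same index.

```csharp
using System;

// Hypothetical guard: allows one optimize run per calendar day, only during
// the configured hour, and never while a previous run is still in progress.
public class DailyOptimizeGuard
{
    private readonly object sync = new object();
    private readonly int hour;
    private DateTime lastRunDate = DateTime.MinValue;
    private bool running;

    public DailyOptimizeGuard(int hour)
    {
        this.hour = hour;
    }

    // Returns true if the caller should optimize now. A caller that gets
    // true must call Finish() when the optimize run ends.
    public bool TryBegin(DateTime now)
    {
        lock (sync)
        {
            if (running || now.Hour != hour || lastRunDate == now.Date)
            {
                return false;
            }
            running = true;
            lastRunDate = now.Date;
            return true;
        }
    }

    public void Finish()
    {
        lock (sync)
        {
            running = false;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        DailyOptimizeGuard guard = new DailyOptimizeGuard(21);
        DateTime ninePm = new DateTime(2009, 10, 12, 21, 0, 1);

        Console.WriteLine(guard.TryBegin(ninePm));               // True: first call in the 21:00 hour
        Console.WriteLine(guard.TryBegin(ninePm.AddSeconds(1))); // False: a run is already in progress
        guard.Finish();
        Console.WriteLine(guard.TryBegin(ninePm.AddMinutes(5))); // False: already ran today
    }
}
```

In the indexing code, the writer.Optimize() call would sit inside "if (guard.TryBegin(DateTime.Now))", with guard.Finish() in a finally block.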

DIGY


[jira] Closed: (LUCENENET-195) IndexWriter.Optimize(); return an exception

Posted by "George Aroush (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENENET-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

George Aroush closed LUCENENET-195.
-----------------------------------

    Resolution: Invalid

Let's not abuse JIRA to discuss usage.  If and when you find issues, then start a JIRA discussion around them.  Use the lucene-net-user@ mailing list to continue discussing this topic.  Thanks.


[jira] Commented: (LUCENENET-195) IndexWriter.Optimize(); return an exception

Posted by "Digy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENENET-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765184#action_12765184 ] 

Digy commented on LUCENENET-195:
--------------------------------

With so little info about your case, I can only suggest upgrading to 2.3.2 or to 2.4.0 (trunk) and trying again, since Lucene.Net >= 2.3 is supposed to be thread-safe (you can run multiple indexing or searching threads on the same index).

DIGY


[jira] Commented: (LUCENENET-195) IndexWriter.Optimize(); return an exception

Posted by "Sérgio Araújo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENENET-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765005#action_12765005 ] 

Sérgio Araújo commented on LUCENENET-195:
-----------------------------------------

Thanks for your fast reply.

I'm sure that only one optimization process is called; the process is managed by a Windows service.

Sergio
