Posted to user@lucy.apache.org by "Gupta, Rajiv" <Ra...@netapp.com> on 2016/12/06 16:17:02 UTC

[lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Any idea why I'm getting this error?

Error Invalid path: 'seg_9i/lextemp'
20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
20161205 184114 []  LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119
20161205 184114 []  S_lazy_init at core/Lucy/Index/PostingListWriter.c line 92


In another log file I'm getting a different error:

Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir>/.lucyindex/1/schema_an.json' failed: Invalid argument
20161205 174146 []  LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429

This happens when committing the indexer object.


In both cases I'm seeing one common pattern: the time in the STDOUT log file gets skewed by 5-6 hours before the process starts on the file. The actual system time is not changed.

-Rajiv

Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 19/12/2016 04:21, Gupta, Rajiv wrote:
> In release 0.6.1 we have the fix for the bug below, right?
>
>> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
>> S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Yes, this is fixed in 0.6.1.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Thanks, Peter. For now I'm using copydir. I haven't seen any problems so far, except that the indexes are not available during the copy, which is expected; I've added a retry for that.

-----Original Message-----
From: peknet@gmail.com [mailto:peknet@gmail.com] On Behalf Of Peter Karman
Sent: Wednesday, January 04, 2017 8:31 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

I use rsync to copy indexes from one machine to another. Copy probably works too.

Another approach is to have a single indexer and some kind of queue, so that separate worker machines can push documents-to-be-indexed to the queue and the indexer runs periodically to ingest them. Same idea, but performance may vary depending on the number of workers and frequency of updates.

On Wed, Jan 4, 2017 at 8:22 AM, Gupta, Rajiv <Ra...@netapp.com> wrote:

> I think you may not have liked the approach :(
>
> However, I tried that and it seems to be working fine. I did 20+ big runs
> and they all seem to have gone through.
>
> Just checking: should I use a raw copy, or is there a better way to copy
> indexes without losing any in-transit data, such as
> ($indexer->add_index($index);)?
>
> Thanks,
> Rajiv Gupta
>
> -----Original Message-----
> From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com]
> Sent: Monday, January 02, 2017 7:47 PM
> To: user@lucy.apache.org
> Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at 
> core/Lucy/Store/Folder.c line 119
>
> Until now we have been under the impression given by http://lucene.472066.n3.
> nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-
> using-Lucy-td4160395.html, so we have avoided any kind of parallel indexing.
>
> Let me know your thoughts on this approach: run all indexing in
> parallel, save the indexes under /tmp (a local filesystem location),
> and periodically copy them to a shared location. The copy is needed
> because the servers where I perform searches need access to the
> indexes. Insertion will happen from only one server, but searches can
> be performed from different servers using the indexed data.
>
> -Rajiv
>
> -----Original Message-----
> From: Nick Wellnhofer [mailto:wellnhofer@aevum.de]
> Sent: Monday, December 19, 2016 7:09 PM
> To: user@lucy.apache.org
> Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at 
> core/Lucy/Store/Folder.c line 119
>
> On 19/12/2016 04:21, Gupta, Rajiv wrote:
> > Rajiv>>> All parallel processes are child processes of one process,
> > running on the same host. Do you think making the host name unique with
> > some random number would help for multiple processes?
>
> If you access an index on a shared volume only from a single host, 
> there's actually no need to set a hostname at all, although it's good practice.
> It's all explained in Lucy::Docs::FileLocking:
>
>      http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html
>
> But you should never use different or even random `host` values on the 
> same machine. This can lead to stale lock files not being deleted 
> after a crash.
>
> > Rajiv>>> Going to a local file system is not possible in my case. This is
> > a test framework that generates a lot of logs; I'm indexing per test run,
> > and all these logs need to be on a shared volume for other triaging
> > purposes.
>
> It doesn't matter where the log files are. I'm talking about the 
> location of your Lucy index directory.
>
> > The next thing I'm going to try is creating a watcher per directory and
> > indexing all files under that directory serially. Currently I create
> > watchers on all the files, and sometimes multiple files in the same
> > directory may get indexed at the same time. As you stated, this might be
> > the issue. I'm not sure how it will perform within the current time
> > limits.
>
> By Lucy's design, indexing files in parallel shouldn't cause any 
> problems, especially if it all happens on a single machine. The worst 
> thing that could happen are lock errors which can be addressed by 
> changing timeouts or retrying. But without code to reproduce the 
> problem, I can't tell whether it's a Lucy bug.
>
> If you can't provide a test case, it's a good idea to test whether the 
> problems are caused by parallel indexing at all. I'd also try to move 
> your indices to a local file system to see whether it makes a difference.
>
> > Creating an IndexManager adds overhead to the search process.
>
> You only have to use IndexManagers for searchers to avoid errors like 
> "Stale NFS filehandle". If you have another way to handle such errors, 
> there might be no need for IndexManagers at all. Again, see 
> Lucy::Docs::FileLocking.
>
> Nick
>
>


--
Peter Karman . peter@peknet.com . http://peknet.com/

Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Peter Karman <pe...@peknet.com>.
I use rsync to copy indexes from one machine to another. Copy probably
works too.

Another approach is to have a single indexer and some kind of queue, so
that separate worker machines can push documents-to-be-indexed to the queue
and the indexer runs periodically to ingest them. Same idea, but
performance may vary depending on the number of workers and frequency of
updates.
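
For illustration, a minimal Perl sketch of the rsync route; the paths here are hypothetical and rsync is assumed to be on the PATH:

     # Push a locally built index to the shared volume.
     # Trailing slashes make rsync sync directory contents.
     my $local  = '/tmp/myindex/';           # hypothetical local index dir
     my $shared = '/mnt/shared/myindex/';    # hypothetical shared target
     system( 'rsync', '-a', '--delete', $local, $shared ) == 0
         or die "rsync failed: $?";

As noted elsewhere in the thread, the target may be unreadable mid-copy, so searchers should be prepared to retry.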

On Wed, Jan 4, 2017 at 8:22 AM, Gupta, Rajiv <Ra...@netapp.com> wrote:

> I think you may not have liked the approach :(
>
> However, I tried that and it seems to be working fine. I did 20+ big runs and
> they all seem to have gone through.
>
> Just checking: should I use a raw copy, or is there a better way to copy
> indexes without losing any in-transit data, such as ($indexer->add_index($index);)?
>
> Thanks,
> Rajiv Gupta
>
> -----Original Message-----
> From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com]
> Sent: Monday, January 02, 2017 7:47 PM
> To: user@lucy.apache.org
> Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at
> core/Lucy/Store/Folder.c line 119
>
> Until now we have been under the impression given by http://lucene.472066.n3.
> nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-
> using-Lucy-td4160395.html, so we have avoided any kind of parallel indexing.
>
> Let me know your thoughts on this approach: run all indexing in parallel,
> save the indexes under /tmp (a local filesystem location), and periodically
> copy them to a shared location. The copy is needed because the servers
> where I perform searches need access to the indexes. Insertion will happen
> from only one server, but searches can be performed from different servers
> using the indexed data.
>
> -Rajiv
>
> -----Original Message-----
> From: Nick Wellnhofer [mailto:wellnhofer@aevum.de]
> Sent: Monday, December 19, 2016 7:09 PM
> To: user@lucy.apache.org
> Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at
> core/Lucy/Store/Folder.c line 119
>
> On 19/12/2016 04:21, Gupta, Rajiv wrote:
> > Rajiv>>> All parallel processes are child processes of one process,
> > running on the same host. Do you think making the host name unique with
> > some random number would help for multiple processes?
>
> If you access an index on a shared volume only from a single host, there's
> actually no need to set a hostname at all, although it's good practice.
> It's all explained in Lucy::Docs::FileLocking:
>
>      http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html
>
> But you should never use different or even random `host` values on the
> same machine. This can lead to stale lock files not being deleted after a
> crash.
>
> > Rajiv>>> Going to a local file system is not possible in my case. This is
> > a test framework that generates a lot of logs; I'm indexing per test run,
> > and all these logs need to be on a shared volume for other triaging
> > purposes.
>
> It doesn't matter where the log files are. I'm talking about the location
> of your Lucy index directory.
>
> > The next thing I'm going to try is creating a watcher per directory and
> > indexing all files under that directory serially. Currently I create
> > watchers on all the files, and sometimes multiple files in the same
> > directory may get indexed at the same time. As you stated, this might be
> > the issue. I'm not sure how it will perform within the current time
> > limits.
>
> By Lucy's design, indexing files in parallel shouldn't cause any problems,
> especially if it all happens on a single machine. The worst thing that
> could happen are lock errors which can be addressed by changing timeouts or
> retrying. But without code to reproduce the problem, I can't tell whether
> it's a Lucy bug.
>
> If you can't provide a test case, it's a good idea to test whether the
> problems are caused by parallel indexing at all. I'd also try to move your
> indices to a local file system to see whether it makes a difference.
>
> > Creating an IndexManager adds overhead to the search process.
>
> You only have to use IndexManagers for searchers to avoid errors like
> "Stale NFS filehandle". If you have another way to handle such errors,
> there might be no need for IndexManagers at all. Again, see
> Lucy::Docs::FileLocking.
>
> Nick
>
>


-- 
Peter Karman . peter@peknet.com . http://peknet.com/

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I think you may not have liked the approach :(

However, I tried that and it seems to be working fine. I did 20+ big runs and they all seem to have gone through.

Just checking: should I use a raw copy, or is there a better way to copy indexes without losing any in-transit data, such as ($indexer->add_index($index);)?
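
For reference, a hedged sketch of the add_index route mentioned above; $schema and both paths are assumptions:

     # Merge a locally built index into the shared master index.
     my $indexer = Lucy::Index::Indexer->new(
         index  => '/mnt/shared/master_index',   # hypothetical shared path
         schema => $schema,
         create => 1,
     );
     $indexer->add_index('/tmp/myindex');        # absorb the local index
     $indexer->commit;

Unlike a raw copy, the merge runs under Lucy's own locking, so a searcher should never see a half-copied index.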

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Monday, January 02, 2017 7:47 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Until now we have been under the impression given by http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html, so we have avoided any kind of parallel indexing.

Let me know your thoughts on this approach: run all indexing in parallel, save the indexes under /tmp (a local filesystem location), and periodically copy them to a shared location. The copy is needed because the servers where I perform searches need access to the indexes. Insertion will happen from only one server, but searches can be performed from different servers using the indexed data.

-Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Monday, December 19, 2016 7:09 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are child processes of one process, running on the same host. Do you think making the host name unique with some random number would help for multiple processes?

If you access an index on a shared volume only from a single host, there's actually no need to set a hostname at all, although it's good practice. It's all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same machine. This can lead to stale lock files not being deleted after a crash.

> Rajiv>>> Going to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I'm indexing per test run, and all these logs need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of your Lucy index directory.

> The next thing I'm going to try is creating a watcher per directory and indexing all files under that directory serially. Currently I create watchers on all the files, and sometimes multiple files in the same directory may get indexed at the same time. As you stated, this might be the issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems, especially if it all happens on a single machine. The worst thing that could happen are lock errors which can be addressed by changing timeouts or retrying. But without code to reproduce the problem, I can't tell whether it's a Lucy bug.

If you can't provide a test case, it's a good idea to test whether the problems are caused by parallel indexing at all. I'd also try to move your indices to a local file system to see whether it makes a difference.

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale NFS filehandle". If you have another way to handle such errors, there might be no need for IndexManagers at all. Again, see Lucy::Docs::FileLocking.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Until now we have been under the impression given by http://lucene.472066.n3.nabble.com/lucy-user-Parallel-indexing-in-same-index-dir-using-Lucy-td4160395.html, so we have avoided any kind of parallel indexing.

Let me know your thoughts on this approach: run all indexing in parallel, save the indexes under /tmp (a local filesystem location), and periodically copy them to a shared location. The copy is needed because the servers where I perform searches need access to the indexes. Insertion will happen from only one server, but searches can be performed from different servers using the indexed data.

-Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Monday, December 19, 2016 7:09 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are child processes of one process, running on the same host. Do you think making the host name unique with some random number would help for multiple processes?

If you access an index on a shared volume only from a single host, there's actually no need to set a hostname at all, although it's good practice. It's all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same machine. This can lead to stale lock files not being deleted after a crash.

> Rajiv>>> Going to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I'm indexing per test run, and all these logs need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of your Lucy index directory.

> The next thing I'm going to try is creating a watcher per directory and indexing all files under that directory serially. Currently I create watchers on all the files, and sometimes multiple files in the same directory may get indexed at the same time. As you stated, this might be the issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems, especially if it all happens on a single machine. The worst thing that could happen are lock errors which can be addressed by changing timeouts or retrying. But without code to reproduce the problem, I can't tell whether it's a Lucy bug.

If you can't provide a test case, it's a good idea to test whether the problems are caused by parallel indexing at all. I'd also try to move your indices to a local file system to see whether it makes a difference.

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale NFS filehandle". If you have another way to handle such errors, there might be no need for IndexManagers at all. Again, see Lucy::Docs::FileLocking.

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 19/12/2016 04:21, Gupta, Rajiv wrote:
> Rajiv>>> All parallel processes are child processes of one process, running on the same host. Do you think making the host name unique with some random number would help for multiple processes?

If you access an index on a shared volume only from a single host, there's 
actually no need to set a hostname at all, although it's good practice. It's 
all explained in Lucy::Docs::FileLocking:

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

But you should never use different or even random `host` values on the same 
machine. This can lead to stale lock files not being deleted after a crash.
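
A minimal sketch of the setup described here, using Sys::Hostname so that every process on a machine supplies the same `host` value; $schema and $index_path are assumptions:

     use Sys::Hostname;
     use Lucy::Index::IndexManager;
     use Lucy::Index::Indexer;

     # One stable host value per machine -- never random, never per-process.
     my $manager = Lucy::Index::IndexManager->new( host => hostname() );
     my $indexer = Lucy::Index::Indexer->new(
         index   => $index_path,    # index directory on the shared volume
         schema  => $schema,
         manager => $manager,
     );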

> Rajiv>>> Going to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I'm indexing per test run, and all these logs need to be on a shared volume for other triaging purposes.

It doesn't matter where the log files are. I'm talking about the location of 
your Lucy index directory.

> The next thing I'm going to try is creating a watcher per directory and indexing all files under that directory serially. Currently I create watchers on all the files, and sometimes multiple files in the same directory may get indexed at the same time. As you stated, this might be the issue. I'm not sure how it will perform within the current time limits.

By Lucy's design, indexing files in parallel shouldn't cause any problems, 
especially if it all happens on a single machine. The worst thing that could 
happen are lock errors which can be addressed by changing timeouts or 
retrying. But without code to reproduce the problem, I can't tell whether it's 
a Lucy bug.

If you can't provide a test case, it's a good idea to test whether the 
problems are caused by parallel indexing at all. I'd also try to move your 
indices to a local file system to see whether it makes a difference.

> Creating an IndexManager adds overhead to the search process.

You only have to use IndexManagers for searchers to avoid errors like "Stale 
NFS filehandle". If you have another way to handle such errors, there might be 
no need for IndexManagers at all. Again, see Lucy::Docs::FileLocking.
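
The searcher-side equivalent, following the example in the FileLocking docs (path hypothetical):

     use Sys::Hostname;
     use Lucy::Index::IndexManager;
     use Lucy::Index::IndexReader;
     use Lucy::Search::IndexSearcher;

     # A reader opened with an IndexManager takes read locks, which
     # matters on NFS; see Lucy::Docs::FileLocking.
     my $reader = Lucy::Index::IndexReader->open(
         index   => '/path/to/index',
         manager => Lucy::Index::IndexManager->new( host => hostname() ),
     );
     my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );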

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Thanks, Nick, for your reply and for taking time on this. One quick question before you get lost in the email below: in release 0.6.1 we have the fix for the bug below, right?

> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high 
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Thanks,
Rajiv g

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Saturday, December 17, 2016 2:52 AM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 13/12/2016 18:05, Gupta, Rajiv wrote:
> After I create the directory myself, I'm getting this error:

Which directory are you trying to create? I wouldn't make manual changes inside Lucy's index directory. That will only make things worse.

        $indexer = Lucy::Index::Indexer->new(
                index    => $saveindexlocation,
                schema   => $schema,
                manager  => Lucy::Index::IndexManager->new(host=>$self->{_hostname}),
                create   => $dir_create_flag,
                truncate => 0,
            );

The "create" flag initially set to 1 so that $saveindexlocation can get created after I got the error I make sure directory is created and made create flag always 0.

> Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
> 20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
> 20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
> 20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75
>
> There are two more failures; they failed for similar reasons.
>
> rename from 
> '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode
> _1of1/.lucyindex/1/schema.temp' to 
> '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode
> _1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
>
> Can't delete 'lexicon-3.ix'
>
> I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't the create attempt be skipped?
>
> 20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
> 20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102
>
> So all my retry attempts failed as well.

These errors still look like multiple processes are modifying the index at the same time. Are you really sure that every indexer is created with an IndexManager and that every IndexManager is created with a `host` argument that is unique to each machine?

Rajiv>>> All parallel processes are child processes of one process, running on the same host. Do you think making the host name unique with some random number would help for multiple processes?

All these errors mean that there's something fundamentally wrong with your code or that you hit a bug in Lucy. The only type of error where it makes sense to retry is LockErr. All other errors are mostly fatal and could result in index corruption. Retrying will only mask an underlying problem in this case.

Unfortunately, I'm unable to help unless you provide some kind of self-contained, reproducible test case. I'm aware that this isn't easy, especially with multiple clients writing to a shared volume.

As I already hinted at, you might want to reconsider your architecture and use some kind of search server that uses an index on a local filesystem. There are ready-made platforms on top of Lucy like Dezi, but it isn't too hard to roll your own solution. This should result in better performance and makes behavior of your code more predictable.

Rajiv>>> Going to a local file system is not possible in my case. This is a test framework that generates a lot of logs; I'm indexing per test run, and all these logs need to be on a shared volume for other triaging purposes. The next thing I'm going to try is creating a watcher per directory and indexing all files under that directory serially. Currently I create watchers on all the files, and sometimes multiple files in the same directory may get indexed at the same time. As you stated, this might be the issue. I'm not sure how it will perform within the current time limits. Creating an IndexManager adds overhead to the search process.

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 13/12/2016 18:05, Gupta, Rajiv wrote:
> After I create the directory myself, I'm getting this error:

Which directory are you trying to create? I wouldn't make manual changes 
inside Lucy's index directory. That will only make things worse.

> Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
> 20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
> 20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
> 20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75
>
> There are two more failures; they failed for similar reasons.
>
> rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
>
> Can't delete 'lexicon-3.ix'
>
> I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't the create attempt be skipped?
>
> 20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
> 20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102
>
> So all my retry attempts failed as well.

These errors still look like multiple processes are modifying the index at the 
same time. Are you really sure that every indexer is created with an 
IndexManager and that every IndexManager is created with a `host` argument 
that is unique to each machine?

All these errors mean that there's something fundamentally wrong with your 
code or that you hit a bug in Lucy. The only type of error where it makes 
sense to retry is LockErr. All other errors are mostly fatal and could result 
in index corruption. Retrying will only mask an underlying problem in this case.

Unfortunately, I'm unable to help unless you provide some kind of 
self-contained, reproducible test case. I'm aware that this isn't easy, 
especially with multiple clients writing to a shared volume.

As I already hinted at, you might want to reconsider your architecture and use 
some kind of search server that uses an index on a local filesystem. There are 
ready-made platforms on top of Lucy like Dezi, but it isn't too hard to roll 
your own solution. This should result in better performance and makes behavior 
of your code more predictable.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Another thing I could not figure out is how the time zone is getting changed after the failure.

20161211 125058 [] [(_start_indexing_file) File to index: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/001_check_log_size.log Save Index location: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1 FileSeek pointer start : 17794 Final Flag: 1
20161211 175059 [] ****************************************
20161211 175059 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1 :  rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory
20161211 175059 [] *    LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429
20161211 175059 [] *    at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3278, <$fhlogfile> line 46.
20161211 175059 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3265
20161211 175059 [] *    NATE::LucyIndexerUtils::_lucy_add_doc('NATE::LucyIndexerUtils=HASH(0x2917050)', 'HASH(0x3116398)') called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3169

It is shifting by exactly 5 hours from EST. These logs are just STDOUT redirected to one file, and I'm using localtime(time()) and formatting it to generate the time string.
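
If the 5-hour jump comes from a TZ change leaking into the process environment (an assumption, not a diagnosis), pinning the zone before formatting would rule that out; the zone name here is a hypothetical example:

     use POSIX qw( strftime tzset );

     $ENV{TZ} = 'America/New_York';   # pin the zone explicitly
     tzset();                         # make localtime() honor it
     my $stamp = strftime( '%Y%m%d %H%M%S', localtime( time() ) );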

Thanks,
Rajiv Gupta



-----Original Message-----
From: Gupta, Rajiv 
Sent: Monday, December 12, 2016 10:08 AM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75

There are two more failures; they failed for similar reasons.

rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory

Can't delete 'lexicon-3.ix'

I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't the create attempt be skipped?

20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102

So all my retry attempts failed as well.


Now I have added one more check before index creation, testing whether the directory exists, before retrying :(

My pass rate is now 7/10. The target is at least 9/10.

-Rajiv


-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Sunday, December 11, 2016 3:58 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline for when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
After I create the directory myself, I'm getting this error:

20161213 164633 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481639632.57959_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1 :  Folder '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481639632.57959_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1' failed check
20161213 164633 [] *    S_init_folder at core/Lucy/Index/Indexer.c line 263
20161213 164633 [] *    at /usr/software/lib/perl5/site_perl/5.14.0/x86_64-linux-thread-multi/Lucy.pm line 122.

Please help; I'm badly stuck now. This error is intermittent; I see it mostly when NFS is loaded. Ideally the retry should work, but it doesn't. I also tried limiting the number of files scanned per directory to 10 by putting a hack in my code, but that doesn't work either.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Monday, December 12, 2016 10:08 AM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75

There are two more failures; they failed for similar reasons.

rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory

Can't delete 'lexicon-3.ix'

I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't the create attempt be skipped?

20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102

So all my retry attempts failed as well.


Now I have added one more check before index creation, testing whether the directory exists, before retrying :(

My pass rate is now 7/10. The target is at least 9/10.

-Rajiv


-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Sunday, December 11, 2016 3:58 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline for when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Can't open '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1/seg_fd/lexicon-7.ixix': Invalid argument
20161211 182109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118
20161211 182109 [] *    LUCY_Folder_Local_Open_In_IMP at core/Lucy/Store/Folder.c line 101
20161211 182109 [] *    LUCY_Folder_Open_In_IMP at core/Lucy/Store/Folder.c line 75

There are two more failures; they failed for similar reasons.

rename from '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema.temp' to '/u/smoke/presub/logs/cit-fg-adr-ndo-rtp.rajivg.1481473039.49384_cmode_1of1/.lucyindex/1/schema_e4.json' failed: No such file or directory

Can't delete 'lexicon-3.ix'

I believe all three are related to a race condition during parallel indexing and should go away with retries. However, my retries started failing with a different error, which is strange to me: if the directory already exists, shouldn't the create attempt be skipped?

20161211 182109 [] * FAIL: [FAILED]: Retrying to add doc at path: /u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1 :  Couldn't create directory '/u/smoke/presub/logs/cit-fg-adr-neg-rtp.rajivg.1481473130.41339_cmode_1of1/.lucyindex/1': No such file or directory
20161211 182109 [] *    LUCY_FSFolder_Initialize_IMP at core/Lucy/Store/FSFolder.c line 102

So all my retry attempts failed as well.


Now I have added one more check before index creation, testing whether the directory exists, before retrying :(

My pass rate is now 7/10. The target is at least 9/10.

-Rajiv


-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Sunday, December 11, 2016 3:58 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline for when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 10/12/2016 17:25, Gupta, Rajiv wrote:
> Any timeline for when 0.6.1 will be released?

0.6.1 is on schedule to be released in a few days.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Any timeline for when 0.6.1 will be released?

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Saturday, December 10, 2016 9:37 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 10/12/2016 15:26, Gupta, Rajiv wrote:
> I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

Note that the fix to IndexManager isn't in any released version of Lucy yet. 
You'll get the same error with 0.6.0. Either compile from the 0.6 Git branch or wait until 0.6.1 is released.

Nick

Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 10/12/2016 15:26, Gupta, Rajiv wrote:
> I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

Note that the fix to IndexManager isn't in any released version of Lucy yet. 
You'll get the same error with 0.6.0. Either compile from the 0.6 Git branch 
or wait until 0.6.1 is released.

Nick

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I tried the workaround below; now I get the error below at multiple places where a doc is added. It bailed out after 20 retries (the default) without adding any doc.

:  input 51 too high
20161210 064458 [] *    S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Am I doing something wrong? The approach I'm following is this: while reading the document, use LightMergeManager, and once it reaches the end of the file, use the regular IndexManager to do the final commit. In some cases the number of docs buffered using LightMergeManager becomes too high. I could also add logic so that once the number of buffered docs crosses a certain limit, a commit is done via the regular IndexManager.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Friday, December 09, 2016 11:47 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Thanks, Nick, for your help and the workaround. I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

I stopped using LightMergeManager and did not get that error any more; however, performance is now even worse. I'm going to try a few things now:

1. Try the workaround you provided. (I don't use a background merger.)
2. Try to use background merging in another loop, combined with the above option.
3. Try to store information in memory/Storable/a DB instead of using search every time. I think mixing search with doc indexing in the same process is creating problems. When another system uses the search, I don't see any problem.
4. Try to serialize the index directories to avoid overlap; in any case they are all running as parallel processes.

Hopefully one of the above works out.

Thanks,
Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de]
Sent: Friday, December 09, 2016 8:51 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 09/12/2016 15:01, Gupta, Rajiv wrote:
> I'm getting this error very frequently now :(
>
> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high 
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129
>
> Is there any workaround?
>
> I'm using LightMergeManager; I'm not sure if it's the cause. Should I stop using it?
>
> Please help. I'm getting it very frequently now.

I committed a fix to the 0.4, 0.5, and 0.6 branches. Your best option is to get one of these branches with Git and recompile Lucy. If you can't do that, either stop using LightMergeManager, or try the following untested workaround.

Modify LightMergeManager to not call SUPER::recycle:

     package LightMergeManager;
     use base qw( Lucy::Index::IndexManager );

     sub recycle {
         my ( $self, %args ) = @_;
         my $seg_readers = $args{reader}->get_seg_readers;
         @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
         return $seg_readers;
     }

Make BackgroundMerger always "optimize" the index before committing:

     $bg_merger->optimize;
     $bg_merger->commit;

> However, the search is now slower (after adding PolyReader/IndexReader). I used PolyReader because it was mentioned in one of the forums that PolyReader has protection against a memory-leak issue.
>
> Any tips on how I can improve performance while using IndexReader?

Using PolyReader or IndexReader shouldn't make a difference performance-wise. 
The performance drop is probably caused by supplying an IndexManager to IndexReader or PolyReader which results in additional overhead from read locks. You should move the index to a local filesystem if you're concerned about performance.

> However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

No, simply use the hostname without a suffix.

Nick

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Thanks, Nick, for your help and the workaround. I will ask my infra team to pick up the latest 0.6 and install it. I hope 0.6 works out better than 0.4.

I stopped using LightMergeManager and did not get that error any more; however, performance is now even worse. I'm going to try a few things now:

1. Try the workaround you provided. (I don't use a background merger.)
2. Try to use background merging in another loop, combined with the above option.
3. Try to store information in memory/Storable/a DB instead of using search every time. I think mixing search with doc indexing in the same process is creating problems. When another system uses the search, I don't see any problem.
4. Try to serialize the index directories to avoid overlap; in any case they are all running as parallel processes.

Hopefully one of the above works out.

Thanks,
Rajiv

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Friday, December 09, 2016 8:51 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 09/12/2016 15:01, Gupta, Rajiv wrote:
> I'm getting this error very frequently now :(
>
> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high 
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129
>
> Is there any workaround?
>
> I'm using LightMergeManager; I'm not sure if it's the cause. Should I stop using it?
>
> Please help. I'm getting it very frequently now.

I committed a fix to the 0.4, 0.5, and 0.6 branches. Your best option is to get one of these branches with Git and recompile Lucy. If you can't do that, either stop using LightMergeManager, or try the following untested workaround.

Modify LightMergeManager to not call SUPER::recycle:

     package LightMergeManager;
     use base qw( Lucy::Index::IndexManager );

     sub recycle {
         my ( $self, %args ) = @_;
         my $seg_readers = $args{reader}->get_seg_readers;
         @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
         return $seg_readers;
     }

Make BackgroundMerger always "optimize" the index before committing:

     $bg_merger->optimize;
     $bg_merger->commit;

> However, the search is now slower (after adding PolyReader/IndexReader). I used PolyReader because it was mentioned in one of the forums that PolyReader has protection against a memory-leak issue.
>
> Any tips on how I can improve performance while using IndexReader?

Using PolyReader or IndexReader shouldn't make a difference performance-wise. 
The performance drop is probably caused by supplying an IndexManager to IndexReader or PolyReader which results in additional overhead from read locks. You should move the index to a local filesystem if you're concerned about performance.

> However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

No, simply use the hostname without a suffix.

Nick

Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 09/12/2016 15:01, Gupta, Rajiv wrote:
> I'm getting this error very frequently now :(
>
> BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high
> S_fibonacci at core/Lucy/Index/IndexManager.c line 129
>
> Is there any workaround?
>
> I'm using LightMergeManager; I'm not sure if it's the cause. Should I stop using it?
>
> Please help. I'm getting it very frequently now.

I committed a fix to the 0.4, 0.5, and 0.6 branches. Your best option is to 
get one of these branches with Git and recompile Lucy. If you can't do that, 
either stop using LightMergeManager, or try the following untested workaround.

Modify LightMergeManager to not call SUPER::recycle:

     package LightMergeManager;
     use base qw( Lucy::Index::IndexManager );

     sub recycle {
         my ( $self, %args ) = @_;
         my $seg_readers = $args{reader}->get_seg_readers;
         @$seg_readers = grep { $_->doc_max < 10 } @$seg_readers;
         return $seg_readers;
     }

Make BackgroundMerger always "optimize" the index before committing:

     $bg_merger->optimize;
     $bg_merger->commit;
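
The snippet above assumes an existing $bg_merger; for context, a minimal construction might look like this (path hypothetical, $manager being the same host-keyed IndexManager used for the indexer):

     use Lucy::Index::BackgroundMerger;

     my $bg_merger = Lucy::Index::BackgroundMerger->new(
         index   => '/path/to/index',
         manager => $manager,
     );
     $bg_merger->optimize;
     $bg_merger->commit;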

> However, the search is now slower (after adding PolyReader/IndexReader). I used PolyReader because it was mentioned in one of the forums that PolyReader has protection against a memory-leak issue.
>
> Any tips on how I can improve performance while using IndexReader?

Using PolyReader or IndexReader shouldn't make a difference performance-wise. 
The performance drop is probably caused by supplying an IndexManager to 
IndexReader or PolyReader which results in additional overhead from read 
locks. You should move the index to a local filesystem if you're concerned 
about performance.

> However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

No, simply use the hostname without a suffix.

Nick

RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I'm getting this error very frequently now :(

BasicFlexGroup0_io_workload/pm_io/.lucyindex/1 :  input 47 too high 
S_fibonacci at core/Lucy/Index/IndexManager.c line 129

Is there any workaround? 

I'm using LightMergeManager; I'm not sure if it's the cause. Should I stop using it?

Please help. I'm getting it very frequently now.

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Thursday, December 08, 2016 4:00 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

I fired 10+ runs and all went through well, except one where I got this error after replacing the default manager with

manager =>  LightMergeManager->new( host => $self->{_hostname}."DEL"),

ERROR:
20161208 033346 [] * FAIL: FAILED AT initializing the IndexSearcher Couldn't get deletion lock
20161208 033346 [] *    lucy_PolyReader_do_open at core/Lucy/Index/PolyReader.c line 344
20161208 033346 [] *    at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3387.
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3383
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3381

I wrapped all indexer operations except delete in retries. So now I have put a retry around this one as well.

However, the search is now slower (after adding PolyReader/IndexReader). I used PolyReader because it was mentioned in one of the forums that PolyReader has protection against a memory-leak issue.

Any tips on how I can improve performance while using IndexReader?

Thanks much for all your support. 

Thanks,
Rajiv Gupta


-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Wednesday, December 07, 2016 9:55 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

-Rajiv

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Wednesday, December 07, 2016 9:51 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

I just did that.

For my regular doc updates, searches, and deletes I'm using LightMergeManager with my host name. For adding the end-of-file marker that concludes a file, I'm using the regular manager with my host name. I have also put retries around almost all commits where I was getting errors. Small runs were fine (they were fine anyway); I have 5 large runs going.

I will update the results here.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Wednesday, December 07, 2016 9:38 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 07/12/2016 13:23, Gupta, Rajiv wrote:
> * Indexer and log files are on NFS mount.

Have you read and understood Lucy::Docs::FileLocking? With NFS, you have to pass an IndexManager object to every indexer and searcher.

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

The fact that the index is on NFS probably also explains the performance problems you reported earlier.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I fired 10+ runs and all went through well, except one where I got this error after replacing the default manager with

manager =>  LightMergeManager->new( host => $self->{_hostname}."DEL"),

ERROR:
20161208 033346 [] * FAIL: FAILED AT initializing the IndexSearcher Couldn't get deletion lock
20161208 033346 [] *    lucy_PolyReader_do_open at core/Lucy/Index/PolyReader.c line 344
20161208 033346 [] *    at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3387.
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3383
20161208 033346 [] *    eval {...} called at /x/eng/bbrtp/users/rajivg/dotdev_052309_4015413_1612060522/test/nate/bin/../lib/NATE/LucyIndexerUtils.pm line 3381

I wrapped all indexer operations except delete in retries. So now I have put a retry around this one as well.

However, the search is now slower (after adding PolyReader/IndexReader). I used PolyReader because it was mentioned in one of the forums that PolyReader has protection against a memory-leak issue.

Any tips on how I can improve performance while using IndexReader?

Thanks much for all your support. 

Thanks,
Rajiv Gupta


-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Wednesday, December 07, 2016 9:55 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

-Rajiv

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Wednesday, December 07, 2016 9:51 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

I just did that.

For my regular doc updates, searches, and deletes I'm using LightMergeManager with my host name. For adding the end-of-file marker that concludes a file, I'm using the regular manager with my host name. I have also put retries around almost all commits where I was getting errors. Small runs were fine (they were fine anyway); I have 5 large runs going.

I will update the results here.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Wednesday, December 07, 2016 9:38 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 07/12/2016 13:23, Gupta, Rajiv wrote:
> * Indexer and log files are on NFS mount.

Have you read and understood Lucy::Docs::FileLocking? With NFS, you have to pass an IndexManager object to every indexer and searcher.

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

The fact that the index is on NFS probably also explains the performance problems you reported earlier.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
However, since I'm searching and indexing the files from the same process and the same system, do they need to be unique? Should I append something like <hostname>_search, <hostname>_index, <hostname>_delete?

-Rajiv

-----Original Message-----
From: Gupta, Rajiv [mailto:Rajiv.Gupta@netapp.com] 
Sent: Wednesday, December 07, 2016 9:51 PM
To: user@lucy.apache.org
Subject: RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

I just did that.

For my regular doc updates, searches, and deletes I'm using LightMergeManager with my host name. For adding the end-of-file marker that concludes a file, I'm using the regular manager with my host name. I have also put retries around almost all commits where I was getting errors. Small runs were fine (they were fine anyway); I have 5 large runs going.

I will update the results here.

Thanks,
Rajiv Gupta

-----Original Message-----
From: Nick Wellnhofer [mailto:wellnhofer@aevum.de] 
Sent: Wednesday, December 07, 2016 9:38 PM
To: user@lucy.apache.org
Subject: Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

On 07/12/2016 13:23, Gupta, Rajiv wrote:
> * Indexer and log files are on NFS mount.

Have you read and understood Lucy::Docs::FileLocking? With NFS, you have to pass an IndexManager object to every indexer and searcher.

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html

The fact that the index is on NFS probably also explains the performance problems you reported earlier.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I just did that.

For my regular doc updates, searches, and deletes I'm using LightMergeManager with my host name. For adding the end-of-file marker that concludes a file, I'm using the regular manager with my host name. I have also put retries around almost all commits where I was getting errors. Small runs were fine (they were fine anyway); I have 5 large runs going.

I will update the results here.

Thanks,
Rajiv Gupta


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 07/12/2016 13:23, Gupta, Rajiv wrote:
> * Indexer and log files are on NFS mount.

Have you read and understood Lucy::Docs::FileLocking? With NFS, you have to 
pass an IndexManager object to every indexer and searcher.

     http://lucy.apache.org/docs/perl/Lucy/Docs/FileLocking.html
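A minimal sketch of what that looks like (untested; the index path is illustrative):

    use Sys::Hostname qw(hostname);
    use Lucy::Index::IndexManager;
    use Lucy::Index::Indexer;
    use Lucy::Index::IndexReader;
    use Lucy::Search::IndexSearcher;

    my $path = '/path/to/index';    # illustrative

    # Indexing side: hand the indexer a manager with a unique host.
    my $indexer = Lucy::Index::Indexer->new(
        index   => $path,
        manager => Lucy::Index::IndexManager->new( host => hostname() ),
    );

    # Search side: IndexSearcher takes no manager, so open a reader
    # with one and wrap that instead.
    my $reader = Lucy::Index::IndexReader->open(
        index   => $path,
        manager => Lucy::Index::IndexManager->new( host => hostname() ),
    );
    my $searcher = Lucy::Search::IndexSearcher->new( index => $reader );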

The fact that the index is on NFS probably also explains the performance 
problems you reported earlier.

Nick


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Adding more, different types of errors; they all indicate that the indexer cannot update the doc. I have now put an eval around indexer->commit to catch errors.

Can't delete 'documents.ix'
20161207 145559 [] *    S_do_consolidate at core/Lucy/Store/CompoundFileWriter.c line 173

Can't open '/u/smoke/presub/logs/cit-cr-setup-rtp.rajivg.1481116120.70557_cmode_1of1/010_cleanup/06_did_bad_happen/.lucyindex/1/seg_a9/documents.ix': Invalid argument
20161207 144109 [] *    LUCY_FSFolder_Local_Open_FileHandle_IMP at core/Lucy/Store/FSFolder.c line 118

What do you suggest: should I put a retry around it, or apply the FastUpdates mechanism?
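The retry I have in mind looks roughly like this (attempt count and backoff are guesses, and I'm assuming a failed Indexer should be thrown away and a fresh one created per attempt rather than re-committed):

    use Time::HiRes qw(sleep);

    sub commit_with_retry {
        my ( $make_indexer, $add_docs, $max_tries ) = @_;
        $max_tries ||= 5;
        for my $try ( 1 .. $max_tries ) {
            my $ok = eval {
                my $indexer = $make_indexer->();    # fresh Indexer each attempt
                $add_docs->($indexer);              # add/delete the docs
                $indexer->commit;
                1;
            };
            return 1 if $ok;
            die $@ if $try == $max_tries;           # rethrow on final failure
            sleep 0.5 * $try;                       # simple linear backoff
        }
    }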

Thanks,
Rajiv Gupta


RE: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
Thanks Nick for your reply. Thanks Peter too. 

> This looks like two processes are writing to the index at once. This shouldn't happen unless something with our locking mechanism is broken. Do you have an unusual setup? Are you perhaps running on NFS?

Yes, I have an unusual setup. Let me try to explain the setup. 

* My application is a test application that runs many test cases in parallel, generating a lot of log files. I'm using Lucy to index those log files for faster search, pagination, and summary generation.
* From my application I kick off a lucyindexer script using Open3; it is primarily responsible for indexing all the files while the tests are progressing. The output and errors of lucyindexer go to STDOUT, which is redirected to a log file.
* My application generates log files from 4 different sources. Information about newly created log files, and about files that have reached their end, is stored in 4 different tables in our database.
* In my lucyindexer main module I use EV watchers. To monitor the tables I use EV::periodic for new entries (5 sec) and for file completion (10 sec), EV::stat at 1 sec for file changes (though this behaves just like a periodic watcher, since EV::stat won't work on NFS), and EV::io to detect the broken pipe so that I exit the indexer process once my test application ends.
* With each watcher, when I get a new log file it follows this workflow: scan through the file for a very limited set of keywords, opening the file and reading it line by line, and create a Lucy doc based on defined regular expressions. If the end time for the file has arrived from the DB, insert another special doc, an end marker, indicating the end of the file; the file is removed from my list after the end marker is added. The end marker also stores the last line number and last seek pointer. If no end time has arrived for the file, it keeps a 1 sec stat watch for new changes and adds Lucy docs incrementally. For every file that comes in again, I use a Lucy search to check whether that file was opened previously; if I find the file name, I take its last line number and seek pointer from the end marker, delete that doc (the end marker) using the indexer's delete-by-query, and start reading the file for further changes. Once the file is again closed, I re-insert the end marker. Once I get the broken pipe from my parent test application, I keep a buffer of 2 minutes to insert end markers for in-progress files.
* The index directory for all these files is under the same folder, named .lucyindexer/1 (I fixed the name). There are multiple log files in the same folder, but it is rare (I have never seen it) for them to conflict in creating docs. I mention this because one version of the application is already out and generating docs; however, it has a problem: when the same file is opened again it is re-indexed in full, which takes time and creates duplicates. That is why I tried to insert a search before adding docs for those files. I could also keep the file list in memory, but since the list sometimes reaches 100k files (for long-running tests), the system runs out of memory and becomes very slow.
* Indexer and log files are on an NFS mount.
* I also observed that EV sometimes ends prematurely (without break being called), but I'm not sure whether that is caused by an indexer error. No error is reported at the time.
* In my viewer application I run forked Lucy searches to consolidate data from all the folders. The list of folders sometimes reaches 1000. I tried PolySearcher but did not find it faster than forking.

My Lucy library version is 0.4.2. I have asked my infra team to upgrade, which may take a month or so.

Here is what is happening in parallel most of the time (the first flow is sketched in code below):

Search->Found->delete doc->add doc->commit
add doc->commit
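
In code, the first flow is roughly this ($index_path, $manager, and the field names come from my setup and are only for illustration):

    use Lucy::Index::Indexer;

    # Drop the stale end marker for this file, then append the new docs.
    my $indexer = Lucy::Index::Indexer->new(
        index   => $index_path,
        manager => $manager,
    );
    $indexer->delete_by_term( field => 'end_marker_for', term => $log_file );
    $indexer->add_doc({
        file    => $log_file,
        line    => $line_no,
        content => $line,
    });
    $indexer->commit;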

Thanks for reading this far. I'm open to any suggestions. I really like this framework and see a big opportunity in my company's internal triaging strategy, linking it with product logs for more effective results.

You guys rock!

Thanks,
Rajiv Gupta


Re: [lucy-user] LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 06/12/2016 17:17, Gupta, Rajiv wrote:
> Any idea why I'm getting this error.
>
> Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 [] [event_check_for_logfile_completion_in_db][FAILED at DB Query to check logfile completion][Error Invalid path: 'seg_9i/lextemp'
> 20161205 184114 []  LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119
> 20161205 184114 []  S_lazy_init at core/Lucy/Index/PostingListWriter.c line 92
>
>
> In another log file getting different error
>
> Error rename from '<Dir>/.lucyindex/1/schema.temp' to '<Dir> /.lucyindex/1/schema_an.json' failed: Invalid argument
> 20161205 174146 []  LUCY_Schema_Write_IMP at core/Lucy/Plan/Schema.c line 429
>
> When committing the indexer object.

This looks like two processes are writing to the index at once. This shouldn't 
happen unless something with our locking mechanism is broken. Do you have an 
unusual setup? Are you perhaps running on NFS?

> In both the case I'm seeing one common pattern that time is getting skewed up in the STDOUT log file by 5-6 hrs before starting the process on this file. In actual system time is not changed.

I don't fully understand this paragraph. Can you clarify?

Nick


Re: [lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Nick Wellnhofer <we...@aevum.de>.
On 06/12/2016 22:16, Gupta, Rajiv wrote:
> I thought since I'm doing read and write together I may be getting file error so I tried to use FastUpdate method described here - http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html

You shouldn't get any filesystem errors when searching and indexing 
simultaneously. The only error you might get under normal operation is lock 
timeouts. In this case, you should consider fast updates.

But the errors you're seeing indicate a different problem that probably won't 
be cured by fast updates.

> But now I'm more frequently getting below error.
> Error input 57 too high
> 20161206 150630 []  S_fibonacci at core/Lucy/Index/IndexManager.c line 129

This is caused by a known bug. Unfortunately, the fix wasn't committed when it 
came up for the first time:

https://lists.apache.org/thread.html/0465759f6eae2108be30c70b490d0f94ab2b5c66bfac2b32c76eb41f@1362406649@%3Cuser.lucy.apache.org%3E

I'll make sure that the fix gets into the next release.

> How many docs I should limit to commit together?

If you're (re)indexing thousands of documents and don't want searchers to be 
locked out, you should consider indexing batches of documents and sleeping 
between batches to allow concurrent searches. I'd start with several hundred 
documents per batch and sleep for maybe 2 seconds. This thread contains more 
details:

https://lists.apache.org/thread.html/0adbe8d0b5dd6c7491a8f28008428a39f485fae58ed606475f94c636@1355773975@%3Cuser.lucy.apache.org%3E
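
Sketched out, the loop could look something like this (batch size and pause 
are starting points, not measured recommendations):

    use Time::HiRes qw(sleep);
    use Lucy::Index::Indexer;

    my $batch_size = 500;
    while ( my @batch = splice( @pending_docs, 0, $batch_size ) ) {
        my $indexer = Lucy::Index::Indexer->new(
            index   => $index_path,    # illustrative
            manager => $manager,       # IndexManager with a unique host
        );
        $indexer->add_doc($_) for @batch;
        $indexer->commit;
        sleep 2;    # window for concurrent searchers between batches
    }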

Nick


Re: [lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by Peter Karman <pe...@peknet.com>.
Gupta, Rajiv wrote on 12/6/16 3:16 PM:
> I thought since I'm doing read and write together I may be getting file error
> so I tried to use FastUpdate method described here -
> http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html
>
> But now I'm more frequently getting below error. Error input 57 too high
> 20161206 150630 []  S_fibonacci at core/Lucy/Index/IndexManager.c line 129
>
> My use case is. While my application generating multiple logs, I'm indexing
> them parallelly. To achieve this I'm storing docs at multiple locations at
> each directory level. In a directory there could be multiple log files so for
> that directory I'm having one indexer directory. When file get closed I
> insert an end marker doc to indicate that indexing on that file is done.
> However, sometimes same file get open multiple times with additional data. In
> such case I search in existing indexing directory if there is any end marker
> is set, if there is end marker then I delete that end marker, and index
> additional data and again insert end marker. In this process I search as well
> as write at the same time.
>
> I started seeing these type of error only after I inserted the logic of
> search and deleting. Should I catch this and retry? How many docs I should
> limit to commit together?


It wasn't clear to me from your description, but you should be running only one 
indexer at a time per invindex.

You should also destroy any open searchers once the invindex changes (the 
indexer commits).

You can see here how Dezi approaches this:

https://metacpan.org/source/KARMAN/Dezi-App-0.014/lib/Dezi/Lucy/Searcher.pm#L406

TL;DR Dezi keeps track of an invindex header file with a UUID in it, which 
changes whenever the indexer finishes. Both the UUID and an md5 checksum of the 
header file are checked on every search, and the searcher is destroyed and a new 
one created if the old searcher is stale.
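
In outline, the approach looks something like this (a generic sketch, not the 
actual Dezi code; the header file name is hypothetical):

    use Digest::MD5 qw(md5_hex);
    use Lucy::Search::IndexSearcher;

    my ( $cached_searcher, $cached_sig );

    sub fresh_searcher {
        my ($invindex_path) = @_;
        my $header = "$invindex_path/index.header";    # hypothetical
        open my $fh, '<', $header or die "open $header: $!";
        my $sig = md5_hex( do { local $/; <$fh> } );
        close $fh;
        # Rebuild the searcher only when the header signature changes.
        if ( !$cached_searcher or $sig ne $cached_sig ) {
            $cached_searcher =
                Lucy::Search::IndexSearcher->new( index => $invindex_path );
            $cached_sig = $sig;
        }
        return $cached_searcher;
    }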


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com

[lucy-user] RE: LUCY_Folder_Open_Out_IMP at core/Lucy/Store/Folder.c line 119

Posted by "Gupta, Rajiv" <Ra...@netapp.com>.
I thought that since I'm doing reads and writes together I might be getting file errors, so I tried the FastUpdates method described here - http://lucy.apache.org/docs/perl/Lucy/Docs/Cookbook/FastUpdates.html

But now I'm getting the error below more frequently.
Error input 57 too high
20161206 150630 []  S_fibonacci at core/Lucy/Index/IndexManager.c line 129

My use case: while my application generates multiple logs, I'm indexing them in parallel. To achieve this I'm storing docs at multiple locations, one per directory level. A directory can contain multiple log files, so for that directory I keep one index directory. When a file is closed I insert an end-marker doc to indicate that indexing of that file is done. However, sometimes the same file is opened again with additional data. In that case I search the existing index directory for an end marker; if one is set, I delete it, index the additional data, and insert the end marker again. In this process I search as well as write at the same time.

I started seeing these types of errors only after I added the search-and-delete logic. Should I catch this and retry? How many docs should I limit myself to committing together?

I would highly appreciate any help.

Thanks,
Rajiv Gupta
