Posted to dev@geode.apache.org by Mario Kevo <ma...@est.tech> on 2019/11/04 08:05:29 UTC

Lucene upgrade

Hi geode dev,

I'm working on upgrading Lucene to a newer version (
https://issues.apache.org/jira/browse/GEODE-7309).

I followed the instructions from
https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
and also made some other changes that are needed for Lucene 8.2.0.

I found some problems with tests:
 * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java:
 * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java:
 * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java:
 * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java:
 * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java:

      -> failed due to:
Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and 9). This version of Lucene only supports indexes created with release 6.0 and later.
	at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
	at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
	at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
	at org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
	at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
	at org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
	... 16 more


 * geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java: 
      -> failed with the same exception as previous tests


I found this on the web:
https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
but I don't have an idea how to proceed with it.

Does anyone have any idea how to fix it?

BR,
Mario





Re: Lucene upgrade

Posted by Dan Smith <ds...@pivotal.io>.
>
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and if so, do not write to lucene on the newly updated
> node until all versions are up to date.


Elaborating on this option a little more, this might be as simple as
something like the below at the beginning of LuceneEventListener.process.
Maybe there is a better way to cache/check whether there are old members.

The danger with this approach is that the queues will grow until the
upgrade is complete. But maybe that is the only way to successfully do a
rolling upgrade with lucene indexes.

boolean hasOldMember = cache.getMembers().stream()
    .map(InternalDistributedMember.class::cast)
    .map(InternalDistributedMember::getVersionObject)
    .anyMatch(version -> version.compareTo(Version.GEODE_1_11_0) < 0);

if (hasOldMember) {
  return false;
}
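Dan's fragment above relies on Geode internals (InternalDistributedMember, Version). A self-contained sketch of the same version-guard idea, with those types stubbed out for illustration (none of the names below are the real Geode API), might look like:

```java
import java.util.List;

// Illustrative stand-ins for Geode's Version and member types.
record Version(int major, int minor, int patch) implements Comparable<Version> {
    public int compareTo(Version o) {
        int c = Integer.compare(major, o.major);
        if (c != 0) return c;
        c = Integer.compare(minor, o.minor);
        return c != 0 ? c : Integer.compare(patch, o.patch);
    }
}
record Member(String name, Version version) {}

public class RollingUpgradeGuard {
    // Hypothetical version at which the Lucene codec changed.
    static final Version CODEC_CHANGE_VERSION = new Version(1, 11, 0);

    // Returns true if any cluster member still runs a pre-codec-change version,
    // in which case the listener should skip draining and leave events queued.
    static boolean hasOldMember(List<Member> members) {
        return members.stream()
            .map(Member::version)
            .anyMatch(v -> v.compareTo(CODEC_CHANGE_VERSION) < 0);
    }

    public static void main(String[] args) {
        List<Member> cluster = List.of(
            new Member("server1", new Version(1, 11, 0)),
            new Member("server2", new Version(1, 10, 0)));
        System.out.println(hasOldMember(cluster)); // prints true: server2 is pre-1.11
    }
}
```

The real implementation would also need to re-check membership as servers roll, rather than caching the answer once.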


On Wed, Nov 6, 2019 at 2:16 PM Jason Huynh <jh...@pivotal.io> wrote:

> Jake - from my understanding, the implementation detail of geode-lucene is
> that we are using a partitioned region as a "file-system" for lucene
> files.  As new servers are rolled, the issue is that the new servers have
> the new codec.  As puts occur on the user's data region, the async listeners
> are processing on new and old servers alike.  If a new server writes using the
> new codec, it's written into the partitioned region, but if an old server
> with the old codec needs to read that file, it will blow up because it
> doesn't know about the new codec.
> Option 1 is to not have the new servers process/write if it detects
> different geode systems (pre-codec changes).
> Option 2 is similar but requires users to pause the aeq/lucene listeners
>
> Deleting the indexes and recreating them can be quite expensive.  Mostly
> due to tombstone creation when creating a new lucene index, but could be
> considered Option 3.  It also would probably require
> https://issues.apache.org/jira/browse/GEODE-3924 to be completed.
>
> Gester - I may be wrong but I think option 1 is still doable.  We just need
> to not write using the new codec until after all servers are upgraded.
>
> There was also some upgrade challenge with scoring from what I remember,
> but that's a different topic...
>
>
> On Wed, Nov 6, 2019 at 1:00 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
>
> > He tried to upgrade the Lucene version from the current 6.6.4 to 8.2. There are
> > some challenges. One challenge is that the codec changed, which means the format
> > of the index also changed.
> >
> > That's why we did not implement it.
> >
> > If he resolved the coding challenges, then rolling upgrade will probably
> > need option-2 to work around it.
> >
> > Regards
> > Gester
> >
> >
> > On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett <jb...@pivotal.io>
> wrote:
> >
> > > What about “versioning” the region that backs the indexes? Old servers
> > > with the old Lucene would continue to read/write to the old region. New
> > > servers would start re-indexing with the new version. Given the async nature
> > > of the indexing, would the mismatch in indexing for some period of time have
> > > an impact?
> > >
> > > Not an ideal solution but it’s something.
> > >
> > > In my previous life we just deleted the indexes and rebuilt them on
> > > upgrade but that was specific to our application.
> > >
> > > -Jake
> > >
> > >
> > > > On Nov 6, 2019, at 11:18 AM, Jason Huynh <jh...@pivotal.io> wrote:
> > > >
> > > > Hi Mario,
> > > >
> > > > I think there are a few ways to accomplish what Dan was suggesting... Dan
> > > > or others, please chime in with more options/solutions.
> > > >
> > > > 1.) We add some product code / a Lucene listener to detect whether we have
> > > > old versions of geode and if so, do not write to Lucene on the newly
> > > > updated node until all versions are up to date.
> > > >
> > > > 2.) We document it and provide instructions (and a way) to pause Lucene
> > > > indexing before someone attempts to do a rolling upgrade.
> > > >
> > > > I'd prefer option 1 or some other robust solution, because I think
> > > > option 2 has many possible issues.
> > > >
> > > >
> > > > -Jason
> > > >
> > > >
> > > >> On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo <ma...@est.tech>
> > wrote:
> > > >>
> > > >> Hi Dan,
> > > >>
> > > >> thanks for suggestions.
> > > >> I didn't find a way to write Lucene in the older format. They only
> > > >> support reading old-format indexes with a newer version, by using the
> > > >> lucene-backward-codec module.
> > > >>
> > > >> Regarding freezing writes to the Lucene index: that means that we need
> > > >> to start locators and servers, create the Lucene index on the server, roll
> > > >> it to current, and then do the puts. In this case the tests passed. Is that OK?
> > > >>
> > > >>
> > > >> BR,
> > > >> Mario
> > > >>
> > > >>
> > > >>> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> > > >>> I think the issue probably has to do with doing a rolling upgrade from an
> > > >>> old version of geode (with an old version of lucene) to the new version of
> > > >>> geode.
> > > >>>
> > > >>> Geode's lucene integration works by writing the lucene index to a colocated
> > > >>> region. So lucene index data that was generated on one server can be
> > > >>> replicated or rebalanced to other servers.
> > > >>>
> > > >>> I think what may be happening is that data written by a geode member with a
> > > >>> newer version is being read by a geode member with an old version. Because
> > > >>> this is a rolling upgrade test, members with multiple versions will be
> > > >>> running as part of the same cluster.
> > > >>>
> > > >>> I think to really fix this rolling upgrade issue we would need to somehow
> > > >>> configure the new version of lucene to write data in the old format, at
> > > >>> least until the rolling upgrade is complete. I'm not sure if that is
> > > >>> possible with lucene or not - but perhaps? Another option might be to
> > > >>> freeze writes to the lucene index during the rolling upgrade process.
> > > >>> Lucene indexes are asynchronous, so this wouldn't necessarily require
> > > >>> blocking all puts. But it would require queueing up a lot of updates.
> > > >>>
> > > >>> -Dan
> > > >>>
> > > >>> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo <ma...@est.tech> wrote:
> > > >>>
> > > >>> [Mario's original message of Nov 4, quoted in full at the top of this thread; trimmed]

Re: Re: Re: Re: Re: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi all,

Could someone please review #4395 <https://github.com/apache/geode/pull/4395>?

BR,
Mario
________________________________
From: Mario Kevo <ma...@est.tech>
Sent: 17 December 2019 14:30
To: Jason Huynh <jh...@pivotal.io>
Cc: geode <de...@geode.apache.org>
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Jason,

Nice catch! I tried with a larger number of retries (with your changes) and it passed.
I will try to make it time based.

Thanks for the help!

BR,
Mario
________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 13 December 2019 23:10
To: Mario Kevo <ma...@est.tech>
Cc: geode <de...@geode.apache.org>
Subject: Re: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for the "reindex" code was a bit off (it expected reindex features to be complete by a certain release).  I have a PR on develop to adjust that calculation (https://github.com/apache/geode/pull/4466).

The expectation is that when lucene reindex (indexing a region with data already in it) is enabled, any query will now throw the LuceneIndexingInProgressException instead of possibly waiting a very long time to receive a query result.  The tests themselves are coded to retry 10 times, knowing it will take a while to reindex.  If you bump this number up or, better yet, make it time based (Awaitility, etc.), it should get you past this problem (once the pull request gets checked in and pulled into your branch).
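The time-based retry Jason suggests can be done with Awaitility, or with a plain deadline loop. A self-contained sketch of the latter — the exception type and the numbers here are illustrative stand-ins, not Geode's actual classes:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for LuceneQueryException / LuceneIndexingInProgressException.
class IndexingInProgressException extends RuntimeException {}

public class TimeBasedRetry {
    // Retry `query` until it stops throwing IndexingInProgressException,
    // or rethrow once `timeout` has elapsed.
    static <T> T awaitQuery(Supplier<T> query, Duration timeout) {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            try {
                return query.get();
            } catch (IndexingInProgressException e) {
                if (Instant.now().isAfter(deadline)) throw e; // give up at the deadline
                try {
                    Thread.sleep(200); // back off briefly before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated query: still "indexing" for the first 3 calls, then succeeds.
        int result = awaitQuery(() -> {
            if (calls.incrementAndGet() <= 3) throw new IndexingInProgressException();
            return 42;
        }, Duration.ofSeconds(5));
        System.out.println(result); // prints 42
    }
}
```

The advantage over a fixed retry count is that the test tolerates however long reindexing actually takes, up to the timeout, instead of encoding a guess about how many attempts are enough.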

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests passed, but more often they failed.
As I said, when I change the tests to put a lower number of entries, they pass every time; setting LuceneQueryFunction.java to wait for the repo also works.

waitUntilFlushed is called by verifyLuceneQueryResults before executing queries. I also tried waiting until isIndexingInProgress returns false, but it reached the timeout and failed.
In the tests, the query is executed after all members are rolled.

BR,
Mario

________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 11 December 2019 23:08
To: Mario Kevo <ma...@est.tech>
Cc: geode <de...@geode.apache.org>
Subject: Re: Re: Re: Re: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which one?
If it's a rolling upgrade test, then we might have to mark this as expected behavior and modify the tests to waitForFlush (wait until the queue is drained).  As long as the test is able to roll all the servers and not get stuck waiting for a queue to flush (which will only happen once all the servers are rolled now).

If the test hasn't rolled all the servers and is trying to execute a query, then we'd probably have to modify the test to not do the query in the middle or expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we have:

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not available, currently indexing

So this means that the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context, false)]. If we have a bigger queue (like in the test) it will fail, as it will not wait until the indexes are created. I also tried putting just a few objects and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of entries in the tests?

BR,
Mario



________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 6 December 2019 20:53
To: Mario Kevo <ma...@est.tech>
Cc: geode <de...@geode.apache.org>
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past the "index too new" exception.  Summary: repo creation, even if no writes occur, appears to create some metadata that the old node attempts to read and blows up on.

The PR against your branch just prevents the repo from being constructed until all old members are upgraded.
This requires test changes to not try to validate using queries (since we prevent draining and repo creation, the query would just wait).

The reason you were probably seeing unsuccessful dispatches is that we kind of intended that with the oldMember check.  In between the server rolls, the test was trying to verify, but because not all servers had upgraded, the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)

It looks like the fix is not good.

What I see (from RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java) is that when the locator is upgraded, it is shut down and started on the newer version. The problem is that server2 becomes the lead and cannot read the Lucene index written by the newer version (the Lucene index format changed between versions 6 and 7).

Another problem appears after the rolling upgrade of the locator and server1, when verifying the region size on the VMs. For example:

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,
    15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that passes (it has 15 entries). The problem is that while executing verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) there are 14 entries; one is unsuccessfully dispatched.

I don't know why some events are successfully dispatched and some are not.
Do you have any idea?

BR,
Mario


________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 2 December 2019 18:32
To: geode <de...@geode.apache.org>
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry, I reread the original email and see that the exception points to a
different problem. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.
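(For context, Lucene's documented index back-compatibility is one major version: an 8.x reader can open segments written by 7.x via the lucene-backward-codecs module, but not segments written by 6.x, which is why the "6 (needs to be between 7 and 9)" check fires even though the index was *created* after release 6.0. For on-disk indexes Lucene ships an org.apache.lucene.index.IndexUpgrader CLI that rewrites all segments in the current format, one major version per hop. A sketch of the two-hop 6.x -> 7.x -> 8.x upgrade, with jar versions and the index path purely illustrative:)

```shell
# Hop 1: rewrite 6.x segments in the 7.x format
# (7.x + backward-codecs can still read 6.x segments).
java -cp lucene-core-7.7.3.jar:lucene-backward-codecs-7.7.3.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/index

# Hop 2: rewrite 7.x segments in the 8.x format.
java -cp lucene-core-8.2.0.jar:lucene-backward-codecs-8.2.0.jar \
  org.apache.lucene.index.IndexUpgrader -delete-prior-commits /path/to/index
```

Whether this tool is usable for Geode's indexes is a separate question, since the Lucene "files" live in a partitioned region rather than in a plain directory.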





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:

>
> I started with the implementation of Option-1.
> As I understood it, the idea is to block all puts (put them in the queue) until
> all members are upgraded. After that, it will process all queued events.
>
> I tried Dan's proposal to check at the start of
> LuceneEventListener.process() whether all members are upgraded, and also changed
> the test to verify Lucene indexes only after all members are upgraded, but got
> the same error with incompatibilities between Lucene versions.
> The changes are visible at https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> ________________________________
> From: Xiaojian Zhou <gz...@pivotal.io>
> Sent: 7 November 2019 18:27
> To: geode <de...@geode.apache.org>
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote for is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:
>
> > Gester, I don't think we need to write in the old format, we just need the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> >
> > > Usually re-creating the region and index is expensive and customers are
> > > reluctant to do it, as I recall.
> > >
> > > We do have offline reindex scripts or steps (written by Barry?). If that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write lucene in
> > > older format. They only support reading old format indexes with newer
> > > version by using lucene-backward-codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customers will hold
> > > on, silence, or reduce their business throughput when doing a rolling
> > > upgrade. I wonder if that's a reasonable assumption.
> > >
> > > Overall, after comparing all 3 options, I still think option-2 is the
> > > best bet.
> > >
> > > Regards
> > > Gester
> > >
> > >
> > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io> wrote:
> > >
> > > >
> > > >
> > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > > >
> > > > > Jake - there is a side effect to this in that the user would have to
> > > > > reimport all their data into the user-defined region too.  Client apps
> > > > > would also have to know which of the regions to put into... also, I may
> > > > > be misunderstanding this suggestion completely.  In either case, I'll
> > > > > support whoever implements the changes :-P
> > > >
> > > > Ah… there isn’t a way to re-index the existing data. Eh… just a thought.
> > > >
> > > > -Jake
> > > >
> > > >
> > >
> >
>

Odg: Odg: Odg: Odg: Odg: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi Jason,

Nice catch! I tried with larger number of retries(with your changes) and it passed.
I will try to make it time based.

Thanks for a help!

BR,
Mario
________________________________
Šalje: Jason Huynh <jh...@pivotal.io>
Poslano: 13. prosinca 2019. 23:10
Prima: Mario Kevo <ma...@est.tech>
Kopija: geode <de...@geode.apache.org>
Predmet: Re: Odg: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

I think I see what is going on here.  The logic for "reindex" code was a bit off ( it expected reindex features to be complete by a certain release).  I have a PR on develop to adjust that calculation (https://github.com/apache/geode/pull/4466)

The expectation is that when lucene reindex (indexing a region with a data already in it) is enabled - any query will now throw the LuceneIndexingInProgressException instead of possibly waiting a very long time to receive a query result.  The tests themselves are coded to retry 10 times, knowing it will take awhile to reindex.  If you bump this number up or, better yet, make it time based (awaitility, etc), it should get you past this problem (once the pull request gets checked in and pulled into your branch)

Thanks!
-Jason


On Thu, Dec 12, 2019 at 5:07 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes this tests passed but more times it failed.
As I said when change tests to put lower number of entries it passed every time or set to wait for repo in LuceneQueryFunction.java.

waitUntilFlushed is called by verifyLuceneQueryResults before executing queries. Also tried to wait until isIndexingInProgress return false, but reached timeout and failed.
In tests it tried to execute a query after all members are rolled.

BR,
Mario

________________________________
Šalje: Jason Huynh <jh...@pivotal.io>>
Poslano: 11. prosinca 2019. 23:08
Prima: Mario Kevo <ma...@est.tech>
Kopija: geode <de...@geode.apache.org>>
Predmet: Re: Odg: Odg: Odg: Lucene upgrade

Hi Mario,

Is the same test failing?  If it's a different test, could you tell us which one?
If it's a rolling upgrade test, then we might have to mark this as expected behavior and modify the tests to waitForFlush (wait until the queue is drained).  As long as the test is able to roll all the servers and not get stuck waiting for a queue to flush (which will only happen once all the servers are rolled now).

If the test hasn't rolled all the servers and is trying to execute a query, then we'd probably have to modify the test to not do the query in the middle or expect that exception to occur.

Thanks,
-Jason

On Wed, Dec 11, 2019 at 6:43 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

This change fix IndexFormatTooNewException, but now we have

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not available, currently indexing

So this means that query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set to not wait for repo [execute(context, false)]. If we have a bigger queue(like in the test) it will failed as it will not wait until indexes are created. I also tried to put just few objects and it passed as it had enough time to create indexes.
Do we need to change this part to wait for repo, or put a lower number of entries in tests?

BR,
Mario



________________________________
Šalje: Jason Huynh <jh...@pivotal.io>>
Poslano: 6. prosinca 2019. 20:53
Prima: Mario Kevo <ma...@est.tech>
Kopija: geode <de...@geode.apache.org>>
Predmet: Re: Odg: Odg: Lucene upgrade

Hi Mario,

I made a PR against your branch for some of the changes I had to do to get past the Index too new exception.  Summary - repo creation, even if no writes occur, appear to create some meta data that the old node attempts to read and blow up on.

The pr against your branch just prevents the repo from being constructed until all old members are upgraded.
This requires test changes to not try to validate using queries (since we prevent draining and repo creation, the query will just wait)

The reason why you probably were seeing unsuccessful dispatches, is because we kind of intended for that with the oldMember check.  In-between the server rolls, the test was trying to verify, but because not all servers had upgraded, the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not -maybe if this ends up working then we can discuss on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)

It looks like the fix is not good.

What I see (from RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java) is when it doing upgrade of a locator it will shutdown and started on the newer version. The problem is that server2 become a lead and cannot read lucene index on the newer version(Lucene index format has changed between 6 and 7 versions).

Another problem is after the rolling upgrade of locator and server1 when verifying region size on VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,
    15, server2, server3);

First it checks if region has expected size for VMs and it passed(has 15 entries). The problem is while executing verifyLuceneQueryResults, for VM1(server2) it has 13 entries and assertion failed.
From logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)

For VM0(server1) and VM2(server3) it has 14 entries, one is unsuccessfully dispatched.

I don't know why some events are successfully dispatched, some not.
Do you have any idea?

BR,
Mario


________________________________
Šalje: Jason Huynh <jh...@pivotal.io>>
Poslano: 2. prosinca 2019. 18:32
Prima: geode <de...@geode.apache.org>>
Predmet: Re: Odg: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem.. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:

>
> I started with the implementation of Option-1.
> As I understood, the idea is to block all puts (put them in the queue)
> until all members are upgraded. After that it will process all queued
> events.
>
> I tried Dan's proposal to check at the start of
> LuceneEventListener.process() whether all members are upgraded, and also
> changed the test to verify Lucene indexes only after all members are
> upgraded, but got the same error with incompatibilities between Lucene
> versions.
> The changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> ________________________________
> From: Xiaojian Zhou <gz...@pivotal.io>
> Sent: 7 November 2019 18:27
> To: geode <de...@geode.apache.org>
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote for is Jason's
> option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> > the new format not to be written while old members can potentially read
> > the Lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> > "you should have read the doc". Not to say Option 2 isn't valid, and it's
> > definitely the least amount of work to do, but I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> >
> > > Usually re-creating the region and index is expensive, and customers
> > > are reluctant to do it, as I recall.
> > >
> > > We do have offline reindex scripts or steps (written by Barry?). If
> > > that could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email that he said: "I didn't find a way to write
> > > Lucene in the older format. They only support reading old-format
> > > indexes with a newer version by using lucene-backward-codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customers will
> > > hold on, silence, or reduce their business throughput when doing a
> > > rolling upgrade. I wonder if that's a reasonable assumption.
> > >
> > > Overall, having compared all three options, I still think option-2 is
> > > the best bet.
> > >
> > > Regards
> > > Gester
> > >
> > >
> > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io> wrote:
> > >
> > > >
> > > >
> > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > > >
> > > > > Jake - there is a side effect to this in that the user would have
> > > > > to reimport all their data into the user-defined region too.  Client
> > > > > apps would also have to know which of the regions to put into... also,
> > > > > I may be misunderstanding this suggestion completely. In either case,
> > > > > I'll support whoever implements the changes :-P
> > > >
> > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > > > thought.
> > > >
> > > > -Jake
> > > >
> > > >
> > >
> >
>
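The Option-1 idea discussed above — queue Lucene events but refuse to drain the queue until every member is upgraded — reduces to an all-members version check. A minimal plain-Java sketch of that gate (the class, method, and ordinal constant are hypothetical; Geode's real check would consult its membership view rather than a list of ints):

```java
import java.util.List;

// Hedged sketch of the Option-1 gate: LuceneEventListener.process()
// would refuse to drain the AEQ until every member's version ordinal is
// at or past the release that writes the new Lucene format, so that old
// members never see new-format segments. Names are illustrative only.
public class LuceneDrainGate {
  static final int NEW_LUCENE_FORMAT_ORDINAL = 100; // hypothetical value

  public static boolean safeToDrain(List<Integer> memberVersionOrdinals) {
    return memberVersionOrdinals.stream()
        .allMatch(ordinal -> ordinal >= NEW_LUCENE_FORMAT_ORDINAL);
  }
}
```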

Re: Odg: Odg: Odg: Odg: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Hi Mario,

I think I see what is going on here.  The logic for the "reindex" code was
a bit off (it expected reindex features to be complete by a certain
release).  I have a PR on develop to adjust that calculation (
https://github.com/apache/geode/pull/4466).

The expectation is that when lucene reindex (indexing a region with data
already in it) is enabled, any query will now throw the
LuceneIndexingInProgressException instead of possibly waiting a very long
time to receive a query result.  The tests themselves are coded to retry 10
times, knowing it will take a while to reindex.  If you bump this number up
or, better yet, make it time-based (awaitility, etc.), it should get you
past this problem (once the pull request gets checked in and pulled into
your branch)
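A plain-JDK sketch of the "make it time based" suggestion (Awaitility would express the same thing more concisely; the RuntimeException here is only a stand-in for the real LuceneIndexingInProgressException, and the names are illustrative):

```java
import java.time.Duration;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Supplier;

// Hedged sketch: re-run the query while it throws "indexing in progress"
// until a deadline passes, instead of a fixed count of 10 attempts.
public class QueryRetry {
  public static <T> T retryWhileIndexing(Supplier<T> query, Duration timeout) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      try {
        return query.get();
      } catch (RuntimeException indexingInProgress) {
        if (System.nanoTime() >= deadline) {
          throw indexingInProgress; // still indexing after the full timeout
        }
        LockSupport.parkNanos(Duration.ofMillis(200).toNanos()); // back off
      }
    }
  }
}
```

The same time-based bound could be wired into the rolling-upgrade tests wherever they currently hard-code 10 retries.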

Thanks!
-Jason


> ------------------------------
> *From:* Jason Huynh <jh...@pivotal.io>
> *Sent:* 6 December 2019 20:53
> *To:* Mario Kevo <ma...@est.tech>
> *Cc:* geode <de...@geode.apache.org>
> *Subject:* Re: Odg: Odg: Lucene upgrade
>
> Hi Mario,
>
> I made a PR against your branch for some of the changes I had to do to get
> past the "index too new" exception.  Summary: repo creation, even if no
> writes occur, appears to create some metadata that the old node attempts
> to read and blows up on.
>
> The PR against your branch just prevents the repo from being constructed
> until all old members are upgraded.
> This requires test changes to not try to validate using queries (since we
> prevent draining and repo creation, the query will just wait).
>
> The reason why you were probably seeing unsuccessful dispatches is that
> we kind of intended for that with the oldMember check.  In between the
> server rolls, the test was trying to verify, but because not all servers
> had upgraded, the LuceneEventListener wasn't allowing the queue to drain
> on the new member.
>
> I am not sure if the changes I added are acceptable or not - maybe if this
> ends up working then we can discuss it on the dev list.
>
> There will probably be other "gotchas" along the way...
>
>
> On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <ma...@est.tech> wrote:
>
> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> *RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java*)
> is that when the *locator* is upgraded, it is shut down and started on the
> newer version. The problem is that *server2* then becomes the lead and
> cannot read the Lucene index written by the newer version (the Lucene
> index format changed between versions 6 and 7).
>
> Another problem appears after the rolling upgrade of the *locator* and
> *server1*, when verifying the region size on the VMs. For example:
>
> *expectedRegionSize += 5;*
> *putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,*
> *    15, server2, server3);*
>
> First it checks whether the region has the expected size on each VM, and
> that check passes (15 entries). The problem is that while executing
> verifyLuceneQueryResults, VM1 (server2) has only 13 entries, so the
> assertion fails.
> From the logs it can be seen that two batches were unsuccessfully dispatched:
>
> *[vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)*
>
> *[vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)*
>
> VM0 (server1) and VM2 (server3) each have 14 entries; one event was
> unsuccessfully dispatched.
>
> I don't know why some events are dispatched successfully and some are not.
> Do you have any idea?
>
> BR,
> Mario
>
>

Odg: Odg: Odg: Odg: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi Jason,

Yes, the same tests failed:

RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled

RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion

Sometimes these tests pass, but more often they fail.
As I said, when I change the tests to put a lower number of entries, they pass
every time; the same happens if I set LuceneQueryFunction.java to wait for the
repo.

waitUntilFlushed is called by verifyLuceneQueryResults before executing the
queries. I also tried waiting until isIndexingInProgress returns false, but it
reached the timeout and failed.
The tests try to execute the query only after all members are rolled.

BR,
Mario


Re: Odg: Odg: Odg: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Hi Mario,

Is the same test failing?  If it's a different test, could you tell us
which one?
If it's a rolling upgrade test, then we might have to mark this as expected
behavior and modify the tests to waitForFlush (wait until the queue is
drained), as long as the test is able to roll all the servers and not get
stuck waiting for a queue to flush (which will now only happen once all the
servers are rolled).

If the test hasn't rolled all the servers and is trying to execute a query,
then we'd probably have to modify the test to not do the query in the
middle or expect that exception to occur.

Thanks,
-Jason
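A sketch of that test change using Geode's public LuceneService API; the index and region names are taken from the test's log output ("index" on "aRegion") and the 60-second timeout is an arbitrary choice, so treat this as illustrative rather than the exact test code:

```java
import java.util.concurrent.TimeUnit;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.lucene.LuceneService;
import org.apache.geode.cache.lucene.LuceneServiceProvider;

// Sketch: after the last server has rolled, block until the index's AEQ
// drains before running the Lucene verification query.
public class WaitForFlushExample {
  public static void waitForIndexFlush(Cache cache) throws InterruptedException {
    LuceneService service = LuceneServiceProvider.get(cache);
    boolean flushed =
        service.waitUntilFlushed("index", "aRegion", 60_000, TimeUnit.MILLISECONDS);
    if (!flushed) {
      throw new AssertionError("Lucene AEQ did not drain within the timeout");
    }
  }
}
```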


Re: Re: Re: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi Jason,

This change fixes the IndexFormatTooNewException, but now we get

 org.apache.geode.cache.lucene.LuceneQueryException: Lucene Index is not available, currently indexing

So the query doesn't wait until all indexes are created.
In LuceneQueryFunction.java it is set not to wait for the repo [execute(context, false)]. If we have a bigger queue (like in the test) it will fail, as it will not wait until the indexes are created. I also tried putting in just a few objects and it passed, as there was enough time to create the indexes.
Do we need to change this part to wait for the repo, or put a lower number of entries in the tests?
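For illustration, here is a minimal, Geode-free sketch of the two behaviors in question. The names (RepoWaitSketch, canQuery, markRepoReady) are hypothetical; in Geode the switch is the boolean passed to execute(context, ...) in LuceneQueryFunction:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the index-repository readiness check.
public class RepoWaitSketch {
  private final CountDownLatch repoReady = new CountDownLatch(1);

  // Called once indexing has caught up and the repository is usable.
  public void markRepoReady() {
    repoReady.countDown();
  }

  // waitForRepo=false mirrors execute(context, false): fail fast with
  // "Lucene Index is not available, currently indexing" when not ready.
  // waitForRepo=true would instead block (bounded here by a timeout).
  public boolean canQuery(boolean waitForRepo, long timeoutMs) {
    try {
      if (!waitForRepo) {
        return repoReady.getCount() == 0;
      }
      return repoReady.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With a large queue the fail-fast path is hit while indexing is still in progress, which is exactly the exception above; with only a few entries the repo is ready before the query runs.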

BR,
Mario



________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 6 December 2019 20:53
To: Mario Kevo <ma...@est.tech>
Cc: geode <de...@geode.apache.org>
Subject: Re: Re: Re: Lucene upgrade

Hi Mario,

I made a PR against your branch with some of the changes I had to make to get past the IndexFormatTooNewException.  Summary: repo creation, even if no writes occur, appears to create some metadata that the old node attempts to read and blows up on.

The PR against your branch just prevents the repo from being constructed until all old members are upgraded.
This requires test changes to not try to validate using queries (since we prevent draining and repo creation, the query would just wait).

The reason you were probably seeing unsuccessful dispatches is that we intended that with the oldMember check.  In between the server rolls, the test was trying to verify, but because not all servers had upgraded, the LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not - maybe if this ends up working we can discuss it on the dev list.

There will probably be other "gotcha's" along the way...
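A rough, dependency-free sketch of that gate (names are hypothetical; in the PR the check sits in front of repo construction and queue draining):

```java
import java.util.List;

// Hypothetical sketch of the oldMember check: refuse to create the Lucene
// repo (and therefore refuse to write new-format index files) while any
// cluster member still reports an older version ordinal than ours.
public class RollingUpgradeGate {
  static boolean safeToCreateRepo(List<Integer> memberVersionOrdinals, int myOrdinal) {
    return memberVersionOrdinals.stream().allMatch(ordinal -> ordinal >= myOrdinal);
  }
}
```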


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <ma...@est.tech> wrote:
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)

It looks like the fix is not sufficient.

What I see (from RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java) is that when the locator is upgraded, it is shut down and restarted on the newer version. The problem is that server2 becomes the lead and cannot read the Lucene index with the newer version (the Lucene index format changed between versions 6 and 7).

Another problem occurs after the rolling upgrade of the locator and server1, when verifying the region size on the VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,
    15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that passes (it has 15 entries). The problem is that while executing verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) it has 14 entries; one is unsuccessfully dispatched.

I don't know why some events are dispatched successfully and some are not.
Do you have any idea?

BR,
Mario


________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 2 December 2019 18:32
To: geode <de...@geode.apache.org>
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem.. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:

>
> I started with implementation of Option-1.
> As I understood the idea is to block all puts(put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried with Dan's proposal to check on start of
> LuceneEventListener.process() if all members are upgraded, also changed
> test to verify lucene indexes only after all members are upgraded, but got
> the same error with incompatibilities between lucene versions.
> Changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> ________________________________
> From: Xiaojian Zhou <gz...@pivotal.io>
> Sent: 7 November 2019 18:27
> To: geode <de...@geode.apache.org>
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io>> wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io>> wrote:
> >
> > > Usually re-creating region and index are expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have an offline reindex scripts or steps (written by Barry?). If
> > that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write
> lucene
> > in
> > > older format. They only support
> > > reading old format indexes with newer version by using lucene-backward-
> > > codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customer will
> > hold
> > > on, silence or reduce their business throughput when
> > > doing rolling upgrade. I wonder if it's a reasonable assumption.
> > >
> > > Overall, after compared all the 3 options, I still think option-2 is
> the
> > > best bet.
> > >
> > > Regards
> > > Gester
> > >
> > >
> > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>>
> > wrote:
> > >
> > > >
> > > >
> > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io>> wrote:
> > > > >
> > > > > Jake - there is a side effect to this in that the user would have
> to
> > > > > reimport all their data into the user defined region too.  Client
> > apps
> > > > > would also have to know which of the regions to put into.. also, I
> > may
> > > be
> > > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > > support
> > > > > whoever implements the changes :-P
> > > >
> > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > thought.
> > > >
> > > > -Jake
> > > >
> > > >
> > >
> >
>

Re: Re: Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Hi Mario,

I made a PR against your branch with some of the changes I had to make to
get past the IndexFormatTooNewException.  Summary: repo creation, even if no
writes occur, appears to create some metadata that the old node attempts to
read and blows up on.

The PR against your branch just prevents the repo from being constructed
until all old members are upgraded.
This requires test changes to not try to validate using queries (since we
prevent draining and repo creation, the query would just wait).

The reason you were probably seeing unsuccessful dispatches is that we
intended that with the oldMember check.  In between the server rolls, the
test was trying to verify, but because not all servers had upgraded, the
LuceneEventListener wasn't allowing the queue to drain on the new member.

I am not sure if the changes I added are acceptable or not - maybe if this
ends up working we can discuss it on the dev list.

There will probably be other "gotcha's" along the way...


On Fri, Dec 6, 2019 at 1:12 AM Mario Kevo <ma...@est.tech> wrote:

> Hi Jason,
>
> I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:
>
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)
>
> It looks like the fix is not good.
>
> What I see (from
> RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java)
> is when it doing upgrade of a locator it will shutdown and
> started on the newer version. The problem is that server2 become a lead
> and cannot read lucene index on the newer version(Lucene index format has
> changed between 6 and 7 versions).
>
> Another problem is after the rolling upgrade of *locator* and *server1*
> when verifying region size on VMs. For example,
>
> expectedRegionSize += 5;
> putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,
>     15, server2, server3);
>
> First it checks if region has expected size for VMs and it passed(has 15 entries). The problem is while executing verifyLuceneQueryResults, for VM1(server2) it has 13 entries and assertion failed.
> From logs it can be seen that two batches are unsuccessfully dispatched:
>
>
> [vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)
>
> [vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)
> For VM0(server1) and VM2(server3) it has 14 entries, one is unsuccessfully dispatched.
>
> I don't know why some events are successfully dispatched, some not.
> Do you have any idea?
>
> BR,
> Mario
>
>
> ------------------------------
> *From:* Jason Huynh <jh...@pivotal.io>
> *Sent:* 2 December 2019 18:32
> *To:* geode <de...@geode.apache.org>
> *Subject:* Re: Re: Lucene upgrade
>
> Hi Mario,
>
> Sorry I reread the original email and see that the exception points to a
> different problem.. I think your fix addresses an old version seeing an
> unknown new lucene format, which looks good.  The following exception looks
> like it's the new lucene library not being able to read the older files
> (Just a guess from the message)...
>
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource
> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
> 9). This version of Lucene only supports indexes created with release
> 6.0 and later.
>
> The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
> incorrect (stating needs to be release 6.0 and later) or if it requires an
> intermediate upgrade between 6.6.2 -> 7.x -> 8.
>
>
>
>
>
> On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:
>
> >
> > I started with implementation of Option-1.
> > As I understood the idea is to block all puts(put them in the queue)
> until
> > all members are upgraded. After that it will process all queued events.
> >
> > I tried with Dan's proposal to check on start of
> > LuceneEventListener.process() if all members are upgraded, also changed
> > test to verify lucene indexes only after all members are upgraded, but
> got
> > the same error with incompatibilities between lucene versions.
> > Changes are visible on https://github.com/apache/geode/pull/4198.
> >
> > Please add comments and suggestions.
> >
> > BR,
> > Mario
> >
> >
> > ________________________________
> > From: Xiaojian Zhou <gz...@pivotal.io>
> > Sent: 7 November 2019 18:27
> > To: geode <de...@geode.apache.org>
> > Subject: Re: Lucene upgrade
> >
> > Oh, I misunderstood option-1 and option-2. What I vote is Jason's
> option-1.
> >
> > On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:
> >
> > > Gester, I don't think we need to write in the old format, we just need
> > the
> > > new format not to be written while old members can potentially read the
> > > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> > >
> > > I think Option 2 is going to leave a lot of people unhappy when they
> get
> > > stuck with what Mario is experiencing right now and all we can say is
> > "you
> > > should have read the doc". Not to say Option 2 isn't valid and it's
> > > definitely the least amount of work to do, I still vote option 1.
> > >
> > > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> > >
> > > > Usually re-creating region and index are expensive and customers are
> > > > reluctant to do it, according to my memory.
> > > >
> > > > We do have an offline reindex scripts or steps (written by Barry?).
> If
> > > that
> > > > could be an option, they can try that offline tool.
> > > >
> > > > I saw from Mario's email, he said: "I didn't found a way to write
> > lucene
> > > in
> > > > older format. They only support
> > > > reading old format indexes with newer version by using
> lucene-backward-
> > > > codec."
> > > >
> > > > That's why I think option-1 is not feasible.
> > > >
> > > > Option-2 will cause the queue to be filled. But usually customer will
> > > hold
> > > > on, silence or reduce their business throughput when
> > > > doing rolling upgrade. I wonder if it's a reasonable assumption.
> > > >
> > > > Overall, after compared all the 3 options, I still think option-2 is
> > the
> > > > best bet.
> > > >
> > > > Regards
> > > > Gester
> > > >
> > > >
> > > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>
> > > wrote:
> > > >
> > > > >
> > > > >
> > > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io>
> wrote:
> > > > > >
> > > > > > Jake - there is a side effect to this in that the user would have
> > to
> > > > > > reimport all their data into the user defined region too.  Client
> > > apps
> > > > > > would also have to know which of the regions to put into.. also,
> I
> > > may
> > > > be
> > > > > > misunderstanding this suggestion, completely.  In either case,
> I'll
> > > > > support
> > > > > > whoever implements the changes :-P
> > > > >
> > > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > > thought.
> > > > >
> > > > > -Jake
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Re: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi Jason,

I tried to upgrade from 6.6.2 to 7.1.0 and got the following exception:

org.apache.lucene.index.IndexFormatTooNewException: Format version is not supported (resource BufferedChecksumIndexInput(segments_2)): 7 (needs to be between 4 and 6)

It looks like the fix is not sufficient.

What I see (from RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java) is that when the locator is upgraded, it is shut down and restarted on the newer version. The problem is that server2 becomes the lead and cannot read the Lucene index with the newer version (the Lucene index format changed between versions 6 and 7).

Another problem occurs after the rolling upgrade of the locator and server1, when verifying the region size on the VMs. For example,

expectedRegionSize += 5;
putSerializableObjectAndVerifyLuceneQueryResult(server1, regionName, expectedRegionSize, 5,
    15, server2, server3);

First it checks whether the region has the expected size on the VMs, and that passes (it has 15 entries). The problem is that while executing verifyLuceneQueryResults, VM1 (server2) has 13 entries and the assertion fails.
From the logs it can be seen that two batches are unsuccessfully dispatched:

[vm0] [warn 2019/12/06 08:31:39.956 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_0> tid=0x42] During normal processing, unsuccessfully dispatched 1 events (batch #0)

[vm0] [warn 2019/12/06 08:31:40.103 CET <Event Processor for GatewaySender_AsyncEventQueue_index#_aRegion_2> tid=0x46] During normal processing, unsuccessfully dispatched 1 events (batch #0)

For VM0 (server1) and VM2 (server3) it has 14 entries; one is unsuccessfully dispatched.

I don't know why some events are dispatched successfully and some are not.
Do you have any idea?

BR,
Mario


________________________________
From: Jason Huynh <jh...@pivotal.io>
Sent: 2 December 2019 18:32
To: geode <de...@geode.apache.org>
Subject: Re: Re: Lucene upgrade

Hi Mario,

Sorry I reread the original email and see that the exception points to a
different problem.. I think your fix addresses an old version seeing an
unknown new lucene format, which looks good.  The following exception looks
like it's the new lucene library not being able to read the older files
(Just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating needs to be release 6.0 and later) or if it requires an
intermediate upgrade between 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:

>
> I started with implementation of Option-1.
> As I understood the idea is to block all puts(put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried with Dan's proposal to check on start of
> LuceneEventListener.process() if all members are upgraded, also changed
> test to verify lucene indexes only after all members are upgraded, but got
> the same error with incompatibilities between lucene versions.
> Changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> ________________________________
> From: Xiaojian Zhou <gz...@pivotal.io>
> Sent: 7 November 2019 18:27
> To: geode <de...@geode.apache.org>
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> >
> > > Usually re-creating region and index are expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have an offline reindex scripts or steps (written by Barry?). If
> > that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write
> lucene
> > in
> > > older format. They only support
> > > reading old format indexes with newer version by using lucene-backward-
> > > codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customer will
> > hold
> > > on, silence or reduce their business throughput when
> > > doing rolling upgrade. I wonder if it's a reasonable assumption.
> > >
> > > Overall, after compared all the 3 options, I still think option-2 is
> the
> > > best bet.
> > >
> > > Regards
> > > Gester
> > >
> > >
> > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>
> > wrote:
> > >
> > > >
> > > >
> > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > > >
> > > > > Jake - there is a side effect to this in that the user would have
> to
> > > > > reimport all their data into the user defined region too.  Client
> > apps
> > > > > would also have to know which of the regions to put into.. also, I
> > may
> > > be
> > > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > > support
> > > > > whoever implements the changes :-P
> > > >
> > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > thought.
> > > >
> > > > -Jake
> > > >
> > > >
> > >
> >
>

Re: Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Hi Mario,

Sorry, I reread the original email and see that the exception points to a
different problem. I think your fix addresses an old version seeing an
unknown new Lucene format, which looks good.  The following exception looks
like it's the new Lucene library not being able to read the older files
(just a guess from the message)...

Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
version is not supported (resource
BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
9). This version of Lucene only supports indexes created with release
6.0 and later.

The upgrade is from 6.6.2 -> 8.x though, so I am not sure if the message is
incorrect (stating it needs to be release 6.0 and later) or if it requires
an intermediate upgrade: 6.6.2 -> 7.x -> 8.





On Mon, Dec 2, 2019 at 2:00 AM Mario Kevo <ma...@est.tech> wrote:

>
> I started with implementation of Option-1.
> As I understood the idea is to block all puts(put them in the queue) until
> all members are upgraded. After that it will process all queued events.
>
> I tried with Dan's proposal to check on start of
> LuceneEventListener.process() if all members are upgraded, also changed
> test to verify lucene indexes only after all members are upgraded, but got
> the same error with incompatibilities between lucene versions.
> Changes are visible on https://github.com/apache/geode/pull/4198.
>
> Please add comments and suggestions.
>
> BR,
> Mario
>
>
> ________________________________
> From: Xiaojian Zhou <gz...@pivotal.io>
> Sent: 7 November 2019 18:27
> To: geode <de...@geode.apache.org>
> Subject: Re: Lucene upgrade
>
> Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.
>
> On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:
>
> > Gester, I don't think we need to write in the old format, we just need
> the
> > new format not to be written while old members can potentially read the
> > lucene files.  Option 1 can be very similar to Dan's snippet of code.
> >
> > I think Option 2 is going to leave a lot of people unhappy when they get
> > stuck with what Mario is experiencing right now and all we can say is
> "you
> > should have read the doc". Not to say Option 2 isn't valid and it's
> > definitely the least amount of work to do, I still vote option 1.
> >
> > On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
> >
> > > Usually re-creating region and index are expensive and customers are
> > > reluctant to do it, according to my memory.
> > >
> > > We do have an offline reindex scripts or steps (written by Barry?). If
> > that
> > > could be an option, they can try that offline tool.
> > >
> > > I saw from Mario's email, he said: "I didn't found a way to write
> lucene
> > in
> > > older format. They only support
> > > reading old format indexes with newer version by using lucene-backward-
> > > codec."
> > >
> > > That's why I think option-1 is not feasible.
> > >
> > > Option-2 will cause the queue to be filled. But usually customer will
> > hold
> > > on, silence or reduce their business throughput when
> > > doing rolling upgrade. I wonder if it's a reasonable assumption.
> > >
> > > Overall, after compared all the 3 options, I still think option-2 is
> the
> > > best bet.
> > >
> > > Regards
> > > Gester
> > >
> > >
> > > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>
> > wrote:
> > >
> > > >
> > > >
> > > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > > >
> > > > > Jake - there is a side effect to this in that the user would have
> to
> > > > > reimport all their data into the user defined region too.  Client
> > apps
> > > > > would also have to know which of the regions to put into.. also, I
> > may
> > > be
> > > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > > support
> > > > > whoever implements the changes :-P
> > > >
> > > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> > thought.
> > > >
> > > > -Jake
> > > >
> > > >
> > >
> >
>

Re: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
I started with the implementation of Option-1.
As I understand it, the idea is to block all puts (put them in the queue) until all members are upgraded. After that it will process all queued events.

I tried Dan's proposal to check at the start of LuceneEventListener.process() whether all members are upgraded, and also changed the test to verify Lucene indexes only after all members are upgraded, but got the same error with incompatibilities between Lucene versions.
Changes are visible on https://github.com/apache/geode/pull/4198.
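A dependency-free sketch of that check (names are hypothetical; the real listener would consult the cluster's member versions):

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of Dan's proposal: returning false from process()
// leaves the batch on the async event queue to be retried later, so index
// writes are effectively queued until the rolling upgrade finishes.
public class ListenerGateSketch {
  static boolean process(BooleanSupplier hasOldVersionMembers) {
    if (hasOldVersionMembers.getAsBoolean()) {
      return false; // keep events queued; don't write new-format Lucene files yet
    }
    // ...drain the batch into the Lucene index repository here...
    return true;
  }
}
```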

Please add comments and suggestions.

BR,
Mario


________________________________
From: Xiaojian Zhou <gz...@pivotal.io>
Sent: 7 November 2019 18:27
To: geode <de...@geode.apache.org>
Subject: Re: Lucene upgrade

Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.

On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:

> Gester, I don't think we need to write in the old format, we just need the
> new format not to be written while old members can potentially read the
> lucene files.  Option 1 can be very similar to Dan's snippet of code.
>
> I think Option 2 is going to leave a lot of people unhappy when they get
> stuck with what Mario is experiencing right now and all we can say is "you
> should have read the doc". Not to say Option 2 isn't valid and it's
> definitely the least amount of work to do, I still vote option 1.
>
> On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
>
> > Usually re-creating region and index are expensive and customers are
> > reluctant to do it, according to my memory.
> >
> > We do have an offline reindex scripts or steps (written by Barry?). If
> that
> > could be an option, they can try that offline tool.
> >
> > I saw from Mario's email, he said: "I didn't found a way to write lucene
> in
> > older format. They only support
> > reading old format indexes with newer version by using lucene-backward-
> > codec."
> >
> > That's why I think option-1 is not feasible.
> >
> > Option-2 will cause the queue to be filled. But usually customer will
> hold
> > on, silence or reduce their business throughput when
> > doing rolling upgrade. I wonder if it's a reasonable assumption.
> >
> > Overall, after compared all the 3 options, I still think option-2 is the
> > best bet.
> >
> > Regards
> > Gester
> >
> >
> > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>
> wrote:
> >
> > >
> > >
> > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > >
> > > > Jake - there is a side effect to this in that the user would have to
> > > > reimport all their data into the user defined region too.  Client
> apps
> > > > would also have to know which of the regions to put into.. also, I
> may
> > be
> > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > support
> > > > whoever implements the changes :-P
> > >
> > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> thought.
> > >
> > > -Jake
> > >
> > >
> >
>

Re: Lucene upgrade

Posted by Xiaojian Zhou <gz...@pivotal.io>.
Oh, I misunderstood option-1 and option-2. What I vote is Jason's option-1.

On Thu, Nov 7, 2019 at 9:19 AM Jason Huynh <jh...@pivotal.io> wrote:

> Gester, I don't think we need to write in the old format, we just need the
> new format not to be written while old members can potentially read the
> lucene files.  Option 1 can be very similar to Dan's snippet of code.
>
> I think Option 2 is going to leave a lot of people unhappy when they get
> stuck with what Mario is experiencing right now and all we can say is "you
> should have read the doc". Not to say Option 2 isn't valid and it's
> definitely the least amount of work to do, I still vote option 1.
>
> On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:
>
> > Usually re-creating region and index are expensive and customers are
> > reluctant to do it, according to my memory.
> >
> > We do have an offline reindex scripts or steps (written by Barry?). If
> that
> > could be an option, they can try that offline tool.
> >
> > I saw from Mario's email, he said: "I didn't found a way to write lucene
> in
> > older format. They only support
> > reading old format indexes with newer version by using lucene-backward-
> > codec."
> >
> > That's why I think option-1 is not feasible.
> >
> > Option-2 will cause the queue to be filled. But usually customer will
> hold
> > on, silence or reduce their business throughput when
> > doing rolling upgrade. I wonder if it's a reasonable assumption.
> >
> > Overall, after compared all the 3 options, I still think option-2 is the
> > best bet.
> >
> > Regards
> > Gester
> >
> >
> > On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io>
> wrote:
> >
> > >
> > >
> > > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > > >
> > > > Jake - there is a side effect to this in that the user would have to
> > > > reimport all their data into the user defined region too.  Client
> apps
> > > > would also have to know which of the regions to put into.. also, I
> may
> > be
> > > > misunderstanding this suggestion, completely.  In either case, I'll
> > > support
> > > > whoever implements the changes :-P
> > >
> > > Ah… there isn’t a way to re-index the existing data. Eh… just a
> thought.
> > >
> > > -Jake
> > >
> > >
> >
>

Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Gester, I don't think we need to write in the old format; we just need the
new format not to be written while old members can potentially read the
Lucene files.  Option 1 can be very similar to Dan's snippet of code.

I think Option 2 is going to leave a lot of people unhappy when they get
stuck with what Mario is experiencing right now and all we can say is "you
should have read the doc". That's not to say Option 2 isn't valid, and it's
definitely the least amount of work, but I still vote for option 1.

On Wed, Nov 6, 2019 at 5:16 PM Xiaojian Zhou <gz...@pivotal.io> wrote:

> Usually re-creating region and index are expensive and customers are
> reluctant to do it, according to my memory.
>
> We do have an offline reindex scripts or steps (written by Barry?). If that
> could be an option, they can try that offline tool.
>
> I saw from Mario's email, he said: "I didn't found a way to write lucene in
> older format. They only support
> reading old format indexes with newer version by using lucene-backward-
> codec."
>
> That's why I think option-1 is not feasible.
>
> Option-2 will cause the queue to be filled. But usually customer will hold
> on, silence or reduce their business throughput when
> doing rolling upgrade. I wonder if it's a reasonable assumption.
>
> Overall, after compared all the 3 options, I still think option-2 is the
> best bet.
>
> Regards
> Gester
>
>
> On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io> wrote:
>
> >
> >
> > > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> > >
> > > Jake - there is a side effect to this in that the user would have to
> > > reimport all their data into the user defined region too.  Client apps
> > > would also have to know which of the regions to put into.. also, I may
> be
> > > misunderstanding this suggestion, completely.  In either case, I'll
> > support
> > > whoever implements the changes :-P
> >
> > Ah… there isn’t a way to re-index the existing data. Eh… just a thought.
> >
> > -Jake
> >
> >
>

Re: Lucene upgrade

Posted by Xiaojian Zhou <gz...@pivotal.io>.
Usually re-creating the region and index is expensive, and customers are
reluctant to do it, as far as I remember.

We do have offline reindex scripts or steps (written by Barry?). If that
could be an option, they can try that offline tool.

I saw in Mario's email that he said: "I didn't found a way to write lucene in
older format. They only support
reading old format indexes with newer version by using lucene-backward-
codec."

That's why I think option-1 is not feasible.

Option-2 will cause the queue to fill up. But customers will usually hold
off, stay quiet, or reduce their business throughput while
doing a rolling upgrade. I wonder if that's a reasonable assumption.

Overall, after comparing all three options, I still think option-2 is the
best bet.

Regards
Gester


On Wed, Nov 6, 2019 at 3:38 PM Jacob Barrett <jb...@pivotal.io> wrote:

>
>
> > On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> >
> > Jake - there is a side effect to this in that the user would have to
> > reimport all their data into the user defined region too.  Client apps
> > would also have to know which of the regions to put into.. also, I may be
> > misunderstanding this suggestion, completely.  In either case, I'll
> support
> > whoever implements the changes :-P
>
> Ah… there isn’t a way to re-index the existing data. Eh… just a thought.
>
> -Jake
>
>

Re: Lucene upgrade

Posted by Jacob Barrett <jb...@pivotal.io>.

> On Nov 6, 2019, at 3:36 PM, Jason Huynh <jh...@pivotal.io> wrote:
> 
> Jake - there is a side effect to this in that the user would have to
> reimport all their data into the user defined region too.  Client apps
> would also have to know which of the regions to put into.. also, I may be
> misunderstanding this suggestion, completely.  In either case, I'll support
> whoever implements the changes :-P

Ah… there isn’t a way to re-index the existing data. Eh… just a thought.

-Jake


Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Dan - LGTM check it in! ;-) (kidding of course)

Jake - there is a side effect to this in that the user would have to
reimport all their data into the user-defined region too.  Client apps
would also have to know which of the regions to put into.  Also, I may be
completely misunderstanding this suggestion.  In either case, I'll support
whoever implements the changes :-P


On Wed, Nov 6, 2019 at 2:53 PM Jacob Barrett <jb...@pivotal.io> wrote:

>
>
> > On Nov 6, 2019, at 2:16 PM, Jason Huynh <jh...@pivotal.io> wrote:
> >
> > Jake, -from my understanding, the implementation details of geode-lucene
> is
> > that we are using a partitioned region as a "file-system" for lucene
> > files.
>
> Yeah, I didn’t explain well. I mean to say literally create a new region
> for the new version of lucene and effectively start over. Yes this is
> expensive but its also functional. So new members would create region
> `lucene-whatever-v8` and start over there. Then when all nodes are upgraded
> the old `lucent-whatever` region could be deleted.
>
> Just tossing out alternatives to what’s already been posed.
>
> -Jake
>
>

Re: Lucene upgrade

Posted by Jacob Barrett <jb...@pivotal.io>.

> On Nov 6, 2019, at 2:16 PM, Jason Huynh <jh...@pivotal.io> wrote:
> 
> Jake, -from my understanding, the implementation details of geode-lucene is
> that we are using a partitioned region as a "file-system" for lucene
> files. 

Yeah, I didn’t explain well. I meant to say literally create a new region for the new version of lucene and effectively start over. Yes, this is expensive, but it's also functional. So new members would create region `lucene-whatever-v8` and start over there. Then when all nodes are upgraded the old `lucene-whatever` region could be deleted.

Just tossing out alternatives to what’s already been posed.

-Jake


Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Jake - from my understanding, the implementation details of geode-lucene are
that we use a partitioned region as a "file system" for lucene
files.  As new servers are rolled, the issue is that the new servers have
the new codec.  As puts occur on the user data region, the async listeners
process on new and old servers alike.  If a new server writes using the
new codec, the file is written into the partitioned region, but if an old
server with the old codec needs to read that file, it will blow up because
it doesn't know about the new codec.
Option 1 is to not have the new servers process/write if they detect
different geode versions (pre-codec changes) in the cluster.
Option 2 is similar but requires users to pause the aeq/lucene listeners.

Deleting the indexes and recreating them can be quite expensive, mostly
due to tombstone creation when creating a new lucene index, but it could be
considered Option 3.  It would also probably require
https://issues.apache.org/jira/browse/GEODE-3924 to be completed.

Gester - I may be wrong, but I think option 1 is still doable.  We just need
to not write using the new codec until after all servers are upgraded.

There was also some upgrade challenge with scoring, from what I remember,
but that's a different topic...
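
The guard described for Option 1 could be sketched roughly like this (a
hypothetical illustration only; the version values and method names are
assumptions, not Geode's actual API):

```java
// Hypothetical sketch of Option 1: skip writing the new Lucene codec
// while any member in the cluster is still running a pre-upgrade
// version. The version numbering below is illustrative, not Geode's.
public class CodecUpgradeGuard {
  // Assumed: first member version whose Lucene ships the new codec.
  static final int NEW_CODEC_MIN_VERSION = 8;

  /** Returns true only when every member can read the new index format. */
  static boolean safeToWriteNewFormat(int[] memberVersions) {
    for (int v : memberVersions) {
      if (v < NEW_CODEC_MIN_VERSION) {
        return false; // an old member might still read these index files
      }
    }
    return true; // rolling upgrade complete; new format is safe
  }
}
```

The listener would consult such a check before flushing to the index, and
start writing the new format only once the whole cluster has rolled.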


On Wed, Nov 6, 2019 at 1:00 PM Xiaojian Zhou <gz...@pivotal.io> wrote:

> He tried to upgrade lucene version from current 6.6.4 to 8.2. There're some
> challenges. One challenge is the codec changed, which caused the format of
> index is also changed.
>
> That's why we did not implement it.
>
> If he resolved the coding challenges, then rolling upgrade will probably
> need option-2 to workaround it.
>
> Regards
> Gester
>
>
> On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett <jb...@pivotal.io> wrote:
>
> > What about “versioning” the region that backs the indexes? Old servers
> > with old license would continue to read/write to old region. New servers
> > would start re-indexing with the new version. Given the async nature of
> the
> > indexing would the mismatch in indexing for some period of time have an
> > impact?
> >
> > Not an ideal solution but it’s something.
> >
> > In my previous life we just deleted the indexes and rebuilt them on
> > upgrade but that was specific to our application.
> >
> > -Jake
> >
> >
> > > On Nov 6, 2019, at 11:18 AM, Jason Huynh <jh...@pivotal.io> wrote:
> > >
> > > Hi Mario,
> > >
> > > I think there are a few ways to accomplish what Dan was
> suggesting...Dan
> > or
> > > other's, please chime in with more options/solutions.
> > >
> > > 1.) We add some product code/lucene listener to detect whether we have
> > old
> > > versions of geode and if so, do not write to lucene on the newly
> updated
> > > node until all versions are up to date.
> > >
> > > 2.)  We document it and provide instructions (and a way) to pause
> lucene
> > > indexing before someone attempts to do a rolling upgrade.
> > >
> > > I'd prefer option 1 or some other robust solution, because I think
> > option 2
> > > has many possible issues.
> > >
> > >
> > > -Jason
> > >
> > >
> > >> On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo <ma...@est.tech>
> wrote:
> > >>
> > >> Hi Dan,
> > >>
> > >> thanks for suggestions.
> > >> I didn't found a way to write lucene in older format. They only
> support
> > >> reading old format indexes with newer version by using
> lucene-backward-
> > >> codec.
> > >>
> > >> Regarding to freeze writes to the lucene index, that means that we
> need
> > >> to start locators and servers, create lucene index on the server, roll
> > >> it to current and then do puts. In this case tests passed. Is it ok?
> > >>
> > >>
> > >> BR,
> > >> Mario
> > >>
> > >>
> > >>> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> > >>> I think the issue probably has to do with doing a rolling upgrade
> > >>> from an
> > >>> old version of geode (with an old version of lucene) to the new
> > >>> version of
> > >>> geode.
> > >>>
> > >>> Geode's lucene integration works by writing the lucene index to a
> > >>> colocated
> > >>> region. So lucene index data that was generated on one server can be
> > >>> replicated or rebalanced to other servers.
> > >>>
> > >>> I think what may be happening is that data written by a geode member
> > >>> with a
> > >>> newer version is being read by a geode member with an old version.
> > >>> Because
> > >>> this is a rolling upgrade test, members with multiple versions will
> > >>> be
> > >>> running as part of the same cluster.
> > >>>
> > >>> I think to really fix this rolling upgrade issue we would need to
> > >>> somehow
> > >>> configure the new version of lucene to write data in the old format,
> > >>> at
> > >>> least until the rolling upgrade is complete. I'm not sure if that is
> > >>> possible with lucene or not - but perhaps? Another option might be to
> > >>> freeze writes to the lucene index during the rolling upgrade process.
> > >>> Lucene indexes are asynchronous, so this wouldn't necessarily require
> > >>> blocking all puts. But it would require queueing up a lot of updates.
> > >>>
> > >>> -Dan
> > >>>
> > >>> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo <ma...@est.tech>
> > >>> wrote:
> > >>>
> > >>>> Hi geode dev,
> > >>>>
> > >>>> I'm working on upgrade lucene to a newer version. (
> > >>>> https://issues.apache.org/jira/browse/GEODE-7309)
> > >>>>
> > >>>> I followed instruction from
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> > >>>> Also add some other changes that is needed for lucene 8.2.0.
> > >>>>
> > >>>> I found some problems with tests:
> > >>>> * geode-
> > >>>>   lucene/src/test/java/org/apache/geode/cache/lucene/internal/dist
> > >>>> ribu
> > >>>>   ted/DistributedScoringJUnitTest.java:
> > >>>>
> > >>>>
> > >>>> *
> > >>>> geode-
> > >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> > >>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.j
> > >>>> ava:
> > >>>> *
> > >>>> geode-
> > >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> > >>>> gradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRoll
> > >>>> ed.java:
> > >>>> *
> > >>>> ./geode-
> > >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> > >>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegio
> > >>>> n.java:
> > >>>> *
> > >>>> ./geode-
> > >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> > >>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPart
> > >>>> itionRegion.java:
> > >>>>
> > >>>>      -> failed due to
> > >>>> Caused by: org.apache.lucene.index.IndexFormatTooOldException:
> > >>>> Format
> > >>>> version is not supported (resource
> > >>>> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7
> > >>>> and
> > >>>> 9). This version of Lucene only supports indexes created with
> > >>>> release
> > >>>> 6.0 and later.
> > >>>>        at
> > >>>> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.jav
> > >>>> a:21
> > >>>> 3)
> > >>>>        at
> > >>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:3
> > >>>> 05)
> > >>>>        at
> > >>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:2
> > >>>> 89)
> > >>>>        at
> > >>>> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> > >>>>        at
> > >>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finis
> > >>>> hCom
> > >>>> putingRepository(IndexRepositoryFactory.java:123)
> > >>>>        at
> > >>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.compu
> > >>>> teIn
> > >>>> dexRepository(IndexRepositoryFactory.java:66)
> > >>>>        at
> > >>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
> > >>>> .com
> > >>>> puteRepository(PartitionedRepositoryManager.java:151)
> > >>>>        at
> > >>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
> > >>>> .lam
> > >>>> bda$computeRepository$1(PartitionedRepositoryManager.java:170)
> > >>>>        ... 16 more
> > >>>>
> > >>>>
> > >>>> *
> > >>>> geode-
> > >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> > >>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAl
> > >>>> lBucketsCreated.java:
> > >>>>
> > >>>>      -> failed with the same exception as previous tests
> > >>>>
> > >>>>
> > >>>> I found this on web
> > >>>>
> > >>>>
> > >>
> > >>
> >
> https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> > >>>> , but not have an idea how to proceed with that.
> > >>>>
> > >>>> Does anyone has any idea how to fix it?
> > >>>>
> > >>>> BR,
> > >>>> Mario
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>
> >
>

Re: Lucene upgrade

Posted by Xiaojian Zhou <gz...@pivotal.io>.
He tried to upgrade the lucene version from the current 6.6.4 to 8.2. There
are some challenges. One challenge is that the codec changed, which means
the index format also changed.

That's why we did not implement it.

If he resolves the coding challenges, then rolling upgrade will probably
need option-2 to work around it.

Regards
Gester


On Wed, Nov 6, 2019 at 11:47 AM Jacob Barrett <jb...@pivotal.io> wrote:

> What about “versioning” the region that backs the indexes? Old servers
> with old license would continue to read/write to old region. New servers
> would start re-indexing with the new version. Given the async nature of the
> indexing would the mismatch in indexing for some period of time have an
> impact?
>
> Not an ideal solution but it’s something.
>
> In my previous life we just deleted the indexes and rebuilt them on
> upgrade but that was specific to our application.
>
> -Jake
>
>
> > On Nov 6, 2019, at 11:18 AM, Jason Huynh <jh...@pivotal.io> wrote:
> >
> > Hi Mario,
> >
> > I think there are a few ways to accomplish what Dan was suggesting...Dan
> or
> > other's, please chime in with more options/solutions.
> >
> > 1.) We add some product code/lucene listener to detect whether we have
> old
> > versions of geode and if so, do not write to lucene on the newly updated
> > node until all versions are up to date.
> >
> > 2.)  We document it and provide instructions (and a way) to pause lucene
> > indexing before someone attempts to do a rolling upgrade.
> >
> > I'd prefer option 1 or some other robust solution, because I think
> option 2
> > has many possible issues.
> >
> >
> > -Jason
> >
> >
> >> On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo <ma...@est.tech> wrote:
> >>
> >> Hi Dan,
> >>
> >> thanks for suggestions.
> >> I didn't found a way to write lucene in older format. They only support
> >> reading old format indexes with newer version by using lucene-backward-
> >> codec.
> >>
> >> Regarding to freeze writes to the lucene index, that means that we need
> >> to start locators and servers, create lucene index on the server, roll
> >> it to current and then do puts. In this case tests passed. Is it ok?
> >>
> >>
> >> BR,
> >> Mario
> >>
> >>
> >>> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> >>> I think the issue probably has to do with doing a rolling upgrade
> >>> from an
> >>> old version of geode (with an old version of lucene) to the new
> >>> version of
> >>> geode.
> >>>
> >>> Geode's lucene integration works by writing the lucene index to a
> >>> colocated
> >>> region. So lucene index data that was generated on one server can be
> >>> replicated or rebalanced to other servers.
> >>>
> >>> I think what may be happening is that data written by a geode member
> >>> with a
> >>> newer version is being read by a geode member with an old version.
> >>> Because
> >>> this is a rolling upgrade test, members with multiple versions will
> >>> be
> >>> running as part of the same cluster.
> >>>
> >>> I think to really fix this rolling upgrade issue we would need to
> >>> somehow
> >>> configure the new version of lucene to write data in the old format,
> >>> at
> >>> least until the rolling upgrade is complete. I'm not sure if that is
> >>> possible with lucene or not - but perhaps? Another option might be to
> >>> freeze writes to the lucene index during the rolling upgrade process.
> >>> Lucene indexes are asynchronous, so this wouldn't necessarily require
> >>> blocking all puts. But it would require queueing up a lot of updates.
> >>>
> >>> -Dan
> >>>
> >>> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo <ma...@est.tech>
> >>> wrote:
> >>>
> >>>> Hi geode dev,
> >>>>
> >>>> I'm working on upgrade lucene to a newer version. (
> >>>> https://issues.apache.org/jira/browse/GEODE-7309)
> >>>>
> >>>> I followed instruction from
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> >>>> Also add some other changes that is needed for lucene 8.2.0.
> >>>>
> >>>> I found some problems with tests:
> >>>> * geode-
> >>>>   lucene/src/test/java/org/apache/geode/cache/lucene/internal/dist
> >>>> ribu
> >>>>   ted/DistributedScoringJUnitTest.java:
> >>>>
> >>>>
> >>>> *
> >>>> geode-
> >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> >>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.j
> >>>> ava:
> >>>> *
> >>>> geode-
> >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> >>>> gradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRoll
> >>>> ed.java:
> >>>> *
> >>>> ./geode-
> >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> >>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegio
> >>>> n.java:
> >>>> *
> >>>> ./geode-
> >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> >>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPart
> >>>> itionRegion.java:
> >>>>
> >>>>      -> failed due to
> >>>> Caused by: org.apache.lucene.index.IndexFormatTooOldException:
> >>>> Format
> >>>> version is not supported (resource
> >>>> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7
> >>>> and
> >>>> 9). This version of Lucene only supports indexes created with
> >>>> release
> >>>> 6.0 and later.
> >>>>        at
> >>>> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.jav
> >>>> a:21
> >>>> 3)
> >>>>        at
> >>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:3
> >>>> 05)
> >>>>        at
> >>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:2
> >>>> 89)
> >>>>        at
> >>>> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
> >>>>        at
> >>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finis
> >>>> hCom
> >>>> putingRepository(IndexRepositoryFactory.java:123)
> >>>>        at
> >>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.compu
> >>>> teIn
> >>>> dexRepository(IndexRepositoryFactory.java:66)
> >>>>        at
> >>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
> >>>> .com
> >>>> puteRepository(PartitionedRepositoryManager.java:151)
> >>>>        at
> >>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
> >>>> .lam
> >>>> bda$computeRepository$1(PartitionedRepositoryManager.java:170)
> >>>>        ... 16 more
> >>>>
> >>>>
> >>>> *
> >>>> geode-
> >>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
> >>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAl
> >>>> lBucketsCreated.java:
> >>>>
> >>>>      -> failed with the same exception as previous tests
> >>>>
> >>>>
> >>>> I found this on web
> >>>>
> >>>>
> >>
> >>
> https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> >>>> , but not have an idea how to proceed with that.
> >>>>
> >>>> Does anyone has any idea how to fix it?
> >>>>
> >>>> BR,
> >>>> Mario
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>
>

Re: Lucene upgrade

Posted by Jacob Barrett <jb...@pivotal.io>.
What about “versioning” the region that backs the indexes? Old servers with the old lucene would continue to read/write to the old region. New servers would start re-indexing with the new version. Given the async nature of the indexing, would the mismatch in indexing for some period of time have an impact?

Not an ideal solution but it’s something. 

In my previous life we just deleted the indexes and rebuilt them on upgrade but that was specific to our application.

-Jake
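
The versioned-region idea above could look something like this (a minimal
naming sketch; the scheme and names are assumptions, not an existing Geode
convention):

```java
// Hypothetical sketch of the "versioned region" idea: each codec
// generation writes to its own backing region, so old and new members
// never read each other's index format. Naming scheme is illustrative.
public class VersionedIndexRegion {
  /** Backing-region name for a given index and Lucene generation. */
  static String regionNameFor(String indexName, int luceneMajorVersion) {
    return "lucene-" + indexName + "-v" + luceneMajorVersion;
  }
}
```

Once all members are upgraded and re-indexing into the new region has
caught up, the old generation's region could be dropped.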


> On Nov 6, 2019, at 11:18 AM, Jason Huynh <jh...@pivotal.io> wrote:
> 
> Hi Mario,
> 
> I think there are a few ways to accomplish what Dan was suggesting...Dan or
> other's, please chime in with more options/solutions.
> 
> 1.) We add some product code/lucene listener to detect whether we have old
> versions of geode and if so, do not write to lucene on the newly updated
> node until all versions are up to date.
> 
> 2.)  We document it and provide instructions (and a way) to pause lucene
> indexing before someone attempts to do a rolling upgrade.
> 
> I'd prefer option 1 or some other robust solution, because I think option 2
> has many possible issues.
> 
> 
> -Jason
> 
> 
>> On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo <ma...@est.tech> wrote:
>> 
>> Hi Dan,
>> 
>> thanks for suggestions.
>> I didn't found a way to write lucene in older format. They only support
>> reading old format indexes with newer version by using lucene-backward-
>> codec.
>> 
>> Regarding to freeze writes to the lucene index, that means that we need
>> to start locators and servers, create lucene index on the server, roll
>> it to current and then do puts. In this case tests passed. Is it ok?
>> 
>> 
>> BR,
>> Mario
>> 
>> 
>>> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
>>> I think the issue probably has to do with doing a rolling upgrade
>>> from an
>>> old version of geode (with an old version of lucene) to the new
>>> version of
>>> geode.
>>> 
>>> Geode's lucene integration works by writing the lucene index to a
>>> colocated
>>> region. So lucene index data that was generated on one server can be
>>> replicated or rebalanced to other servers.
>>> 
>>> I think what may be happening is that data written by a geode member
>>> with a
>>> newer version is being read by a geode member with an old version.
>>> Because
>>> this is a rolling upgrade test, members with multiple versions will
>>> be
>>> running as part of the same cluster.
>>> 
>>> I think to really fix this rolling upgrade issue we would need to
>>> somehow
>>> configure the new version of lucene to write data in the old format,
>>> at
>>> least until the rolling upgrade is complete. I'm not sure if that is
>>> possible with lucene or not - but perhaps? Another option might be to
>>> freeze writes to the lucene index during the rolling upgrade process.
>>> Lucene indexes are asynchronous, so this wouldn't necessarily require
>>> blocking all puts. But it would require queueing up a lot of updates.
>>> 
>>> -Dan
>>> 
>>> On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo <ma...@est.tech>
>>> wrote:
>>> 
>>>> Hi geode dev,
>>>> 
>>>> I'm working on upgrade lucene to a newer version. (
>>>> https://issues.apache.org/jira/browse/GEODE-7309)
>>>> 
>>>> I followed instruction from
>>>> 
>> https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
>>>> Also add some other changes that is needed for lucene 8.2.0.
>>>> 
>>>> I found some problems with tests:
>>>> * geode-
>>>>   lucene/src/test/java/org/apache/geode/cache/lucene/internal/dist
>>>> ribu
>>>>   ted/DistributedScoringJUnitTest.java:
>>>> 
>>>> 
>>>> *
>>>> geode-
>>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.j
>>>> ava:
>>>> *
>>>> geode-
>>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>>>> gradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRoll
>>>> ed.java:
>>>> *
>>>> ./geode-
>>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegio
>>>> n.java:
>>>> *
>>>> ./geode-
>>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>>>> gradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPart
>>>> itionRegion.java:
>>>> 
>>>>      -> failed due to
>>>> Caused by: org.apache.lucene.index.IndexFormatTooOldException:
>>>> Format
>>>> version is not supported (resource
>>>> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7
>>>> and
>>>> 9). This version of Lucene only supports indexes created with
>>>> release
>>>> 6.0 and later.
>>>>        at
>>>> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.jav
>>>> a:21
>>>> 3)
>>>>        at
>>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:3
>>>> 05)
>>>>        at
>>>> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:2
>>>> 89)
>>>>        at
>>>> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
>>>>        at
>>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finis
>>>> hCom
>>>> putingRepository(IndexRepositoryFactory.java:123)
>>>>        at
>>>> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.compu
>>>> teIn
>>>> dexRepository(IndexRepositoryFactory.java:66)
>>>>        at
>>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
>>>> .com
>>>> puteRepository(PartitionedRepositoryManager.java:151)
>>>>        at
>>>> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager
>>>> .lam
>>>> bda$computeRepository$1(PartitionedRepositoryManager.java:170)
>>>>        ... 16 more
>>>> 
>>>> 
>>>> *
>>>> geode-
>>>> lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUp
>>>> gradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAl
>>>> lBucketsCreated.java:
>>>> 
>>>>      -> failed with the same exception as previous tests
>>>> 
>>>> 
>>>> I found this on web
>>>> 
>>>> 
>> 
>> https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
>>>> , but not have an idea how to proceed with that.
>>>> 
>>>> Does anyone has any idea how to fix it?
>>>> 
>>>> BR,
>>>> Mario
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 

Re: Lucene upgrade

Posted by Jason Huynh <jh...@pivotal.io>.
Hi Mario,

I think there are a few ways to accomplish what Dan was suggesting... Dan or
others, please chime in with more options/solutions.

1.) We add some product code / a lucene listener to detect whether we have old
versions of geode and, if so, do not write to lucene on the newly updated
node until all versions are up to date.

2.)  We document it and provide instructions (and a way) to pause lucene
indexing before someone attempts a rolling upgrade.

I'd prefer option 1 or some other robust solution, because I think option 2
has many possible issues.


-Jason
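
Option 2's "pause indexing" step could be approximated with a simple
dispatch gate in the listener (a hypothetical sketch; whether and how Geode
exposes pausing of the Lucene AEQ is an assumption here, and all names are
illustrative):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of Option 2: gate the Lucene listener so events
// queue up while indexing is paused during the rolling upgrade, then
// drain once every member is on the new version.
public class PausableIndexListener {
  private boolean paused;
  private final Queue<String> pending = new ArrayDeque<>();

  void pause() { paused = true; }

  /** Resume dispatching and index everything that queued up meanwhile. */
  int resumeAndDrain() {
    paused = false;
    int drained = 0;
    while (!pending.isEmpty()) {
      indexDocument(pending.poll());
      drained++;
    }
    return drained;
  }

  void onEvent(String documentKey) {
    if (paused) {
      pending.add(documentKey); // hold the update; the queue keeps growing
    } else {
      indexDocument(documentKey);
    }
  }

  private void indexDocument(String key) {
    // placeholder for the real Lucene write
  }
}
```

This is the behavior that worries people about Option 2: puts keep
succeeding, but the queue grows for the whole duration of the upgrade.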


On Wed, Nov 6, 2019 at 1:03 AM Mario Kevo <ma...@est.tech> wrote:

> Hi Dan,
>
> thanks for suggestions.
> I didn't found a way to write lucene in older format. They only support
> reading old format indexes with newer version by using lucene-backward-
> codec.
>
> Regarding to freeze writes to the lucene index, that means that we need
> to start locators and servers, create lucene index on the server, roll
> it to current and then do puts. In this case tests passed. Is it ok?
>
>
> BR,
> Mario
>
>
> On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> > I think the issue probably has to do with doing a rolling upgrade
> > from an
> > old version of geode (with an old version of lucene) to the new
> > version of
> > geode.
> >
> > Geode's lucene integration works by writing the lucene index to a
> > colocated
> > region. So lucene index data that was generated on one server can be
> > replicated or rebalanced to other servers.
> >
> > I think what may be happening is that data written by a geode member
> > with a
> > newer version is being read by a geode member with an old version.
> > Because
> > this is a rolling upgrade test, members with multiple versions will
> > be
> > running as part of the same cluster.
> >
> > I think to really fix this rolling upgrade issue we would need to
> > somehow
> > configure the new version of lucene to write data in the old format,
> > at
> > least until the rolling upgrade is complete. I'm not sure if that is
> > possible with lucene or not - but perhaps? Another option might be to
> > freeze writes to the lucene index during the rolling upgrade process.
> > Lucene indexes are asynchronous, so this wouldn't necessarily require
> > blocking all puts. But it would require queueing up a lot of updates.
> >
> > -Dan

Re: Lucene upgrade

Posted by Mario Kevo <ma...@est.tech>.
Hi Dan,

Thanks for the suggestions.
I didn't find a way to make lucene write indexes in the older format.
Newer versions only support reading old-format indexes via the
lucene-backward-codecs module.

Regarding freezing writes to the lucene index: that means we need to
start the locators and servers, create the lucene index on the server,
roll it to current, and only then do puts. With that sequence the tests
pass. Is that OK?
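For completeness, Lucene also ships an offline migration tool, org.apache.lucene.index.IndexUpgrader, which rewrites every segment of an on-disk index into the current format. It only reaches back one major version (an 8.x upgrader cannot read a 6.x index), so it wouldn't solve the rolling-upgrade case by itself, but a sketch of its use looks like this (the class name UpgradeIndexFormat and the path argument are illustrative, not from the thread):

```java
import java.nio.file.Paths;

import org.apache.lucene.index.IndexUpgrader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Offline migration sketch: rewrite all segments of an on-disk Lucene
// index into the current format. Only works when the index was written
// by the previous major version (e.g. an 8.x upgrader can read 7.x,
// but not 6.x segments).
public class UpgradeIndexFormat {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
      new IndexUpgrader(dir).upgrade(); // rewrites every segment in place
    }
  }
}
```

Note this is an offline tool: the index must not be open for writing while it runs, which makes it awkward to apply to Geode's region-backed index files during a live rolling upgrade.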


BR,
Mario


On Mon, 2019-11-04 at 17:07 -0800, Dan Smith wrote:
> I think the issue probably has to do with doing a rolling upgrade
> from an
> old version of geode (with an old version of lucene) to the new
> version of
> geode.
> 
> Geode's lucene integration works by writing the lucene index to a
> colocated
> region. So lucene index data that was generated on one server can be
> replicated or rebalanced to other servers.
> 
> I think what may be happening is that data written by a geode member
> with a
> newer version is being read by a geode member with an old version.
> Because
> this is a rolling upgrade test, members with multiple versions will
> be
> running as part of the same cluster.
> 
> I think to really fix this rolling upgrade issue we would need to
> somehow
> configure the new version of lucene to write data in the old format,
> at
> least until the rolling upgrade is complete. I'm not sure if that is
> possible with lucene or not - but perhaps? Another option might be to
> freeze writes to the lucene index during the rolling upgrade process.
> Lucene indexes are asynchronous, so this wouldn't necessarily require
> blocking all puts. But it would require queueing up a lot of updates.
> 
> -Dan
> 

Re: Lucene upgrade

Posted by Dan Smith <ds...@pivotal.io>.
I think the issue probably has to do with doing a rolling upgrade from an
old version of geode (with an old version of lucene) to the new version of
geode.

Geode's lucene integration works by writing the lucene index to a colocated
region. So lucene index data that was generated on one server can be
replicated or rebalanced to other servers.

I think what may be happening is that data written by a geode member with a
newer version is being read by a geode member with an old version. Because
this is a rolling upgrade test, members with multiple versions will be
running as part of the same cluster.

I think to really fix this rolling upgrade issue we would need to somehow
configure the new version of lucene to write data in the old format, at
least until the rolling upgrade is complete. I'm not sure if that is
possible with lucene or not - but perhaps? Another option might be to
freeze writes to the lucene index during the rolling upgrade process.
Lucene indexes are asynchronous, so this wouldn't necessarily require
blocking all puts. But it would require queueing up a lot of updates.
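As a diagnostic for the mixed-version situation described above, the version that wrote an existing index directory can be inspected before a member tries to open it. A sketch, assuming Lucene 7+ APIs on the classpath (the class name CheckSegmentVersions and the path argument are illustrative):

```java
import java.nio.file.Paths;

import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Diagnostic sketch: report which Lucene version committed the index and
// which version wrote its oldest segment. An oldest-segment version more
// than one major release behind is what triggers IndexFormatTooOldException.
public class CheckSegmentVersions {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
      SegmentInfos infos = SegmentInfos.readLatestCommit(dir);
      System.out.println("commit written by: " + infos.getCommitLuceneVersion());
      System.out.println("oldest segment:    " + infos.getMinSegmentLuceneVersion());
    }
  }
}
```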

-Dan

On Mon, Nov 4, 2019 at 12:05 AM Mario Kevo <ma...@est.tech> wrote:

> Hi geode dev,
>
> I'm working on upgrading lucene to a newer version (
> https://issues.apache.org/jira/browse/GEODE-7309)
>
> I followed the instructions from
> https://cwiki.apache.org/confluence/display/GEODE/Upgrading+to+Lucene+7.1.0
> and also added some other changes needed for lucene 8.2.0.
>
> I found some problems with tests:
>  * geode-lucene/src/test/java/org/apache/geode/cache/lucene/internal/distributed/DistributedScoringJUnitTest.java:
>
>
>  *
> geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOver.java:
>  *
> geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultAfterTwoLocatorsWithTwoServersAreRolled.java:
>  *
> ./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPartitionRegion.java:
>  *
> ./geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterServersRollOverOnPersistentPartitionRegion.java:
>
>       -> failed due to
> Caused by: org.apache.lucene.index.IndexFormatTooOldException: Format
> version is not supported (resource
> BufferedChecksumIndexInput(segments_1)): 6 (needs to be between 7 and
> 9). This version of Lucene only supports indexes created with release
> 6.0 and later.
>         at
> org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:213)
>         at
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:305)
>         at
> org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
>         at
> org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:846)
>         at
> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.finishComputingRepository(IndexRepositoryFactory.java:123)
>         at
> org.apache.geode.cache.lucene.internal.IndexRepositoryFactory.computeIndexRepository(IndexRepositoryFactory.java:66)
>         at
> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.computeRepository(PartitionedRepositoryManager.java:151)
>         at
> org.apache.geode.cache.lucene.internal.PartitionedRepositoryManager.lambda$computeRepository$1(PartitionedRepositoryManager.java:170)
>         ... 16 more
>
>
>  *
> geode-lucene/src/upgradeTest/java/org/apache/geode/cache/lucene/RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:
>
>       -> failed with the same exception as previous tests
>
>
> I found this on the web:
>
> https://stackoverflow.com/questions/47454434/solr-indexing-issue-after-upgrading-from-4-7-to-7-1
> but I don't have an idea how to proceed with it.
>
> Does anyone have any idea how to fix it?
>
> BR,
> Mario
>
>
>
>
>