Posted to solr-user@lucene.apache.org by VILA Jean-Louis <Je...@sword-group.com> on 2020/05/19 08:19:18 UTC

Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Dear all,

We have started upgrading a large SolrCloud cluster from 5.4.1 to the latest version, 8.5.1.
Context:
. Ubuntu 16.04, 64-bit; Oracle JVM 8u101 and now OpenJDK 8u252
. We can't reindex the documents because the old ones no longer exist, so we have no choice but to upgrade the indexes.

Our upgrade strategy is based on the IndexUpgrader tool.
                5.4.1 -> 5.5.5 : OK
                5.5.5 -> 6.6.6 : OK
                6.6.6 -> 7.7.3 : OK
                7.7.3 -> 8.5.1 : fails. Here is the error from the 8.5.1 IndexUpgrader:

Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/data2/solr/nodes/node1/solr/insight_dw_shard3_replica_n69/data/index/segments_2nz0"))): This index was initially created with Lucene 6.x while the current version is 8.5.1 and Lucene only supports reading the current and previous major versions.. This version of Lucene only supports indexes created with release 7.0 and later.
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:318)
        at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:289)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:432)
        at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:429)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:680)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:632)
        at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:434)
        at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:285)
        at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:158)
        at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:78)

But when I check the index with 7.7.3, the segment appears to be at version 7.7.3!
0.00% total deletions; 50756501 documents; 0 deleteions
Segments file=segments_2nz0 numSegments=1 version=7.7.3 id=ay2stfke7hwy9gippl8k77tdd userData={commitTimeMSec=1589314850951}
  1 of 1: name=_2rr9t maxDoc=50756501
    version=7.7.3
    id=9pubpiwgt38rzyxr7litvgcu5
    codec=Lucene70
    compound=false
    numFiles=10
    size (MB)=338,143.905
    diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_101, java.vm.version=25.101-b13, lucene.version=7.7.3, mergeMaxNumSegments=1, os.arch=amd64, java.runtime.version=1.8.0_101-b13, source=merge, mergeFactor=2, os.version=3.13.0-147-generic, timestamp=1589484981711}
    no deletions
    test: open reader.........OK [took 2.779 sec]
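
As a rough illustration of why this check output is misleading, the sketch below scans CheckIndex-style text for every `version=` and `codec=` value. The sample string is copied from the output above; the helper name `reported_versions` is hypothetical and is not part of Lucene. Under these assumptions, the only version that shows up is 7.7.3, and nothing reveals that the index was originally created with an older release.

```python
import re

# Excerpt of the check output quoted in this message.
CHECKINDEX_OUTPUT = """\
Segments file=segments_2nz0 numSegments=1 version=7.7.3 id=ay2stfke7hwy9gippl8k77tdd userData={commitTimeMSec=1589314850951}
  1 of 1: name=_2rr9t maxDoc=50756501
    version=7.7.3
    id=9pubpiwgt38rzyxr7litvgcu5
    codec=Lucene70
"""


def reported_versions(text: str) -> dict:
    """Collect the distinct version= and codec= values found in the text.

    The lookbehind skips dotted keys such as java.version= or lucene.version=.
    """
    versions = re.findall(r"(?<![\w.])version=(\d+(?:\.\d+)*)", text)
    codecs = re.findall(r"(?<![\w.])codec=(\w+)", text)
    return {"versions": sorted(set(versions)), "codecs": sorted(set(codecs))}


summary = reported_versions(CHECKINDEX_OUTPUT)
print(summary)
```

Every value found here describes the version that last wrote the segment, which is why the output looks like a clean 7.7.3 index.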

From what I have read in various threads, some people say that once a segment is "marked as a Lucene v6 index", the mark persists across upgrades, so we are stuck on version 7.7.3.

What are my options?

Many many thanks for your help,
Jean-Louis



Jean-Louis Vila, PhD
Technical Director
Sword SAS



RE: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by VILA Jean-Louis <Je...@sword-group.com>.
Thanks Walter, but I can't imagine that will work: if it could, then IndexUpgrader should work too, and it does not ☹
Because of the format, a v6 index can't be rewritten, whatever process you use (add replica, optimize, etc.).
The only option I have is a full reindex! 260,000,000 docs / 3 TB of indexes, plus specific preprocessing; it will take a very, very long time...


-----Original Message-----
From: Walter Underwood <wu...@wunderwood.org> 
Sent: Tuesday, May 19, 2020 17:43
To: solr-user@lucene.apache.org
Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Hmm, might be able to hack this with optimize (forced merge).

First, you would have to add enough extra documents to force a rewrite of all segments. That might be as many documents as are already in the index. You could set a “fake:true” field and filter them out with an fq. Or make sure they have no searchable text.

After adding all those, run optimize. This should rewrite all the segments in the new format.

Finally, delete all the extra documents. Might want to do another optimize after that.

No guarantee that this desperate hack will work.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 19, 2020, at 6:21 AM, VILA Jean-Louis <Je...@sword-group.com> wrote:
> 
> Many thanks for your answers, Erick.
> 
> Indeed, I have read in many different threads that this migration path is not guaranteed, but what is strange is that there is no formal statement of this impossibility, because clearly we can't migrate to v8 if the indexes are not "pure" v7 indexes. I understand the reason (y = f(x)), but at least a simple piece of documentation stating that Lucene 6 segments can't be upgraded to Lucene 8 would be appreciated.
> 
> Moreover, the check tool just shows a v7.7.3 index, and there is no mention of the "real" segment version, which is v6! So refusing to open v7 Lucene indexes that were upgraded from v6 is quite brutal, and the rule that we can migrate from the previous major version is not completely true :-(
> I'll stay on v7.7.3.
> 
> Thanks again,
> Jean-Louis
> 
> -----Original Message-----
> From: Erick Erickson <er...@gmail.com>
> Sent: Tuesday, May 19, 2020 15:00
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6
> 
> This will not work. Lucene has never promised that this upgrade path would work. The “one major version back-compat” means that Lucene X has special handling for X-1, but for X-2, all bets are off. Starting with Solr 6, a marker is written into the segments recording the version of Lucene the segment was written with. That marker is preserved through all merges/upgrades/whatever.
> 
> Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no marker at all for earlier versions), then Lucene will refuse to open the index.
> 
> IndexUpgraderTool and the like simply cannot synthesize the new index format. The most succinct explanation I’ve seen is from Robert Muir:
> 
> “I think the key issue here is Lucene is an index not a database. Because it is a lossy index and does not retain all of the user's data, its not possible to safely migrate some things automagically. In the norms case IndexWriter needs to re-analyze the text ("re-index") and compute stats to get back the value, so it can be re-encoded. The function is y = f(x) and if x is not available its not possible, so lucene can't do it.”
> 
> So you’ll have to re-index your corpus with Solr 8 I’m afraid.
> 
> Best,
> Erick
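
The rule described in the quoted message can be sketched as a small model. This is a simplification written for illustration, not Lucene's actual code: `IndexModel`, `run_index_upgrader` and `lucene8_can_open` are hypothetical names, and the starting marker value of 6 is taken from the exception text ("initially created with Lucene 6.x"). The point it tries to show is that an upgrade step changes the version the segments were last written with, while the creation marker stays the same.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexModel:
    created_major: int    # hypothetical marker: major version the index was created with
    written_version: str  # version that last rewrote the segments


def run_index_upgrader(index: IndexModel, tool_version: str) -> IndexModel:
    """Model of one upgrade step: segments are rewritten, the marker is kept."""
    return replace(index, written_version=tool_version)


def lucene8_can_open(index: IndexModel) -> bool:
    """Per the error message, Lucene 8 supports indexes created with 7.0 and later."""
    return index.created_major >= 7


# Index as it stood on 6.6.6, then upgraded with the 7.7.3 tool.
upgraded = run_index_upgrader(IndexModel(created_major=6, written_version="6.6.6"), "7.7.3")
# Index built from scratch on 7.7.3, for comparison.
fresh = IndexModel(created_major=7, written_version="7.7.3")

print(upgraded, lucene8_can_open(upgraded))
print(fresh, lucene8_can_open(fresh))
```

Both indexes report 7.7.3 as the written version, yet only the one created on 7.x passes the check in this model, which matches the behaviour reported at the start of the thread.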


Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by Erick Erickson <er...@gmail.com>.
Jean-Louis:

One of the great advantages of open source is that it allows people to look at a problem with “fresh eyes” and add to the project in a way that helps other people who aren’t steeped in the arcana of Lucene/Solr. So it’d be great if you could go ahead and make a patch and JIRA to put this information in a place that makes the most sense to someone coming in fresh.

And I fully appreciate that “it’s in the reference guide” isn’t adequate; it’s over 1,300 pages last I knew. So putting this information somewhere that someone like yourself is likely to find it is the best option…

If you create a JIRA and patch, use “@erick” in the comment and I’ll see it and we can incorporate the info.

Best,
Erick.



Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by VILA Jean-Louis <Je...@sword-group.com>.
Erick

I am just suggesting a dedicated page on the upgrade path. Reading the page about IndexUpgraderTool, we understand well that we can’t upgrade in one step and that 6->7->8 must be done, but nowhere is it stated that, from Lucene 6 on, segments are marked as v6 forever.
Naively, by transitivity, the upgrade path 6->7->8 seems quite natural. From my point of view, we should say “since Lucene 6, a version is compatible with indexes created by the previous major version” rather than talk about upgrading. The term is ambiguous.
Things must be made clear. I do understand the problem :-)
Jean-Louis

> On May 19, 2020, at 19:03, Erick Erickson <er...@gmail.com> wrote:
> 
> Jean-Louis:
> 
> One explanation is here: https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html, but then again the reference guide is very long, and I’m not sure how to make it more findable. Or, for that matter, whether it should be part of the IndexUpgraderTool section or not. Please feel free to suggest (even better, submit a patch) if you can think of a place it’d be more easily findable. It’s always useful to have someone with fresh eyes weigh in.
> 
> Optimize won’t work. Under the covers, optimize is just a merge. It uses the exact same low-level merging code that background merging uses, including preserving the markers in the segment files. That’s why the Lucene devs use “forceMerge” rather than “optimize”; the latter is easy to interpret as something that does more than it really does.
> 
> This is also the same code that IndexUpgraderTool uses too for that matter. IndexUpgraderTool is, really, just a forceMerge down to one segment, which is all optimize is (assuming you specify maxSegments=1).
> 
> Best,
> Erick
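
To make the "it is all the same merge" argument concrete, here is a minimal sketch in which background merging, optimize and the upgrader tool are thin wrappers around one merge routine. Everything in it is an assumption made for illustration: the `Segment` class, the per-segment `created_major` field and the rule that a merged segment keeps the oldest marker of its inputs are a simplified reading of the description in this thread, not Lucene's real data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    docs: int
    created_major: int  # hypothetical marker, as described in the thread


def merge(segments: List[Segment], name: str) -> Segment:
    """Shared low-level merge: the result keeps the oldest marker of its inputs."""
    return Segment(
        name=name,
        docs=sum(s.docs for s in segments),
        created_major=min(s.created_major for s in segments),
    )


def background_merge(segments: List[Segment]) -> Segment:
    return merge(segments, "_background")


def optimize(segments: List[Segment]) -> Segment:
    # "optimize" is modeled as a forced merge down to one segment.
    return merge(segments, "_optimized")


def index_upgrader(segments: List[Segment]) -> Segment:
    # Also modeled as a forced merge down to one segment.
    return merge(segments, "_upgraded")


# The old segment plus an equal number of filler documents, as in the proposed hack.
old = Segment("_old", docs=50756501, created_major=6)
filler = Segment("_filler", docs=50756501, created_major=7)

results = [f([old, filler]) for f in (background_merge, optimize, index_upgrader)]
for seg in results:
    print(seg.name, seg.docs, seg.created_major)
```

In this model, adding filler documents and forcing a merge still yields a segment marked as version 6, whichever entry point triggers the merge.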

Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by Erick Erickson <er...@gmail.com>.
Jean-Louis:

One explanation is here: https://lucene.apache.org/solr/guide/8_5/indexupgrader-tool.html. Then again, the Reference Guide is very long, and I’m not sure how to make this more findable or, for that matter, whether it should be part of the IndexUpgraderTool section at all. Please feel free to suggest a place where it would be easier to find (even better, submit a patch). It’s always useful to have someone with fresh eyes weigh in.

Optimize won’t work. Under the covers, optimize is just a merge. It uses the exact same low-level merging code that background merging uses, including preserving the markers in the segment files. That’s why the Lucene devs use “forceMerge” rather than “optimize”: the latter is easy to interpret as doing more than it really does.

For that matter, this is the same code that IndexUpgraderTool uses. IndexUpgraderTool is really just a forceMerge down to one segment, which is all optimize is (assuming you specify maxSegments=1).
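
To make the point about merges concrete, here is a minimal sketch in Python. It is a toy model of the behaviour described in this thread, not Lucene's actual code or API: the names Segment, oldest_marker, and force_merge are invented for illustration, and the assumption that a merge keeps the oldest marker of its inputs is taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    written_with: int   # major Lucene version that physically wrote the files
    oldest_marker: int  # hypothetical field: oldest major version recorded for the data


def force_merge(segments: List[Segment], current_major: int) -> Segment:
    """Toy forced merge: rewrite everything into one segment.

    The files are rewritten by the current version, but (assumption taken
    from the thread) the oldest marker among the inputs is carried over.
    """
    return Segment(
        written_with=current_major,
        oldest_marker=min(s.oldest_marker for s in segments),
    )


# An index created under 6.x, then "upgraded" by a forced merge under 7.x.
upgraded = force_merge([Segment(6, 6), Segment(6, 6)], current_major=7)

# The suggested hack: add brand-new 7.x documents, then optimize again.
after_hack = force_merge([upgraded, Segment(7, 7)], current_major=7)
```

In this model the files end up written by 7.x in both cases, which matches what the check tool reports, yet oldest_marker stays at 6, so adding new documents before optimizing changes nothing.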

Best,
Erick



Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by Walter Underwood <wu...@wunderwood.org>.
Hmm, might be able to hack this with optimize (forced merge).

First, you would have to add enough extra documents to force a rewrite of all segments. That might be as many documents as are already in the index. You could set a “fake:true” field and filter them out with an fq. Or make sure they have no searchable text.

After adding all those, run optimize. This should rewrite all the segments in the new format.

Finally, delete all the extra documents. Might want to do another optimize after that.

No guarantee that this desperate hack will work.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



RE: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by VILA Jean-Louis <Je...@sword-group.com>.
Many thanks for your answers, Erick.

Indeed, I've read in many different threads that this migration path is not guaranteed. What's strange is that there's no formal information about this impossibility, because clearly we can't migrate to v8 if the indexes are not "pure" v7 indexes. I understand the reason (y = f(x)), but at least a simple piece of documentation stating that Lucene 6 segments can't be upgraded to Lucene 8 would be appreciated.

Moreover, the check tool just shows a v7.7.3 index, and there is no mention of the "real" segment version, which is v6! So refusing to open v7 Lucene indexes that were upgraded from v6 is quite brutal, and the rule that we can migrate from the previous major version is not completely true :-(
I'll stay on v7.7.3.

Thanks again,
Jean-Louis



Re: Upgrade 5.5.5 to 8.5.1 / Segment stucked in lucene v6

Posted by Erick Erickson <er...@gmail.com>.
This will not work. Lucene has never promised that this upgrade path would work. The “one major version back-compat” guarantee means that Lucene X has special handling for X-1, but for X-2, all bets are off. Starting with Solr 6, a marker is written into the segments recording the version of Lucene the segment was written with. That marker is preserved through all merges/upgrades/whatever.

Starting with Lucene 8, if any segment has a marker for Lucene 6 (or no marker at all for earlier versions), then Lucene will refuse to open the index.
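
The rule in the two paragraphs above can be written down as a small check. This is only a sketch of the rule as stated in this message, in Python rather than Lucene's Java, and the function name can_open and its parameters are hypothetical, not part of any Lucene or Solr API.

```python
from typing import Optional


def can_open(reader_major: int, marker_major: Optional[int]) -> bool:
    """Return True if a reader of major version `reader_major` would accept
    an index whose marker says it was first created with `marker_major`.

    Assumptions (from the message above, not from Lucene's source):
    - an index with no marker at all is treated as too old;
    - only the current and the previous major version are accepted.
    """
    if marker_major is None:
        return False
    return reader_major - 1 <= marker_major <= reader_major
```

Under these assumptions, can_open(8, 7) is accepted while can_open(8, 6) is rejected, even if every file in the index was rewritten by 7.7.3 in the meantime.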

IndexUpgraderTool and the like simply cannot synthesize the new index format. The most succinct explanation I’ve seen is from Robert Muir:

“I think the key issue here is Lucene is an index not a database. Because it is a lossy index and does not retain all of the user's data, its not possible to safely migrate some things automagically. In the norms case IndexWriter needs to re-analyze the text ("re-index") and compute stats to get back the value, so it can be re-encoded. The function is y = f(x) and if x is not available its not possible, so lucene can't do it.”

So you’ll have to re-index your corpus with Solr 8 I’m afraid.

Best,
Erick

