Posted to users@solr.apache.org by Michael Conrad <mi...@newsrx.com> on 2021/10/06 12:46:04 UTC

Is there an easy way to determine Lucene versions for segments?

Hello all,

Is there an easy way to determine Lucene versions for segments?

If we were to do a full reindex, rewriting all segments, would that 
update the segment version to match the current Lucene version in use?

We are working on upgrading from Solr 7.7.3 to Solr 8.x but have 
discovered that several of our collections have segments that are Lucene 6.

-Mike

Re: Is there an easy way to determine Lucene versions for segments?

Posted by David Hastings <ha...@gmail.com>.
Ah, my mistake then.  For some reason, I thought the optimize would
re-index the documents from the old segments into new ones, but I suppose
that would only be possible for stored=true fields.

The more you know!  Again, I always just did a full re-index from scratch
when upgrading.

On Wed, Oct 6, 2021 at 4:03 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/6/21 6:54 AM, Dave wrote:
> > Personally I always do a full reindex when going to a new version, just
> safer and you should always be able to do such at any point.  However if
> you got the time to spare you can do an optimize and it will force the
> segments all into the current version
>
> I'm pretty sure that even if you optimize down to 1 segment, that it
> will still retain the earliest Lucene version that touched any of the
> original segments.
>
> An index that has EVER been touched by a 6.x version will only work up
> to 7.x -- 8.x will refuse to open it.  This is how it's designed.
>
> Thanks,
> Shawn
>
>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/6/21 6:54 AM, Dave wrote:
> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version

I'm pretty sure that even if you optimize down to 1 segment, the index
will still retain the earliest Lucene version that touched any of the
original segments.

An index that has EVER been touched by a 6.x version will only work up 
to 7.x -- 8.x will refuse to open it.  This is how it's designed.

Thanks,
Shawn


Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.
I normally run an optimize to 16 segments after each major indexing
period. Much more than that and the nodes "drag". Waiting on the last
collection to finish optimizing to 1 segment before trying again.

On 10/6/21 1:56 PM, Dave wrote:
> It’s ok. Worst case it just fails and kills the temporary index after you run out of space. Really optimize is almost not even supported (it still works) but a full reindex is always the best bet if you can destroy original and it doesn’t effect anything
>
>> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
>>
>> too late.... it's in progress.
>>
>>> On 10/6/21 9:11 AM, Dave wrote:
>>> Hold on that idea then. An optimize will use three times your index size possibly.
>>>
>>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>> Thanks,
>>>>
>>>> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
>>>>
>>>> -Mike
>>>>
>>>>> On 10/6/21 8:54 AM, Dave wrote:
>>>>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>>>>>
>>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> Is there an easy way to determine Lucene versions for segments?
>>>>>>
>>>>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>>>>>
>>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>>>>>
>>>>>> -Mike


How to use? Upgrade Solr Segments: UpgradeIndexMergePolicy

Posted by Michael Conrad <mi...@newsrx.com>.
It seems to cause my merge requests to become no-ops?

I am trying this on a single smaller collection.

<mergePolicyFactory class="org.apache.solr.index.UpgradeIndexMergePolicyFactory">
  <str name="wrapped.prefix">mergePolicy</str>
  <str name="mergePolicy.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <double name="mergePolicy.noCFSRatio">0.1</double>
</mergePolicyFactory>



On 10/8/21 9:40 AM, Rahul Goswami wrote:
> Thanks. I will check this out. I remember going through the code a while
> back where there is an explicit check in one of the codec classes for
> versions older than 7.x and it throws an IndexFormatTooOldException. So I
> doubt this will help.
> But I will be glad to be proved wrong if this works.
>
>
> On Fri, Oct 8, 2021 at 8:48 AM Michael Conrad <mi...@newsrx.com> wrote:
>
>> Would this help?
>>
>> UpgradeIndexMergePolicy
>>
>> This |MergePolicy|
>> <
>> https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/MergePolicy.html>
>>
>> is used for upgrading all existing segments of an index when calling
>> |IndexWriter.forceMerge(int)|
>> <
>> https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/IndexWriter.html#forceMerge-int->.
>>
>> All other methods delegate to the base |MergePolicy| given to the
>> constructor. This allows for an as-cheap-as possible upgrade of an older
>> index by only upgrading segments that are created by previous Lucene
>> versions. forceMerge does no longer really merge; it is just used to
>> "forceMerge" older segment versions away.
>>
>>
>> On 10/7/21 8:46 AM, Rahul Goswami wrote:
>>> Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
>>> were originally written in 5.x and 6.x.
>>> We are scratching our heads to achieve this seamlessly since reindexing
>>> will take several weeks given the size of indexes for many of our
>> customers.
>>> -Rahul
>>>
>>> On Thu, Oct 7, 2021 at 8:35 AM Michael Conrad <mi...@newsrx.com>
>> wrote:
>>>> No, worst case is it closes the index writer and leaves the drive full.
>>>> 20k free space remaining.
>>>>
>>>> On 10/6/21 1:56 PM, Dave wrote:
>>>>> It’s ok. Worst case it just fails and kills the temporary index after
>>>> you run out of space. Really optimize is almost not even supported (it
>>>> still works) but a full reindex is always the best bet if you can
>> destroy
>>>> original and it doesn’t effect anything
>>>>>> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com>
>> wrote:
>>>>>> too late.... it's in progress.
>>>>>>
>>>>>>> On 10/6/21 9:11 AM, Dave wrote:
>>>>>>> Hold on that idea then. An optimize will use three times your index
>>>> size possibly.
>>>>>>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com>
>>>> wrote:
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> I think we'll try the full optimize route as we don't have storage
>> to
>>>> spare for second copies, etc.
>>>>>>>> -Mike
>>>>>>>>
>>>>>>>>> On 10/6/21 8:54 AM, Dave wrote:
>>>>>>>>> Personally I always do a full reindex when going to a new version,
>>>> just safer and you should always be able to do such at any point.
>> However
>>>> if you got the time to spare you can do an optimize and it will force
>> the
>>>> segments all into the current version
>>>>>>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com>
>>>> wrote:
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> Is there an easy way to determine Lucene versions for segments?
>>>>>>>>>>
>>>>>>>>>> If we were to do a full reindex, rewriting all segments, would
>> that
>>>> update the segment version to match the current Lucene version in use?
>>>>>>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have
>>>> discovered that several of our collections have segments that are
>> Lucene 6.
>>>>>>>>>> -Mike
>>


Re: Upgrade Solr Segments: UpgradeIndexMergePolicy

Posted by Rahul Goswami <ra...@gmail.com>.
Thanks. I will check this out. I remember going through the code a while
back where there is an explicit check in one of the codec classes for
versions older than 7.x and it throws an IndexFormatTooOldException. So I
doubt this will help.
But I will be glad to be proved wrong if this works.


On Fri, Oct 8, 2021 at 8:48 AM Michael Conrad <mi...@newsrx.com> wrote:

> Would this help?
>
> UpgradeIndexMergePolicy
>
> This |MergePolicy|
> <
> https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/MergePolicy.html>
>
> is used for upgrading all existing segments of an index when calling
> |IndexWriter.forceMerge(int)|
> <
> https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/IndexWriter.html#forceMerge-int->.
>
> All other methods delegate to the base |MergePolicy| given to the
> constructor. This allows for an as-cheap-as possible upgrade of an older
> index by only upgrading segments that are created by previous Lucene
> versions. forceMerge does no longer really merge; it is just used to
> "forceMerge" older segment versions away.
>
>
> On 10/7/21 8:46 AM, Rahul Goswami wrote:
> > Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
> > were originally written in 5.x and 6.x.
> > We are scratching our heads to achieve this seamlessly since reindexing
> > will take several weeks given the size of indexes for many of our
> customers.
> >
> > -Rahul
> >
> > On Thu, Oct 7, 2021 at 8:35 AM Michael Conrad <mi...@newsrx.com>
> wrote:
> >
> >> No, worst case is it closes the index writer and leaves the drive full.
> >> 20k free space remaining.
> >>
> >> On 10/6/21 1:56 PM, Dave wrote:
> >>> It’s ok. Worst case it just fails and kills the temporary index after
> >> you run out of space. Really optimize is almost not even supported (it
> >> still works) but a full reindex is always the best bet if you can
> destroy
> >> original and it doesn’t effect anything
> >>>> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com>
> wrote:
> >>>>
> >>>> too late.... it's in progress.
> >>>>
> >>>>> On 10/6/21 9:11 AM, Dave wrote:
> >>>>> Hold on that idea then. An optimize will use three times your index
> >> size possibly.
> >>>>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com>
> >> wrote:
> >>>>>> Thanks,
> >>>>>>
> >>>>>> I think we'll try the full optimize route as we don't have storage
> to
> >> spare for second copies, etc.
> >>>>>> -Mike
> >>>>>>
> >>>>>>> On 10/6/21 8:54 AM, Dave wrote:
> >>>>>>> Personally I always do a full reindex when going to a new version,
> >> just safer and you should always be able to do such at any point.
> However
> >> if you got the time to spare you can do an optimize and it will force
> the
> >> segments all into the current version
> >>>>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com>
> >> wrote:
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> Is there an easy way to determine Lucene versions for segments?
> >>>>>>>>
> >>>>>>>> If we were to do a full reindex, rewriting all segments, would
> that
> >> update the segment version to match the current Lucene version in use?
> >>>>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have
> >> discovered that several of our collections have segments that are
> Lucene 6.
> >>>>>>>> -Mike
> >>
>
>

Upgrade Solr Segments: UpgradeIndexMergePolicy

Posted by Michael Conrad <mi...@newsrx.com>.
Would this help?

UpgradeIndexMergePolicy

This |MergePolicy| 
<https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/MergePolicy.html> 
is used for upgrading all existing segments of an index when calling 
|IndexWriter.forceMerge(int)| 
<https://lucene.apache.org/core/8_2_0//core/org/apache/lucene/index/IndexWriter.html#forceMerge-int->. 
All other methods delegate to the base |MergePolicy| given to the 
constructor. This allows for an as-cheap-as possible upgrade of an older 
index by only upgrading segments that are created by previous Lucene 
versions. forceMerge does no longer really merge; it is just used to 
"forceMerge" older segment versions away.

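A minimal sketch of the selection rule the quoted Javadoc describes: only segments written by a version other than the current one get rewritten. This is a stdlib-only model, not Lucene code. `SegmentModel`, `UpgradeSelection`, and `segmentsToRewrite` are hypothetical names, and the version strings are made-up sample data. It also says nothing about whether the index-level "created by" marker changes, which is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a segment: a name plus the version that wrote it. */
final class SegmentModel {
    final String name;
    final String writtenBy;

    SegmentModel(String name, String writtenBy) {
        this.name = name;
        this.writtenBy = writtenBy;
    }
}

final class UpgradeSelection {
    /** Returns the names of segments not written by currentVersion, in input order. */
    static List<String> segmentsToRewrite(List<SegmentModel> segments, String currentVersion) {
        List<String> stale = new ArrayList<>();
        for (SegmentModel s : segments) {
            // A segment already written by the current version is left alone.
            if (!currentVersion.equals(s.writtenBy)) {
                stale.add(s.name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<SegmentModel> segments = new ArrayList<>();
        segments.add(new SegmentModel("_0", "6.6.5")); // sample data, not from a real index
        segments.add(new SegmentModel("_1", "7.7.3"));
        segments.add(new SegmentModel("_2", "7.2.1"));
        System.out.println(segmentsToRewrite(segments, "7.7.3"));
    }
}
```

If every segment is already at the current version, the list is empty and nothing is rewritten, which would look like a no-op.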

On 10/7/21 8:46 AM, Rahul Goswami wrote:
> Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
> were originally written in 5.x and 6.x.
> We are scratching our heads to achieve this seamlessly since reindexing
> will take several weeks given the size of indexes for many of our customers.
>
> -Rahul
>
> On Thu, Oct 7, 2021 at 8:35 AM Michael Conrad <mi...@newsrx.com> wrote:
>
>> No, worst case is it closes the index writer and leaves the drive full.
>> 20k free space remaining.
>>
>> On 10/6/21 1:56 PM, Dave wrote:
>>> It’s ok. Worst case it just fails and kills the temporary index after
>> you run out of space. Really optimize is almost not even supported (it
>> still works) but a full reindex is always the best bet if you can destroy
>> original and it doesn’t effect anything
>>>> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
>>>>
>>>> too late.... it's in progress.
>>>>
>>>>> On 10/6/21 9:11 AM, Dave wrote:
>>>>> Hold on that idea then. An optimize will use three times your index
>> size possibly.
>>>>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com>
>> wrote:
>>>>>> Thanks,
>>>>>>
>>>>>> I think we'll try the full optimize route as we don't have storage to
>> spare for second copies, etc.
>>>>>> -Mike
>>>>>>
>>>>>>> On 10/6/21 8:54 AM, Dave wrote:
>>>>>>> Personally I always do a full reindex when going to a new version,
>> just safer and you should always be able to do such at any point.  However
>> if you got the time to spare you can do an optimize and it will force the
>> segments all into the current version
>>>>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com>
>> wrote:
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> Is there an easy way to determine Lucene versions for segments?
>>>>>>>>
>>>>>>>> If we were to do a full reindex, rewriting all segments, would that
>> update the segment version to match the current Lucene version in use?
>>>>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have
>> discovered that several of our collections have segments that are Lucene 6.
>>>>>>>> -Mike
>>


Re: Is there an easy way to determine Lucene versions for segments?

Posted by Rahul Goswami <ra...@gmail.com>.
*7.2.1 to 8.x (doesn’t matter anyway)


On Thu, Oct 7, 2021 at 8:46 AM Rahul Goswami <ra...@gmail.com> wrote:

>
> Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
> were originally written in 5.x and 6.x.
> We are scratching our heads to achieve this seamlessly since reindexing
> will take several weeks given the size of indexes for many of our customers.
>
> -Rahul
>
> On Thu, Oct 7, 2021 at 8:35 AM Michael Conrad <mi...@newsrx.com> wrote:
>
>> No, worst case is it closes the index writer and leaves the drive full.
>> 20k free space remaining.
>>
>> On 10/6/21 1:56 PM, Dave wrote:
>> > It’s ok. Worst case it just fails and kills the temporary index after
>> you run out of space. Really optimize is almost not even supported (it
>> still works) but a full reindex is always the best bet if you can destroy
>> original and it doesn’t effect anything
>> >
>> >> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
>> >>
>> >> too late.... it's in progress.
>> >>
>> >>> On 10/6/21 9:11 AM, Dave wrote:
>> >>> Hold on that idea then. An optimize will use three times your index
>> size possibly.
>> >>>
>> >>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com>
>> wrote:
>> >>>> Thanks,
>> >>>>
>> >>>> I think we'll try the full optimize route as we don't have storage
>> to spare for second copies, etc.
>> >>>>
>> >>>> -Mike
>> >>>>
>> >>>>> On 10/6/21 8:54 AM, Dave wrote:
>> >>>>> Personally I always do a full reindex when going to a new version,
>> just safer and you should always be able to do such at any point.  However
>> if you got the time to spare you can do an optimize and it will force the
>> segments all into the current version
>> >>>>>
>> >>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com>
>> wrote:
>> >>>>>> Hello all,
>> >>>>>>
>> >>>>>> Is there an easy way to determine Lucene versions for segments?
>> >>>>>>
>> >>>>>> If we were to do a full reindex, rewriting all segments, would
>> that update the segment version to match the current Lucene version in use?
>> >>>>>>
>> >>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have
>> discovered that several of our collections have segments that are Lucene 6.
>> >>>>>>
>> >>>>>> -Mike
>>
>>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Dave <ha...@gmail.com>.
“ I tried removing the check in SegmentInfos.java (
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
, compiled the code and ran a full sequence of index upgrades from 5.x ->
6.x -> 7.x ->8.x. The upgrade goes through fine. Also search/update
operations work without any issues.
”
I have to admit, that is super clever and really cool it worked. It’s still against the mantra that you should be able to do a full reindex at any given point, and the two versions, but I pulled this off in the way back days, going from a raw lucene index (made before solr1.x) straight into solr. 

But I’d really look into a way to remake the index as needed.  Trust me the investment is worth the time. 

Best of luck and nice job on the hack, just remember it’s a hack 

> On Jan 2, 2022, at 11:24 PM, Rahul Goswami <ra...@gmail.com> wrote:
> 
> I tried removing the check in SegmentInfos.java (
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
> , compiled the code and ran a full sequence of index upgrades from 5.x ->
> 6.x -> 7.x ->8.x. The upgrade goes through fine. Also search/update
> operations work without any issues.

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Rahul Goswami <ra...@gmail.com>.
Thanks Shawn. It does seem like more than anything, the check for LATEST-1
is a safeguard against potential future incompatible change at this point.
I tried removing the check in SegmentInfos.java (
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.11.1/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L321)
, compiled the code and ran a full sequence of index upgrades from 5.x ->
6.x -> 7.x ->8.x. The upgrade goes through fine. Also search/update
operations work without any issues.

I would still follow up on the Lucene mailing list to learn the
nitty-gritty details, in case I am missing anything. In my opinion such
kinds of restrictions limit the practical
usability of an otherwise awesome project. Personally, I feel complete
reindexing is not a practical solution for many systems in the wild and it
would be great if there could be inbuilt support for such upgrades without
having a hard break in between.

Thanks,
Rahul

On Fri, Dec 31, 2021 at 12:51 AM Shawn Heisey <ap...@elyograg.org> wrote:

> On 12/30/2021 9:46 PM, Rahul Goswami wrote:
> > What is the reason for blocking the upgrade ? If someone has been able to
> > successfully upgrade an index written in 6.x in the past to 8.x without
> > doing a full reindex, I would love to get some ideas.
>
> I'm told that Lucene has never guaranteed that an index can be upgraded
> through two major versions, even if going to the interim major version
> first.  The guarantee is one major version back, which means 8.x
> versions can read indexes originally built by 7.0 and later.
>
> The Lucene IndexUpgrader utility just does a ForceMerge on the index.
> (Solr calls it Optimize, Lucene calls it ForceMerge).  The fact that the
> original segments were written by a 6.x version is preserved by a 7.x
> ForceMerge, and when 8.x sees that, it refuses to load the index, rather
> than risk the index not working correctly.
>
> If you want nitty gritty low level details about what kind of problems
> might occur from using an index originally built by 6.x in 8.x, you're
> going to need to consult a Lucene expert on a Lucene mailing list.  I do
> not understand Lucene at a low enough level to provide those answers.
>
> It is always best to be able to do a full reindex from the original
> source data at any time.  Almost all Solr installs will require schema
> changes as business needs shift, and as the administrator gains more
> knowledge about Solr.  Most schema changes require a reindex, and a
> reindex is strongly recommended with ANY Solr/Lucene upgrade, even from
> one minor version to the next minor version.
>
> Thanks,
> Shawn
>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/30/2021 9:46 PM, Rahul Goswami wrote:
> What is the reason for blocking the upgrade ? If someone has been able to
> successfully upgrade an index written in 6.x in the past to 8.x without
> doing a full reindex, I would love to get some ideas.

I'm told that Lucene has never guaranteed that an index can be upgraded 
through two major versions, even if going to the interim major version 
first.  The guarantee is one major version back, which means 8.x 
versions can read indexes originally built by 7.0 and later.

The Lucene IndexUpgrader utility just does a ForceMerge on the index. 
(Solr calls it Optimize, Lucene calls it ForceMerge).  The fact that the 
original segments were written by a 6.x version is preserved by a 7.x 
ForceMerge, and when 8.x sees that, it refuses to load the index, rather 
than risk the index not working correctly.

If you want nitty gritty low level details about what kind of problems 
might occur from using an index originally built by 6.x in 8.x, you're 
going to need to consult a Lucene expert on a Lucene mailing list.  I do 
not understand Lucene at a low enough level to provide those answers.

It is always best to be able to do a full reindex from the original 
source data at any time.  Almost all Solr installs will require schema 
changes as business needs shift, and as the administrator gains more 
knowledge about Solr.  Most schema changes require a reindex, and a 
reindex is strongly recommended with ANY Solr/Lucene upgrade, even from 
one minor version to the next minor version.

Thanks,
Shawn
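
To make the rule above concrete, here is a small stdlib-only model of it. It assumes the index carries a "created by major version" marker that a force merge leaves untouched, and that a reader accepts only its own and the previous major version. `IndexModel`, `forceMergeWith`, and `canBeOpenedBy` are hypothetical names for illustration. This is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified model of an index commit; NOT Lucene's real SegmentInfos class. */
final class IndexModel {
    final int createdMajor;            // major version that first created the index
    final List<Integer> segmentMajors; // major version that wrote each segment

    IndexModel(int createdMajor, List<Integer> segmentMajors) {
        this.createdMajor = createdMajor;
        this.segmentMajors = Collections.unmodifiableList(new ArrayList<>(segmentMajors));
    }

    /** Models a force merge: one new segment is written, the creation marker is kept. */
    IndexModel forceMergeWith(int writerMajor) {
        return new IndexModel(createdMajor, Collections.singletonList(writerMajor));
    }

    /** Models the "one major version back" guarantee described above. */
    boolean canBeOpenedBy(int readerMajor) {
        return createdMajor >= readerMajor - 1 && createdMajor <= readerMajor;
    }

    public static void main(String[] args) {
        List<Integer> majors = new ArrayList<>();
        majors.add(6);
        majors.add(7);
        IndexModel merged = new IndexModel(6, majors).forceMergeWith(7);
        System.out.println("segments=" + merged.segmentMajors + " created=" + merged.createdMajor);
        System.out.println("7.x can open: " + merged.canBeOpenedBy(7));
        System.out.println("8.x can open: " + merged.canBeOpenedBy(8));
    }
}
```

In this model the merged index has only 7.x segments, yet an 8.x reader still rejects it because the creation marker says 6.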

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Rahul Goswami <ra...@gmail.com>.
Bringing this thread back from the grave! Looks like the approach won't
fly. I would really like to know the reason for blocking index upgrade for
any index ever touched by 6.x.
Reindexing the whole data is not a practical approach for scenarios when
there are multiple client deployments and would result in the user getting
blocked for possibly several days (aka multiple deployments impacted). This
is particularly the case with large indexes (hundreds of GBs, and in some
cases > 1 TB). Consequently, we are stuck on Solr 7.x. As mentioned
earlier, running optimize with maxSegments=1 won't help either, and I can't
seem to fathom why?

What is the reason for blocking the upgrade ? If someone has been able to
successfully upgrade an index written in 6.x in the past to 8.x without
doing a full reindex, I would love to get some ideas.

Thanks,
Rahul

On Thu, Oct 7, 2021 at 9:47 AM Rahul Goswami <ra...@gmail.com> wrote:

> That’s what we are planning on doing. Block writes(making index
> read-only), write into a parallel directory using 8.x lucene codec and then
> switch over if/when completely written. Downside is (temporarily) requiring
> double the index size in total disk space while writing the index given
> that we regularly run into multi-terabyte indexes.
>
> Easier said than done, given the unknown challenges in doing so, so the
> feasibility remains to be seen. I really wish there was a supported way to
> do this out of the box.
>
> On Thu, Oct 7, 2021 at 9:36 AM Michael Conrad <mi...@newsrx.com> wrote:
>
>>
>>
>> On 10/7/21 8:46 AM, Rahul Goswami wrote:
>> > Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
>> > were originally written in 5.x and 6.x.
>> > We are scratching our heads to achieve this seamlessly since reindexing
>> > will take several weeks given the size of indexes for many of our
>> customers.
>> >
>> > -Rahul
>> >
>> >
>>
>> There really needs to be a way to simply rewrite existing segments into
>> the new lucene format. with appropriate errors in case of unsupported
>> field types, etc.
>>
>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Rahul Goswami <ra...@gmail.com>.
That’s what we are planning on doing: block writes (making the index
read-only), write into a parallel directory using the 8.x Lucene codec, and
then switch over if/when it is completely written. The downside is
(temporarily) requiring double the index size in total disk space while
writing the index, given that we regularly run into multi-terabyte indexes.

Easier said than done, given the unknown challenges in doing so, so the
feasibility remains to be seen. I really wish there was a supported way to
do this out of the box.

On Thu, Oct 7, 2021 at 9:36 AM Michael Conrad <mi...@newsrx.com> wrote:

>
>
> On 10/7/21 8:46 AM, Rahul Goswami wrote:
> > Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
> > were originally written in 5.x and 6.x.
> > We are scratching our heads to achieve this seamlessly since reindexing
> > will take several weeks given the size of indexes for many of our
> customers.
> >
> > -Rahul
> >
> >
>
> There really needs to be a way to simply rewrite existing segments into
> the new lucene format. with appropriate errors in case of unsupported
> field types, etc.
>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.

On 10/7/21 8:46 AM, Rahul Goswami wrote:
> Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
> were originally written in 5.x and 6.x.
> We are scratching our heads to achieve this seamlessly since reindexing
> will take several weeks given the size of indexes for many of our customers.
>
> -Rahul
>
>

There really needs to be a way to simply rewrite existing segments into
the new Lucene format, with appropriate errors in case of unsupported
field types, etc.

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Rahul Goswami <ra...@gmail.com>.
Won’t work. I have tried optimize on 7.7.2 to 8.x where several segments
were originally written in 5.x and 6.x.
We are scratching our heads to achieve this seamlessly since reindexing
will take several weeks given the size of indexes for many of our customers.

-Rahul

On Thu, Oct 7, 2021 at 8:35 AM Michael Conrad <mi...@newsrx.com> wrote:

> No, worst case is it closes the index writer and leaves the drive full.
> 20k free space remaining.
>
> On 10/6/21 1:56 PM, Dave wrote:
> > It’s ok. Worst case it just fails and kills the temporary index after
> you run out of space. Really optimize is almost not even supported (it
> still works) but a full reindex is always the best bet if you can destroy
> original and it doesn’t effect anything
> >
> >> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
> >>
> >> too late.... it's in progress.
> >>
> >>> On 10/6/21 9:11 AM, Dave wrote:
> >>> Hold on that idea then. An optimize will use three times your index
> size possibly.
> >>>
> >>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com>
> wrote:
> >>>> Thanks,
> >>>>
> >>>> I think we'll try the full optimize route as we don't have storage to
> spare for second copies, etc.
> >>>>
> >>>> -Mike
> >>>>
> >>>>> On 10/6/21 8:54 AM, Dave wrote:
> >>>>> Personally I always do a full reindex when going to a new version,
> just safer and you should always be able to do such at any point.  However
> if you got the time to spare you can do an optimize and it will force the
> segments all into the current version
> >>>>>
> >>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com>
> wrote:
> >>>>>> Hello all,
> >>>>>>
> >>>>>> Is there an easy way to determine Lucene versions for segments?
> >>>>>>
> >>>>>> If we were to do a full reindex, rewriting all segments, would that
> update the segment version to match the current Lucene version in use?
> >>>>>>
> >>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have
> discovered that several of our collections have segments that are Lucene 6.
> >>>>>>
> >>>>>> -Mike
>
>

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.
No, worst case is it closes the index writer and leaves the drive full. 
20k free space remaining.
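
A rough pre-flight check could catch this before starting an optimize. The sketch below uses only the Java standard library. `OptimizeHeadroom`, `directorySize`, and `hasHeadroom` are hypothetical names, and the multiplier is left as a parameter because the thread gives only an approximate "three times your index size" figure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

final class OptimizeHeadroom {
    /** Sums the sizes of all regular files under dir. */
    static long directorySize(Path dir) {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile).mapToLong(p -> {
                try {
                    return Files.size(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if usableBytes covers indexBytes times the chosen multiplier. */
    static boolean hasHeadroom(long indexBytes, long usableBytes, double multiplier) {
        return (double) usableBytes >= indexBytes * multiplier;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args[0]); // path to the index directory (assumption)
        long indexBytes = directorySize(indexDir);
        long usable = Files.getFileStore(indexDir).getUsableSpace();
        // The multiplier 2.0 is an assumed safety margin, not a documented figure.
        System.out.println("index=" + indexBytes + " usable=" + usable
                + " ok=" + hasHeadroom(indexBytes, usable, 2.0));
    }
}
```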

On 10/6/21 1:56 PM, Dave wrote:
> It’s ok. Worst case it just fails and kills the temporary index after you run out of space. Really optimize is almost not even supported (it still works) but a full reindex is always the best bet if you can destroy original and it doesn’t effect anything
>
>> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
>>
>> too late.... it's in progress.
>>
>>> On 10/6/21 9:11 AM, Dave wrote:
>>> Hold on that idea then. An optimize will use three times your index size possibly.
>>>
>>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>> Thanks,
>>>>
>>>> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
>>>>
>>>> -Mike
>>>>
>>>>> On 10/6/21 8:54 AM, Dave wrote:
>>>>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>>>>>
>>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> Is there an easy way to determine Lucene versions for segments?
>>>>>>
>>>>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>>>>>
>>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>>>>>
>>>>>> -Mike


Re: Is there an easy way to determine Lucene versions for segments?

Posted by Dave <ha...@gmail.com>.
It’s ok. Worst case it just fails and kills the temporary index after you run out of space. Really, optimize is almost not even supported (it still works), but a full reindex is always the best bet if you can destroy the original and it doesn’t affect anything.

> On Oct 6, 2021, at 1:53 PM, Michael Conrad <mi...@newsrx.com> wrote:
> 
> too late.... it's in progress.
> 
>> On 10/6/21 9:11 AM, Dave wrote:
>> Hold on that idea then. An optimize will use three times your index size possibly.
>> 
>>>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>> 
>>> Thanks,
>>> 
>>> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
>>> 
>>> -Mike
>>> 
>>>> On 10/6/21 8:54 AM, Dave wrote:
>>>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>>>> 
>>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>>> Hello all,
>>>>> 
>>>>> Is there an easy way to determine Lucene versions for segments?
>>>>> 
>>>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>>>> 
>>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>>>> 
>>>>> -Mike
> 

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.
too late.... it's in progress.

On 10/6/21 9:11 AM, Dave wrote:
> Hold on that idea then. An optimize will use three times your index size possibly.
>
>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>
>> Thanks,
>>
>> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
>>
>> -Mike
>>
>>> On 10/6/21 8:54 AM, Dave wrote:
>>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>>>
>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>> Hello all,
>>>>
>>>> Is there an easy way to determine Lucene versions for segments?
>>>>
>>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>>>
>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>>>
>>>> -Mike


Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.
This approach doesn't work if the index is already a single segment 
since Solr Old.x.

What would probably be better is a way to rewrite segments without 
actually trying to merge them.

On 10/6/21 9:36 AM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> FWIW, one way that we have used to determine whether a collection is ready for upgrade is to run a command like
>
> java -Xms512m -Xmx4g -cp lucene-core-8.5.2.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /your/path/here/snapshot.211006-054002/snapshot.shard1
>
> This will balk if the data was created under Solr6, but is happy enough with Solr7 (although this does not check for such things as deprecated data types)
>
> I would still be interested in the answer to your original question, so that I can confirm that all our data will be ready for Solr9 upgrade (preferably without having to wait for lucene-core-9.0.0.jar to come into existence). We do a complete reindex when we upgrade (but I want to check that no one skipped that step)
>
> -----Original Message-----
> From: Dave <ha...@gmail.com>
> Sent: Wednesday, October 06, 2021 9:11 AM
> To: users@solr.apache.org
> Cc: Jason Carter <ja...@newsrx.com>
> Subject: Re: Is there an easy way to determine Lucene versions for segments?
>
> Hold on that idea then. An optimize will use three times your index size possibly.
>
>> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>
>> Thanks,
>>
>> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
>>
>> -Mike
>>
>>> On 10/6/21 8:54 AM, Dave wrote:
>>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>>>
>>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>>> Hello all,
>>>>
>>>> Is there an easy way to determine Lucene versions for segments?
>>>>
>>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>>>
>>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>>>
>>>> -Mike


RE: Is there an easy way to determine Lucene versions for segments?

Posted by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov.INVALID>.
FWIW, one way that we have used to determine whether a collection is ready for upgrade is to run a command like

java -Xms512m -Xmx4g -cp lucene-core-8.5.2.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /your/path/here/snapshot.211006-054002/snapshot.shard1

This will balk if the data was created under Solr6, but is happy enough with Solr7 (although this does not check for such things as deprecated data types)
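
To run the same check over several shard snapshots, something like the following Python sketch could wrap the command above. The helper names (checkindex_command, check_all) are made up for illustration, and treating a nonzero exit status as "CheckIndex balked" is an assumption worth confirming against the actual output.

```python
import subprocess


def checkindex_command(lucene_core_jar, index_dir, max_heap="4g"):
    """Build the argv for the CheckIndex invocation quoted above."""
    return [
        "java", "-Xms512m", f"-Xmx{max_heap}",
        "-cp", str(lucene_core_jar),
        "-ea:org.apache.lucene...",
        "org.apache.lucene.index.CheckIndex",
        str(index_dir),
    ]


def check_all(lucene_core_jar, index_dirs):
    """Run CheckIndex on each directory.

    Returns {directory: True/False}, where True means the process exited
    with status 0 (assumed here to mean the index was readable and clean).
    """
    results = {}
    for index_dir in index_dirs:
        proc = subprocess.run(
            checkindex_command(lucene_core_jar, index_dir),
            capture_output=True,
            text=True,
        )
        results[str(index_dir)] = proc.returncode == 0
    return results
```

Calling check_all("lucene-core-8.5.2.jar", [...]) with a list of snapshot directories would give a quick pass/fail map per shard; the full CheckIndex output is still the place to look for the reason behind a failure.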

I would still be interested in the answer to your original question, so that I can confirm that all our data will be ready for Solr9 upgrade (preferably without having to wait for lucene-core-9.0.0.jar to come into existence). We do a complete reindex when we upgrade (but I want to check that no one skipped that step)
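
One possible way to get at the original question is to look at the JSON returned by Solr's Segments Info handler (/solr/<core>/admin/segments) and flag segments written by an older Lucene major version. The sketch below is only that: the response shape (a "segments" map whose entries carry a "version" string) is an assumption that should be checked against the Solr version in use, and old_segments is a made-up helper name. It also only reports which Lucene version wrote each segment; as noted elsewhere in this thread, an index that was ever touched by 6.x may still be refused by 8.x even after its segments are rewritten.

```python
import json


def old_segments(segments_response, current_major):
    """Return {segment_name: version} for segments whose Lucene major
    version is lower than current_major.

    Assumes segments_response looks like
    {"segments": {"_0": {"version": "6.6.5", ...}, ...}}.
    Segments without a "version" entry are skipped.
    """
    flagged = {}
    for name, info in segments_response.get("segments", {}).items():
        version = info.get("version")
        if version is None:
            continue
        major = int(version.split(".")[0])
        if major < current_major:
            flagged[name] = version
    return flagged


# Hand-written sample in the assumed shape, not real Solr output.
sample = json.loads(
    '{"segments": {"_0": {"name": "_0", "version": "6.6.5"},'
    ' "_a": {"name": "_a", "version": "7.7.3"}}}'
)
print(old_segments(sample, 7))
```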

-----Original Message-----
From: Dave <ha...@gmail.com> 
Sent: Wednesday, October 06, 2021 9:11 AM
To: users@solr.apache.org
Cc: Jason Carter <ja...@newsrx.com>
Subject: Re: Is there an easy way to determine Lucene versions for segments?

Hold on that idea then. An optimize will use three times your index size possibly. 

> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
> 
> Thanks,
> 
> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
> 
> -Mike
> 
>> On 10/6/21 8:54 AM, Dave wrote:
>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>> 
>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> Is there an easy way to determine Lucene versions for segments?
>>> 
>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>> 
>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>> 
>>> -Mike
> 

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Dave <ha...@gmail.com>.
Hold off on that idea then. An optimize can possibly use three times your index size.
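
A rough pre-flight check for that could look like the Python sketch below. It takes the "three times" figure above at face value and assumes it includes the existing copy of the index, so it asks for free space of at least twice the current index size; both the factor and that interpretation are assumptions, and the function names are made up for illustration.

```python
import os
import shutil


def dir_size_bytes(path):
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def headroom_ok(index_bytes, free_bytes, factor=3):
    """True if free space covers the extra (factor - 1) copies of the index."""
    return free_bytes >= (factor - 1) * index_bytes


def can_optimize(index_dir, factor=3):
    """Compare the index size with the free space on its filesystem."""
    free = shutil.disk_usage(index_dir).free
    return headroom_ok(dir_size_bytes(index_dir), free, factor)
```

Running can_optimize on the core's data/index directory before kicking off the optimize would at least show whether the worst case fits on disk.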

> On Oct 6, 2021, at 9:02 AM, Michael Conrad <mi...@newsrx.com> wrote:
> 
> Thanks,
> 
> I think we'll try the full optimize route as we don't have storage to spare for second copies, etc.
> 
> -Mike
> 
>> On 10/6/21 8:54 AM, Dave wrote:
>> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>> 
>>>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> Is there an easy way to determine Lucene versions for segments?
>>> 
>>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>> 
>>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>> 
>>> -Mike
> 

Re: Is there an easy way to determine Lucene versions for segments?

Posted by Michael Conrad <mi...@newsrx.com>.
Thanks,

I think we'll try the full optimize route as we don't have storage to 
spare for second copies, etc.

-Mike

On 10/6/21 8:54 AM, Dave wrote:
> Personally I always do a full reindex when going to a new version, just safer and you should always be able to do such at any point.  However if you got the time to spare you can do an optimize and it will force the segments all into the current version
>
>> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
>>
>> Hello all,
>>
>> Is there an easy way to determine Lucene versions for segments?
>>
>> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
>>
>> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
>>
>> -Mike


Re: Is there an easy way to determine Lucene versions for segments?

Posted by Dave <ha...@gmail.com>.
Personally, I always do a full reindex when going to a new version; it's just safer, and you should always be able to do one at any point.  However, if you've got the time to spare, you can do an optimize and it will force the segments all into the current version.

> On Oct 6, 2021, at 8:46 AM, Michael Conrad <mi...@newsrx.com> wrote:
> 
> Hello all,
> 
> Is there an easy way to determine Lucene versions for segments?
> 
> If we were to do a full reindex, rewriting all segments, would that update the segment version to match the current Lucene version in use?
> 
> We are working on upgrading from Solr 7.7.3 to Solr 8.x but have discovered that several of our collections have segments that are Lucene 6.
> 
> -Mike