Posted to solr-user@lucene.apache.org by Doug Reeder <do...@ahlbrandsgroup.com> on 2019/05/03 03:20:50 UTC

Reverse-engineering existing installation

The documentation for SOLR is good.  However it is oriented toward setting
up a new installation, with the data model known.

I have inherited an existing installation.  Aspects of the data model I
know, but there's a lot of ways things could have been configured in SOLR,
and for some cases, I don't know what SOLR was supposed to do.

Can you recommend any documentation on working out the configuration of an
existing installation?

Re: Reverse-engineering existing installation

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
My presentation from 2016 may be interesting as I deconstruct a Solr
example, including the tips/commands on how to do so:
https://www.slideshare.net/arafalov/rebuilding-solr-6-examples-layer-by-layer-lucenesolrrevolution-2016

The commands start around slide 20.

Hope this helps,
    Alex.
P.S. If this (and others' ideas) is not enough, make sure to mention
the Solr version when you come back for additional questions. It may
help to know which files to recommend checking for additional hints.

On Thu, 2 May 2019 at 23:21, Doug Reeder <do...@ahlbrandsgroup.com> wrote:
>
> The documentation for SOLR is good.  However it is oriented toward setting
> up a new installation, with the data model known.
>
> I have inherited an existing installation.  Aspects of the data model I
> know, but there's a lot of ways things could have been configured in SOLR,
> and for some cases, I don't know what SOLR was supposed to do.
>
> Can you recommend any documentation on working out the configuration of an
> existing installation?

Re: Reverse-engineering existing installation

Posted by Erick Erickson <er...@gmail.com>.
Wait. I was recommending you diff the 4.2.1 solrconfig and the solrconfig you’re using. Ditto with the schema. If you’re trying to diff the 7x or 8x ones they’ll be totally different.

But if you are getting massive differences between the 4.2.1 stock files and what you’re using, then whoever set it up made the changes and you’ll probably have to go through them by hand, noting all the differences in the non-commented parts.

Things that are _missing_ from the one you’re using vs. the stock distro files you can pretty much ignore. They’ll be interesting in that you can delete the equivalent from the new distro, but…

I expect the schema will be the most different; solrconfig usually doesn’t change much.

FWIW,
Erick
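
One way to narrow the by-hand comparison of solrconfig.xml is to compare only the requestHandler definitions. The following is a minimal sketch in Python (3.9+), not something posted in the thread; the function names handlers and compare_handlers are hypothetical. It ignores comments, whitespace and attribute order, and it deliberately skips handlers that exist only in the stock file, on the reasoning above that missing entries can mostly be ignored.

```python
import xml.etree.ElementTree as ET


def handlers(solrconfig_xml: str) -> dict[str, str]:
    """Map each requestHandler name to its canonical XML text."""
    # ET.fromstring drops XML comments by default.
    root = ET.fromstring(solrconfig_xml)
    result = {}
    for handler in root.iter("requestHandler"):
        raw = ET.tostring(handler, encoding="unicode").strip()
        # Canonical form sorts attributes; strip_text removes indentation.
        result[handler.get("name", "")] = ET.canonicalize(raw, strip_text=True)
    return result


def compare_handlers(stock_xml: str, local_xml: str) -> tuple[list[str], list[str]]:
    """Return (handlers only in the local config, handlers whose definition differs)."""
    stock, local = handlers(stock_xml), handlers(local_xml)
    added = sorted(set(local) - set(stock))
    changed = sorted(n for n in set(local) & set(stock) if local[n] != stock[n])
    return added, changed
```

Called with the text of the stock 4.2.1 solrconfig.xml and the one in use, it returns two lists of handler names to review first. It says nothing about other sections of the file, which still need a full diff.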



> On May 3, 2019, at 7:30 PM, Doug Reeder <do...@ahlbrandsgroup.com> wrote:
> 
> Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
> need to strip the comments before we can get a useful diff of
> solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
> files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
> hasn't been updated in years.
> 


Re: Reverse-engineering existing installation

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I would focus on fields not being used. Then, on types not used. Then, you
will see what was actually custom to your setup.

In solrconfig.xml, I would focus on request handlers and maybe defaults
used.

Regards,
     Alex
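
Finding types that no field uses can be done from schema.xml alone. The following is a minimal sketch in Python (3.9+), not from the thread; the function name unused_field_types is hypothetical, and it assumes the usual schema.xml layout where field and dynamicField elements carry a type attribute naming a fieldType.

```python
import xml.etree.ElementTree as ET


def unused_field_types(schema_xml: str) -> list[str]:
    """Return fieldType names that no field or dynamicField refers to."""
    root = ET.fromstring(schema_xml)
    # Older schemas sometimes spell the element in lowercase, so check both.
    declared = {
        ft.get("name")
        for tag in ("fieldType", "fieldtype")
        for ft in root.iter(tag)
        if ft.get("name")
    }
    referenced = {
        f.get("type")
        for tag in ("field", "dynamicField")
        for f in root.iter(tag)
    }
    return sorted(declared - referenced)
```

This only shows what the schema declares. Whether a declared field actually holds data has to be checked against the index itself.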

On Mon, May 6, 2019, 2:25 PM Doug Reeder, <do...@ahlbrandsgroup.com> wrote:

> Thanks, xmlstarlet makes it straightforward to get the canonical XML.
>
> It looks like our schema.xml files are rather different from files
> like solr/example/solr/collection1/conf/schema.xml
>
> Any suggestions of sections I should focus on?

Re: Reverse-engineering existing installation

Posted by Erick Erickson <er...@gmail.com>.
Unfortunately…everything. You may have to compare tag-by-tag, especially in solrconfig.xml. In the schema, all your fieldTypes and the associated fields are critical….



> On May 6, 2019, at 11:25 AM, Doug Reeder <do...@ahlbrandsgroup.com> wrote:
> 
> Thanks, xmlstarlet makes it straightforward to get the canonical XML.
> 
> It looks like our schema.xml files are rather different from files
> like solr/example/solr/collection1/conf/schema.xml
> 
> Any suggestions of sections I should focus on?


Re: Reverse-engineering existing installation

Posted by Doug Reeder <do...@ahlbrandsgroup.com>.
Thanks, xmlstarlet makes it straightforward to get the canonical XML.

It looks like our schema.xml files are rather different from files
like solr/example/solr/collection1/conf/schema.xml

Any suggestions of sections I should focus on?

On Sat, May 4, 2019 at 8:11 AM Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> XMLStarlet still works just fine. So if you want the fast way, that is the
> one.
>
> Otherwise, some xml editors can do it (not sure which ones) or you can look
> for XSLT or XQuery examples on the web.
>
> XMLStarlet actually just spits out XSLT internally, or even externally if
> you ask.
>
> Regards,
>      Alex

Re: Reverse-engineering existing installation

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
XMLStarlet still works just fine. So if you want the fast way, that is the
one.

Otherwise, some xml editors can do it (not sure which ones) or you can look
for XSLT or XQuery examples on the web.

XMLStarlet actually just spits out XSLT internally, or even externally if
you ask.

Regards,
     Alex
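
For anyone who would rather not install XMLStarlet, a similar normalization can be sketched with the Python standard library (3.9+). This is an illustration, not something posted in the thread; the function names normalize and config_diff are hypothetical. canonicalize drops comments by default and sorts attributes, and strip_text=True removes indentation, so the diff shows only real differences.

```python
import difflib
from xml.etree.ElementTree import canonicalize


def normalize(xml_text: str) -> list[str]:
    """Return canonical XML, comments removed, one tag per line."""
    canonical = canonicalize(xml_text, strip_text=True)
    # Canonical output is a single line; split between tags for a readable diff.
    return canonical.replace("><", ">\n<").splitlines()


def config_diff(stock_xml: str, local_xml: str) -> list[str]:
    """Unified diff of two XML documents after normalization."""
    return list(
        difflib.unified_diff(
            normalize(stock_xml),
            normalize(local_xml),
            fromfile="stock",
            tofile="local",
            lineterm="",
        )
    )
```

Reading the stock and local schema.xml (or solrconfig.xml) into strings and printing the lines returned by config_diff gives a comment-free comparison.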


On Fri, May 3, 2019, 10:30 PM Doug Reeder, <do...@ahlbrandsgroup.com> wrote:

> Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
> need to strip the comments before we can get a useful diff of
> solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
> files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
> hasn't been updated in years.

Re: Reverse-engineering existing installation

Posted by Doug Reeder <do...@ahlbrandsgroup.com>.
Thanks! Diffs for solr.xml and zoo.cfg were easy, but it looks like we'll
need to strip the comments before we can get a useful diff of
solrconfig.xml or schema.xml.  Can you recommend tools to normalize XML
files?  XMLStarlet is hosted on SourceForge, which I no longer trust, and
hasn't been updated in years.


On Fri, May 3, 2019 at 4:24 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 5/3/2019 1:44 PM, Erick Erickson wrote:
> > Then git will let you check out any previous branch. 4.2 is from before
> > we switched to Git, so I’m not sure you can go that far back, but 4x is
> > probably close enough for comparing configs.
>
> Git has all of Lucene's history, and most of Solr's history, back to
> when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x
> releases are there:
>
> --------------------
> elyograg@smeagol:~/asf/lucene-solr$ git checkout
> releases/lucene-solr/4.2.1
> Checking out files: 100% (13209/13209), done.
> Note: checking out 'releases/lucene-solr/4.2.1'.
>
> You are in 'detached HEAD' state. You can look around, make experimental
> changes and commit them, and you can discard any commits you make in
> this state without impacting any branches by performing another checkout.
>
> If you want to create a new branch to retain commits you create, you may
> do so (now or later) by using -b with the checkout command again. Example:
>
>    git checkout -b <new-branch-name>
>
> HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.
> --------------------
>
> Thanks,
> Shawn
>

Re: Reverse-engineering existing installation

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/3/2019 1:44 PM, Erick Erickson wrote:
> Then git will let you check out any previous branch. 4.2 is from before we switched to Git, so I’m not sure you can go that far back, but 4x is probably close enough for comparing configs.

Git has all of Lucene's history, and most of Solr's history, back to 
when Lucene and Solr were merged before the 3.1.0 release.  So the 4.x 
releases are there:

--------------------
elyograg@smeagol:~/asf/lucene-solr$ git checkout releases/lucene-solr/4.2.1
Checking out files: 100% (13209/13209), done.
Note: checking out 'releases/lucene-solr/4.2.1'.

You are in 'detached HEAD' state. You can look around, make experimental 
changes and commit them, and you can discard any commits you make in 
this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may 
do so (now or later) by using -b with the checkout command again. Example:

   git checkout -b <new-branch-name>

HEAD is now at 50c41a3e5c Lucene Java 4.2.1 release.
--------------------

Thanks,
Shawn
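
If only a few stock files are needed, they can be read straight from the tag without switching the working tree, using git's <rev>:<path> syntax. The following is a minimal Python sketch, not from the thread; the function names and the repo_dir argument are hypothetical, and it assumes a local clone of lucene-solr with the release tags fetched.

```python
import subprocess


def git_show_command(tag: str, path: str) -> list[str]:
    """Build the git command that prints one file as it was at a tag."""
    return ["git", "show", f"{tag}:{path}"]


def read_stock_file(repo_dir: str, tag: str, path: str) -> str:
    """Return the file's contents at the given tag (raises if git fails)."""
    done = subprocess.run(
        git_show_command(tag, path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return done.stdout
```

For example, read_stock_file("lucene-solr", "releases/lucene-solr/4.2.1", "solr/example/solr/collection1/conf/schema.xml") should return the stock example schema, assuming that path exists at that tag.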

Re: Reverse-engineering existing installation

Posted by Erick Erickson <er...@gmail.com>.
Doug:

You can pull any version of Solr from Git.

git clone https://gitbox.apache.org/repos/asf/lucene-solr.git some_local_dir

Then git will let you check out any previous branch. 4.2 is from before we switched to Git, so I’m not sure you can go that far back, but 4x is probably close enough for comparing configs.

All that said, and assuming you’re going to either 7x or 8x… I’d just think about starting over. Once you get the old configs and account for

1> any schema changes
2> any config changes, _especially_ any custom components

consider starting with a current version of Solr and re-indexing. You’ll absolutely _have_ to re-index _all_ your source material anyway, so thinking about going from 4x->5x->6x->7x->8x is futile.

Best,
Erick

> On May 3, 2019, at 12:51 PM, Doug Reeder <do...@ahlbrandsgroup.com> wrote:
> 
> Thanks! Alexandre's presentation is helpful in understanding what's not
> essential.  David's suggestion of comparing config files is good - I'll
> have to see if I can dig up the config files for version 4.2, which we're
> currently running.
> 
> I'll also look into updating to a supported version. I guess I'll be
> reading https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html and
> the similar ones for later versions.  Is an upgrade guide for version 4 to
> 5 still around somewhere?


Re: Reverse-engineering existing installation

Posted by Doug Reeder <do...@ahlbrandsgroup.com>.
Thanks! Alexandre's presentation is helpful in understanding what's not
essential.  David's suggestion of comparing config files is good - I'll
have to see if I can dig up the config files for version 4.2, which we're
currently running.

I'll also look into updating to a supported version. I guess I'll be
reading https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html and
the similar ones for later versions.  Is an upgrade guide for version 4 to
5 still around somewhere?

On Fri, May 3, 2019 at 12:21 AM David Smiley <da...@gmail.com>
wrote:

> Consider trying to diff configs from a default at the version it was copied
> from, if possible. Even better, the configs should be in source control and
> then you can browse history with commentary and sometimes links to issue
> trackers and code reviews.
>
> Also a big part that you can’t see by staring at configs is what the
> queries look like. You should examine the system interacting with Solr to
> observe embedded comments/docs for insights.

Re: Reverse-engineering existing installation

Posted by David Smiley <da...@gmail.com>.
Consider trying to diff configs from a default at the version it was copied
from, if possible. Even better, the configs should be in source control and
then you can browse history with commentary and sometimes links to issue
trackers and code reviews.

Also a big part that you can’t see by staring at configs is what the
queries look like. You should examine the system interacting with Solr to
observe embedded comments/docs for insights.
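
When the client code is hard to get at, the Solr request log is another place to see what the queries look like. The following is a minimal Python sketch, not from the thread; the function name summarize_requests is hypothetical, and the regular expression assumes log lines of the form "... path=/select params={...} hits=N status=0 QTime=N", which should be checked against the actual log before relying on it.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl

# Assumed log shape: path=<handler> params={<query string>} [hits=N] status=N
REQUEST = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*)\} (?:hits=\d+ )?status="
)


def summarize_requests(log_lines):
    """Count which handler paths and which parameter names appear in the log."""
    paths, params = Counter(), Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # not a request line in the assumed format
        paths[match["path"]] += 1
        for name, _value in parse_qsl(match["params"], keep_blank_values=True):
            params[name] += 1
    return paths, params
```

The parameter names that show up (fq, facet.field, qf and so on) point at the fields and request handlers that are actually exercised, which helps decide what in the configs matters.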

On Thu, May 2, 2019 at 11:21 PM Doug Reeder <do...@ahlbrandsgroup.com> wrote:

> The documentation for SOLR is good.  However it is oriented toward setting
> up a new installation, with the data model known.
>
> I have inherited an existing installation.  Aspects of the data model I
> know, but there's a lot of ways things could have been configured in SOLR,
> and for some cases, I don't know what SOLR was supposed to do.
>
> Can you recommend any documentation on working out the configuration of an
> existing installation?
>