Posted to solr-user@lucene.apache.org by Ravi Kumar Taminidi <ra...@whitepine-st.com> on 2017/05/19 14:33:46 UTC

Solr in NAS or Network Shared Drive

Hello.  Scenario: currently we have 2 Solr servers running on 2 different servers (Linux). Is there any way we can make the core be located on a NAS or network shared drive, so both Solrs use the same index?

Let me know if there would be any performance issues; our index size is approximately 1GB.

Thanks

Ravi

-----Original Message-----
From: biplobbiswas [mailto:revolutionisme+solr@gmail.com] 
Sent: Friday, May 19, 2017 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Nested Document is flattened even with @Field(child = true) annotation

Hi
Mikhail Khludnev-2 wrote
> Hello,
> 
> You need to use
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
> and
> https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents#TransformingResultDocuments-[child]-ChildDocTransformerFactory
> to get the nested data back.
> 
> 
> --
> Sincerely yours
> Mikhail Khludnev

I had already gone through the links you posted, and they talk about retrieving after indexing. My problem is that my documents are not indexed in a nested structure.

Can you please also look at the first comment, where I posted sample code and the sample response I get back?

It is creating distinct documents for the nested structure.




--
View this message in context: http://lucene.472066.n3.nabble.com/Nested-Document-is-flattened-even-with-Field-child-true-annotation-tp4335877p4335891.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
I agree completely; it was just something I've always wanted to try doing.
If my indexes were smaller I'd just fire up a bunch of slaves on a single
machine and nginx them out, but even 2TB SSDs are somewhat expensive, and
there aren't always enough ports on the servers to keep adding more.


Re: Solr in NAS or Network Shared Drive

Posted by Erick Erickson <er...@gmail.com>.
One problem here is how to open new searchers on the r/o core.
Consider the autocommit setting. The cycle is:
> when the first doc comes in, start your timer
> x milliseconds later, do a commit and (perhaps) open a new searcher.

but the core referencing the index in R/O mode doesn't have any update
event to start the timer.

Even if you issue a commit to the R/O copy, there is short-circuiting
in the code that says "since nothing's changed, I'll just ignore
this". And by definition, the commit is an update call....

I suppose you could force things here by issuing a reload command on
the R/O core...

WARNING: Since I strongly discourage this it's not something I've
personally verified....

And the risk of corrupting your index because someone inadvertently
does something unexpected is high.

I always come back to the question of why in the world spend
engineering time on this when the  cost of a new 1TB disk is so low. I
realize there are environments where you can't "just plug in another
disk", but you see where I'm going.

Best,
Erick
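
For reference, the reload Erick mentions can be issued against the CoreAdmin API. A minimal Python sketch, assuming a hypothetical host, port, and core name:

    import requests

    SOLR_ADMIN = "http://localhost:8983/solr/admin/cores"

    def reload_readonly_core(core_name: str) -> None:
        # RELOAD closes the old core and opens a new one over the same index
        # directory, picking up segments written by the indexing instance.
        requests.get(SOLR_ADMIN,
                     params={"action": "RELOAD", "core": core_name}).raise_for_status()

    reload_readonly_core("ro_core")

As Erick says, this is discouraged territory; the sketch only shows the mechanics of the reload, not an endorsement of sharing the index.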


Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
My thought would be that the machine would need only the same amount of RAM
minus the heap size of the second instance of Solr, since it will be
file-caching the index into memory only once (it's the same files, read by
both Solr instances). My Solr slaves have about 150GB each.
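
A back-of-the-envelope version of that sizing argument, with hypothetical heap numbers (the 150GB matches the slave size mentioned above):

    # Both instances map the same index files, so the OS page cache holds
    # them once; only the JVM heaps multiply.
    index_size_gb = 150         # shared once in the page cache
    heap_gb = 8                 # hypothetical heap per Solr instance
    instances = 2

    ram_needed_gb = index_size_gb + heap_gb * instances
    print(ram_needed_gb)        # 166, versus 2 * (150 + 8) = 316 on two machines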


Re: Solr in NAS or Network Shared Drive

Posted by Rick Leir <rl...@leirtech.com>.
> multiple solr instances on one machine performs better than multiple 

Does the machine have enough RAM to support all the instances? Again, time for an experiment!
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

RE: Solr in NAS or Network Shared Drive

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Docker has a "layered" filesystem strategy, where new writes go to a top layer, so maybe there's a way to do this with Docker.
Pretty speculative, but:

- Start a Docker container based on an image containing Solr, but no index data.
- Build your index within the container.
- Shut down Solr and build a new Docker image from the container.
- Now start two new Docker containers from that image, both running Solr.

Getting to this architecture may have some gotchas, as you clearly don't want to reindex 350GB+400GB, and you don't have the storage to copy it over into a Docker image.  Maybe OS-level backup/restore could solve this problem.   Also, getting Docker to store/load images from NFS is a small detail - there is either configuration for it or you can use mounts/symbolic links.

Pardon the outlandish solution, but I tend to think of a systems engineering fix first and Java coding second.   I think maybe they could help on the Java side over at lucene-user, because a Java solution would need to be pretty deep, involving changing some of the basics of how (or whether) locking is done.
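
A sketch of that flow using the Docker SDK for Python (docker-py); the container and image names and the ports are hypothetical, and it assumes the index was built inside the container's writable layer:

    import docker

    client = docker.from_env()

    # Stop the container in which the index was built, then snapshot it
    # (index included) as a new image layer.
    indexer = client.containers.get("solr-indexer")
    indexer.stop()
    indexer.commit(repository="solr-with-index", tag="v1")

    # Start two query-only containers from the same baked image.
    for name, port in [("solr-q1", 8983), ("solr-q2", 8984)]:
        client.containers.run("solr-with-index:v1", name=name,
                              detach=True, ports={"8983/tcp": port})

Each new container then gets its own copy-on-write layer, so neither instance can corrupt the index files in the shared base layer.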


Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
The reason I want to try it is that replication is not possible on the
single machine: the index size is around 350GB + another 400GB, and I don't
have enough SSD to cover a replication from the master node.  Also, I have
a theory, and heard this as well in a presentation at the LR conference in
Boston this past year, that multiple Solr instances on one machine perform
better than multiple machines. It would be interesting for Solr to have a
"read only"/"listen" state that does no writing to the index but keeps
referencing the index properties/version files.


RE: Solr in NAS or Network Shared Drive

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Better off to just do replication to the slave using the replication handler.

However, if there is no network connectivity, e.g. this is an offsite cold/warm spare, then here is a solution:

The NAS likely supports some copy-on-write/snapshotting capability.   If your systems people will work with you, you can use the replication/backup handler to take a NAS snapshot just after a hard commit, and then have the snapshot replicated to another volume.   I suspect Solr will have to be started on the cold/warm spare when you do a failover to offsite, because I know of no way to have the OS react to events when a snapshot is replicated by the NAS.

This kind of solution is what you might see for Oracle, or any other binary ACID database, so you can look at best practices for integrating those products with NetApp or EMC Celerra for more ideas.
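
The ordering matters here: hard commit first, then snapshot. A Python sketch against hypothetical URLs; step 3 is a placeholder for whatever the storage vendor's snapshot API actually looks like:

    import requests

    CORE = "http://localhost:8983/solr/mycore"  # hypothetical core URL

    # 1. Hard commit so the files on disk form a consistent point-in-time index.
    requests.get(f"{CORE}/update", params={"commit": "true"}).raise_for_status()

    # 2. Have the replication handler copy the current index into a backup directory.
    requests.get(f"{CORE}/replication", params={"command": "backup"}).raise_for_status()

    # 3. (Outside Solr) trigger the NAS snapshot/replication of that volume here.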


Re: Solr in NAS or Network Shared Drive

Posted by Rick Leir <rl...@leirtech.com>.
For an experiment, mount the NAS filesystem ro (read-only). Is there any way to tell Solr not to bother with a lock file? And what happens if an update or add gets requested by mistake - does it take down Solr?

Why not do this all the simple way, and just replicate?


Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
I've always wanted to experiment with this, but you have to be very careful
that only one of the cores, or neither, does ANY writes. Also, if you have
a suggester index, you need to make sure that each core builds its own
independently.  In any case, from everything I've read, the general answer
is: don't do it.  I would like to hear other people's thoughts on this, however.
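
On the suggester point: each instance would need to issue its own build so the suggester's data structures are produced locally rather than shared. A sketch, assuming hypothetical hosts and a /suggest handler wired to a single suggester:

    import requests

    # Run the build once per instance, against that instance's own base URL,
    # so the suggester artifacts live with that instance.
    for base in ["http://host1:8983/solr/mycore", "http://host2:8983/solr/mycore"]:
        requests.get(f"{base}/suggest", params={"suggest.build": "true"}).raise_for_status()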


Re: Solr in NAS or Network Shared Drive

Posted by Florian Gleixner <fl...@redflo.de>.

The operating system can cache a local filesystem for an infinitely long
time, because no one else is allowed to change the data. With network
filesystems, the operating system cannot be sure that the data hasn't been
altered by someone else, so caches on network filesystems are frequently
invalidated.

I think you lose the caching from the OS - memory speed vs. network
filesystem speed! Not sure if mmap helps here ....


Re: Solr in NAS or Network Shared Drive

Posted by Erick Erickson <er...@gmail.com>.
Bob:

I'd guess you had to fiddle with lock factories and the like, although
you say that master/slave wasn't even available when you put this
system together, so I don't even remember what was available "way back
when" ;).

"If it ain't broke, don't fix it" applies. That said, if I were redoing
the system (or even upgrading) I'd strongly consider either
master/slave or SolrCloud, if for no other reason than that you'd have to
re-figure how to signal the search boxes that the index had changed....

Do note that there has been some talk of "read-only replicas"; see
SOLR-6237. But that hasn't been committed, and I don't know how much
love that issue will get.

Best,
Erick


Re: Solr in NAS or Network Shared Drive

Posted by Walter Underwood <wu...@wunderwood.org>.
Pretty sure that master/slave was in Solr 1.2. That was very nearly ten years ago.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Solr in NAS or Network Shared Drive

Posted by Dave <ha...@gmail.com>.
This could be useful in a space-constrained situation, although the reason I wanted to try it was multiple Solr instances on one server reading one index on the SSD. This use case, with the index on the NFS, still leads to a single-point-of-failure situation on one of the most fragile parts of a server, the disk, on one machine. So if the NFS master gets corrupted, then all clients are dead, rather than the slaves each having their own copy of the index.

Re: Solr in NAS or Network Shared Drive

Posted by Florian Gleixner <fl...@redflo.de>.
Just tested: if file metadata (last change time, access permissions, ...)
on NFS storage changes, then all NFS clients invalidate the memory cache
of the file completely.

So, if your index does not get changed, caching is good on read-only
slaves - the NFS client only queries file metadata occasionally.
But if your index changes, all affected files have to be read again from
NFS. You can try this by "touching" the files.

fincore from linux-ftools can be used to view the file caching status.

"Touching" a file on a local mount does not invalidate the memory cache;
the kernel knows that no file data has been changed.
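
A sketch of that experiment, assuming a fincore binary is installed (linux-ftools or util-linux) and using a hypothetical index file path:

    import os
    import subprocess

    INDEX_FILE = "/mnt/nas/solr/core/data/index/_0.cfs"  # hypothetical

    def cached(path: str) -> str:
        # fincore reports how much of the file is resident in the page cache.
        return subprocess.run(["fincore", path],
                              capture_output=True, text=True).stdout

    print("before:", cached(INDEX_FILE))
    os.utime(INDEX_FILE)   # "touch": bump the mtime without changing the data
    print("after: ", cached(INDEX_FILE))

Per the test described above, on a local mount the two readings should match, while on NFS the metadata change causes the cached pages to be dropped.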




Re: Solr in NAS or Network Shared Drive

Posted by Robert Haschart <rh...@virginia.edu>.
When the indexing Solr instance finishes, it fast-copies the newly built
core to a new directory on the network storage, and then does the
CREATE, SWAP, UNLOAD messages.

Just before starting this message, I needed to update some records and
re-deploy to production; the process took less time than it took me to
write this message.

-Bob Haschart
University of Virginia Library
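
A minimal sketch of the fast-copy step, with hypothetical paths; the key point is that the copy lands in a fresh per-deploy directory rather than mutating the live index. The CREATE/SWAP/UNLOAD calls it feeds are sketched later in the thread.

    import shutil
    from pathlib import Path

    src = Path("/mnt/nas/solr/build/index")          # freshly built index
    dst = Path("/mnt/nas/solr/indexes/2017-05-26")   # new per-deploy directory
    shutil.copytree(src, dst)                        # never overwrite the live index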



Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
so are "core" and "corebak" pointing to the same datadir or do you have the
indexing solr instance keep writing to a new directory?


Re: Solr in NAS or Network Shared Drive

Posted by Robert Haschart <rh...@virginia.edu>.
The process we use to signal the read-only servers is to submit a
CREATE request pointing to the newly created index, with a name like
corebak, then do a SWAP request between core and corebak, then submit
an UNLOAD request for corebak, which is now pointing at the previous
version.

The individual servers cannot do a merge on their own, since they mount
the NAS read-only.   Nothing they can do will affect the index.  I
believe this allows each machine to cache much of the index in memory,
with no fear that its cache will be invalidated by one of the others.

-Bob Haschart
University of Virginia Library
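
A sketch of the sequence Bob describes, issued against the CoreAdmin API; the host, core names, and paths are hypothetical:

    import requests

    ADMIN = "http://localhost:8983/solr/admin/cores"

    def core_admin(action: str, **params) -> None:
        requests.get(ADMIN, params=dict(params, action=action)).raise_for_status()

    # 1. Register a core over the freshly copied index directory.
    core_admin("CREATE", name="corebak",
               instanceDir="/mnt/nas/solr/core",
               dataDir="/mnt/nas/solr/indexes/2017-05-26")

    # 2. Swap names so "core" now serves the new index.
    core_admin("SWAP", core="core", other="corebak")

    # 3. Drop the handle now pointing at the previous version.
    core_admin("UNLOAD", core="corebak")

Run against each read-only server in turn; because they mount the NAS read-only, the swap changes only which directory each server reads.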





Re: Solr in NAS or Network Shared Drive

Posted by David Hastings <ha...@gmail.com>.
I'm curious about this. When you say "and signal the three Solr servers
when the updated index is available," how does it send the signal? I.e.
what command - just a reload? Also, what prevents them from doing a merge
on their own? Thanks.


Re: Solr in NAS or Network Shared Drive

Posted by Robert Haschart <rh...@virginia.edu>.
We have run using this exact scenario for several years.   We have three
Solr servers sitting behind a load balancer, with all three accessing
the same Solr index stored on read-only network-addressable storage.   A
fourth machine is used to update the index (typically daily) and signal
the three Solr servers when the updated index is available.   Our index
is primarily bibliographic information; it contains about 8 million
documents and is about 30GB in size.    We've used this configuration
since before ZooKeeper and cloud-based Solr, or even Java-based
master/slave replication, were available.   I cannot say whether this
configuration has any benefits over the currently accepted way of
load-balancing, but it has worked well for us for several years and
we've never had a corrupted-index problem.


-Bob Haschart
University of Virginia Library





Re: Solr in NAS or Network Shared Drive

Posted by Shawn Heisey <ap...@elyograg.org>.

I think it's a very bad idea to try to share indexes between multiple
Solr instances.  You can override the locking and get it to work, and
you may be able to find advice on the Internet about how to do it.  I
can tell you that it's outside the design intent for both Lucene and
Solr.  Lucene works aggressively to *prevent* multiple processes from
sharing an index.

In general, network storage is not a good idea for Solr.  There's added
latency for accessing any data, and frequently the filesystem won't
support the kind of locking that Lucene wants to use, but the biggest
potential problem is disk caching.  Solr/Lucene is absolutely reliant on
disk caching in the Solr server's local memory for good performance.  If
the network filesystem cannot be cached by the client that has mounted
the storage, which I believe is the case for most network filesystem
types, then you're reliant on disk caching in the network server(s). 
For VERY large indexes, which is really the only viable use case I can
imagine for network storage, it is highly unlikely that the network
server(s) will have enough memory to effectively cache the data.

Solr has explicit support for HDFS storage, but as I understand it, HDFS
includes the ability for a client to allocate memory that gets used
exclusively for caching on the client side, which allows HDFS to
function like a local filesystem in ways that I don't think NFS can. 
Getting back to my advice about not sharing indexes -- even with
SolrCloud on HDFS, multiple replicas generally do NOT share an index.

A 1GB index is very small, so there's no good reason I can think of to
involve network storage.  I would strongly recommend local storage, and
you should abandon any attempt to share the same index data between more
than one Solr instance.

Thanks,
Shawn