
Posted to users@solr.apache.org by gnandre <ar...@gmail.com> on 2022/09/23 15:51:39 UTC

Atomic indexing as default indexing

Is there a way to make atomic indexing default?

That is, even if some clients send non-atomic indexing requests, could Solr
convert them to atomic indexing requests on its end?

I am asking because we often run into the following issue:
1. Client A contributes almost all the fields of a Solr document, using
non-atomic indexing.
2. Client B contributes some additional fields to the same document, using
atomic indexing.
3. If Client A indexes again, the fields populated by Client B are wiped
out.

If all indexing were converted to atomic indexing on the Solr end, we would
not run into this problem. The exception is the rare case where Client A
deletes the document and then indexes it again; that is fine, and we can
deal with it because it is rare.
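To make steps 1 to 3 concrete, here is a toy in-memory model in plain Java. It is not Solr code and calls no Solr API; the class ToyIndex and its methods are hypothetical and only mimic the behavior described above: a non-atomic add replaces the whole document, while an atomic "set" update changes only the named fields and keeps the rest.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: mimics full vs atomic "set" updates; this is not Solr.
class ToyIndex {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    // Non-atomic indexing: the new document replaces the stored one entirely.
    void add(String id, Map<String, Object> fields) {
        docs.put(id, new HashMap<>(fields));
    }

    // Atomic "set" update: only the given fields change; all others are kept.
    void atomicSet(String id, Map<String, Object> fields) {
        docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(fields);
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc1", Map.of("title", "v1", "body", "text"));   // 1. client A, full document
        index.atomicSet("doc1", Map.of("rating", 5));                // 2. client B, atomic update
        System.out.println(index.get("doc1").containsKey("rating")); // prints true
        index.add("doc1", Map.of("title", "v2", "body", "text"));    // 3. client A reindexes
        System.out.println(index.get("doc1").containsKey("rating")); // prints false
    }
}
```

The second println shows the field loss from step 3: client A's full add throws away the "rating" field that client B set.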

Re: Exception with embedded Solr (was: Re: Atomic indexing as default indexing)

Posted by Shawn Heisey <el...@elyograg.org.INVALID>.

On 9/23/22 15:08, Shawn Heisey wrote:
> have removed the email headers that would bury this message inside a 
> thread that has nothing to do with it

I *thought* I had removed those headers.  But the message got buried anyway.

Shawn

Exception with embedded Solr (was: Re: Atomic indexing as default indexing)

Posted by Shawn Heisey <ap...@elyograg.org.INVALID>.

On 9/23/22 12:07, L H wrote:
> Hello dear colleagues,
>
> I was using embedded Solr on Java 8 to cache some data; however, I am
> now required to upgrade Java to version 17.
>
> It looks like the core container is not able to access the home directory.
>
> Below is the exception I get. Could someone please help me figure out how
> to fix the issue?

I have removed the email headers that would bury this message inside a 
thread that has nothing to do with it, which is where I found your 
message.  You didn't even change the subject.  Please do not reply to an 
existing message unless that message is directly related to what you are 
sending.  Start a brand new message with a new subject for a new topic.

https://www.dropbox.com/s/3avr9o03gpx7rko/solr-user-buried-thread-2022-09.png?dl=0

What version of Solr/SolrJ are you using?  I suspect that you're using a 
version that was not qualified with any Java version later than 8.  You 
might need to upgrade Solr to have it work right with Java 17.  In 
recent years Java has gotten a lot better at not introducing breaking 
changes, but you have just jumped NINE major versions.  Any software is 
likely to change in extreme ways across that many major versions.

The sweet spot for Solr 7 or 8 seems to be Java 11, but these Solr 
versions only require Java 8.  Solr 9.x *requires* Java 11, and it is 
the only version I personally would run with anything newer than Java 
11.  For Solr 6, I would not run anything newer than Java 8.  Solr 7.0 
was the first version that was qualified to run in Java 9, and I recall 
code changes being required to achieve that.

Thanks,
Shawn

Re: Atomic indexing as default indexing

Posted by L H <le...@gmail.com>.

Hello dear colleagues,

I was using embedded Solr on Java 8 to cache some data; however, I am
now required to upgrade Java to version 17.

It looks like the core container is not able to access the home directory.

Below is the exception I get. Could someone please help me figure out how
to fix the issue?

==================== exception ====================

Caused by: org.apache.solr.common.SolrException: JVM Error creating core [invoiceconfig]: null
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:856)
	at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:494)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:889)
Caused by: java.lang.ExceptionInInitializerError
	at java.base/java.lang.J9VMInternals.ensureError(J9VMInternals.java:185)
	at java.base/java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:174)
	at org.apache.solr.core.MMapDirectoryFactory.init(MMapDirectoryFactory.java:51)
	at org.apache.solr.core.SolrCore.initDirectoryFactory(SolrCore.java:528)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:724)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:688)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:838)
	... 6 more
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make public jdk.internal.ref.Cleaner java.nio.DirectByteBuffer.cleaner() accessible: module java.base does not "opens java.nio" to unnamed module @f0b0647f
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
	at java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
	at java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
	at org.apache.lucene.store.MMapDirectory.unmapHackImpl(MMapDirectory.java:345)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:692)
	at org.apache.lucene.store.MMapDirectory.<clinit>(MMapDirectory.java:326)
	... 11 more

Re: Atomic indexing as default indexing

Posted by Thomas Corthals <th...@klascement.net>.

On Fri, 23 Sep 2022 at 18:17, Shawn Heisey
<ap...@elyograg.org.invalid> wrote:

> It sounds like what you should do is have client A be aware that a
> document might have changes done after they indexed it, and they should
> do a check to see whether a doc already exists, and if it does, change
> their indexing to atomic.
>
Client A can use optimistic concurrency
<https://solr.apache.org/guide/solr/latest/indexing-guide/partial-document-updates.html#optimistic-concurrency>
to detect whether a document has been updated by client B.

After indexing, client A can use the /get (real-time get) handler to fetch
the document's _version_ and store it locally. Client A then sends that
_version_ with its later updates. If client B has changed the document in
the meantime, the stored _version_ no longer matches, and Solr rejects the
update with a version conflict (HTTP 409).
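A minimal, dependency-free sketch of those two requests from client A's side. It assumes a hypothetical base URL and collection name, a uniqueKey field called id, and string field values that need no JSON escaping; the class and method names are illustrative only, and real code would more likely send these requests through SolrJ or an HTTP client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of optimistic concurrency for client A. Base URL, collection name and
// field names are placeholders; JSON is built by hand and does no escaping.
class OptimisticUpdateSketch {

    // Real-time get request that returns only the id and current _version_.
    static String realTimeGetUrl(String baseUrl, String collection, String id) {
        return baseUrl + "/" + collection + "/get?id="
                + URLEncoder.encode(id, StandardCharsets.UTF_8) + "&fl=id,_version_";
    }

    // Update body carrying the _version_ client A stored earlier; Solr applies
    // it only if the document's current _version_ still equals this value.
    static String versionedUpdateJson(String id, Map<String, String> fields, long version) {
        StringBuilder sb = new StringBuilder("[{\"id\":\"").append(id).append('"');
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(",\"").append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
        }
        return sb.append(",\"_version_\":").append(version).append("}]").toString();
    }

    // Solr signals a _version_ mismatch with HTTP 409 Conflict.
    static boolean isVersionConflict(int httpStatus) {
        return httpStatus == 409;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "v2");
        // prints http://localhost:8983/solr/mycollection/get?id=doc1&fl=id,_version_
        System.out.println(realTimeGetUrl("http://localhost:8983/solr", "mycollection", "doc1"));
        // 12345 is a placeholder for the _version_ value returned by the /get request.
        // prints [{"id":"doc1","title":"v2","_version_":12345}]
        System.out.println(versionedUpdateJson("doc1", fields, 12345L));
    }
}
```

If the POST of that body to /update comes back with 409, client A knows client B has touched the document and can re-read it before deciding how to reindex.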

Thomas

Re: Atomic indexing as default indexing

Posted by Shawn Heisey <ap...@elyograg.org.INVALID>.

On 9/23/22 09:51, gnandre wrote:
> Is there a way to make atomic indexing default?

We would be surprising a LOT of users if we did that.  Right now they 
can simply reindex a document to delete fields that were indexed before 
but shouldn't be there.  If we made atomic indexing the default, we 
would definitely get complaints about the fact that these fields did not 
get removed.

And what about users whose schema is not appropriate for atomic
indexing?  Quite a lot of users, me included, have fields that are
indexed but not stored and have no docValues.  An atomic update rebuilds
the whole document from its stored fields and docValues, so the values in
such fields would be lost.  I can guarantee you that if we made atomic
indexing the default, users would assume that all their existing fields
will be preserved, and that might not be the case.
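A rough way to audit a schema for this problem, sketched under assumptions: the FieldDef class below is hypothetical and would have to be filled from your real schema (for example via the Schema API). It applies the rule from the partial-update page of the Solr Reference Guide that every field must be stored or have docValues, with copyField destinations as the exception because they are repopulated from their sources.

```java
import java.util.ArrayList;
import java.util.List;

// Lists fields whose values an atomic update could not carry over.
// FieldDef is a hypothetical stand-in for what your real schema declares.
class AtomicUpdateSchemaCheck {

    static final class FieldDef {
        final String name;
        final boolean stored;
        final boolean docValues;
        final boolean copyFieldDest;

        FieldDef(String name, boolean stored, boolean docValues, boolean copyFieldDest) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.copyFieldDest = copyFieldDest;
        }
    }

    // At risk: Solr cannot read the old value back (not stored, no docValues)
    // and the field is not a copyField destination rebuilt from its source.
    static List<String> fieldsAtRisk(List<FieldDef> fields) {
        List<String> atRisk = new ArrayList<>();
        for (FieldDef f : fields) {
            if (!f.stored && !f.docValues && !f.copyFieldDest) {
                atRisk.add(f.name);
            }
        }
        return atRisk;
    }

    public static void main(String[] args) {
        List<FieldDef> schema = List.of(
                new FieldDef("title", true, false, false),
                new FieldDef("body_txt", false, false, false), // indexed only
                new FieldDef("price", false, true, false));
        System.out.println(fieldsAtRisk(schema)); // prints [body_txt]
    }
}
```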

It sounds like what you should do is make client A aware that a document
might have been changed after it indexed it.  Client A should check
whether a doc already exists, and if it does, switch to atomic indexing
for that doc.
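A minimal sketch of that suggestion on client A's side, assuming the existence check has already been done (for example with a request to the /get handler), that id is the uniqueKey, and that values are plain strings needing no JSON escaping. ClientAUpdateBuilder is a hypothetical name. For an existing document each field is wrapped in an atomic "set" modifier, so client B's fields are left alone; for a new document a plain full document is sent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for client A: full document for new docs, atomic
// "set" update for docs that already exist. No JSON escaping is done.
class ClientAUpdateBuilder {

    static String buildUpdateJson(String id, Map<String, String> fields, boolean docExists) {
        StringBuilder sb = new StringBuilder("[{\"id\":\"").append(id).append('"');
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = "\"" + e.getValue() + "\"";
            // Existing doc: {"set": value} changes only this field and keeps client B's.
            // New doc: plain value, i.e. ordinary non-atomic indexing.
            sb.append(",\"").append(e.getKey()).append("\":")
              .append(docExists ? "{\"set\":" + value + "}" : value);
        }
        return sb.append("}]").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "v2");
        // prints [{"id":"doc1","title":{"set":"v2"}}]
        System.out.println(buildUpdateJson("doc1", fields, true));
        // prints [{"id":"doc1","title":"v2"}]
        System.out.println(buildUpdateJson("doc1", fields, false));
    }
}
```

Two caveats follow from the discussion in this thread. With "set" updates, a field that client A simply stops sending is no longer removed; client A would have to send {"set": null} for it. And the existence check and the update are separate requests, so a concurrent change can still slip in between them unless this is combined with the _version_ check described in Thomas Corthals's reply.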

It is extremely problematic to have one index be built by two different 
entities in this way.  Maybe instead you should have separate indexes 
for each client and use Solr's join capability to combine the info from 
both indexes into one result.  Just be aware that Solr's join capability 
will NOT do everything a relational database expert might expect.
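For the separate-indexes alternative, a sketch of building such a request with Solr's join query parser ({!join from=... to=... fromIndex=...}). Assumptions: two cores with the hypothetical names clientA and clientB on the same Solr node, both keyed by id; the query rating:5 is likewise made up, and the class name is illustrative only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a /select URL with a cross-core join filter.
class JoinQuerySketch {

    // Matches "to"-side documents whose toField equals the fromField of any
    // document in fromIndex that satisfies fromQuery.
    static String joinFilter(String fromIndex, String fromField, String toField, String fromQuery) {
        return "{!join from=" + fromField + " to=" + toField
                + " fromIndex=" + fromIndex + "}" + fromQuery;
    }

    static String selectUrl(String baseUrl, String toIndex, String mainQuery, String filter) {
        return baseUrl + "/" + toIndex + "/select?q="
                + URLEncoder.encode(mainQuery, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(filter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fq = joinFilter("clientB", "id", "id", "rating:5");
        System.out.println(fq); // prints {!join from=id to=id fromIndex=clientB}rating:5
        System.out.println(selectUrl("http://localhost:8983/solr", "clientA", "*:*", fq));
    }
}
```

This illustrates Shawn's caveat: the response contains only clientA documents and their fields. The join filters on client B's data but does not merge client B's fields into the result, so those would need a second request.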

Thanks,
Shawn