Posted to solr-user@lucene.apache.org by S G <sg...@gmail.com> on 2017/06/06 14:26:44 UTC

Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

Hi,

We are seeing very poor performance in a load test that drives a 2-shard,
3-replica system with about 2000 writes/sec and 2000 reads/sec.

The exception stack trace seems to point to a specific line of code, and a
similar stack trace has been reported by users on the Elasticsearch forums too.

Could this be a common bug in Lucene that affects both systems?
https://issues.apache.org/jira/browse/SOLR-10806

One bad part about Solr is that once this happens, the whole system comes to
a grinding halt.
The Solr UI is not accessible, even on nodes not hosting any collections!
It would be really nice to get rid of such instability in the system.

Thanks
SG

Re: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

Posted by Erick Erickson <er...@gmail.com>.

I do not recommend using data_driven_configs in production, for
several reasons. First and foremost, the heuristics used simply
_cannot_ optimize your schema for your use case; they have to make a
"best effort". Not to mention that weird data can produce a bazillion
fields without you knowing about it. It's fine for getting started
without having to deal with the schema, but nobody I know uses it for
production.

I would strongly recommend you take whatever schema has been produced
so far and use it as a basis for a curated schema, turn off
field-guessing (see add-unknown-fields-to-schema) in solrconfig.xml
and use that as a basis for your testing.

Note that "managed schema" is used by data_driven_configs, but you can
still use the managed schema _without_ "field guessing". Or switch to
the "Classic" schema.
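
As a rough illustration of what a curated schema can look like in practice,
here is a minimal sketch that builds explicit "add-field" commands for the
Schema API (a POST to /<collection>/schema) instead of relying on guessing.
The host, the collection name "perf_test", the field names, and the field type
names "long" and "string" are all assumptions made up for this example; use
the type names your own configset actually defines. The sketch only builds the
requests and does not send them.

```python
import json


def build_add_field_requests(base_url, collection, fields):
    """Return one (url, json_body) pair per explicitly declared field."""
    url = f"{base_url.rstrip('/')}/{collection}/schema"
    requests_to_send = []
    for name, field_type in fields.items():
        # One "add-field" command per request keeps failures easy to attribute.
        command = {"add-field": {"name": name, "type": field_type, "stored": True}}
        requests_to_send.append((url, json.dumps(command)))
    return requests_to_send


# Hypothetical curated fields: every type is chosen by a person, not guessed.
curated_fields = {"user_id": "long", "title": "string"}
pending = build_add_field_requests(
    "http://localhost:8983/solr", "perf_test", curated_fields
)
for url, body in pending:
    print("POST", url, body)
```

Each pair can then be sent with whatever HTTP client you already use, with
Content-Type set to application/json.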

Best,
Erick

On Wed, Jun 7, 2017 at 8:23 AM, S G <sg...@gmail.com> wrote:
> Solr nodes were provisioned just 2 weeks back and it was a brand new Solr
> cluster.
> The nodes always had 6.3 indexes for the past few days but for this very
> test, we had created a brand new collection.
>
> We were using the data-driven schema, and our theory is that one of the shards
> guessed some field to be a long while the other shard guessed the same
> field to be an integer.
>
> If that is true, then it's a pretty bad problem IMO, and one that is difficult to
> reproduce (because the shards would have to simultaneously guess different types
> for the same field). It is also a problem that may not show up in
> several test runs but may show up directly in production, because it depends
> on race conditions between the shards.
>
> And it still does not explain why the Solr UI becomes unresponsive. Why
> is the thread serving the Solr UI blocked by a low-level problem?
>
>
> On Tue, Jun 6, 2017 at 8:58 AM, Erick Erickson <er...@gmail.com>
> wrote:
>>
>> Uwe just posted a detailed explanation on that jira. Note in particular
>> that you must delete the index from disk to be certain all remnants of the
>> old metadata are gone if you change field definitions or you can get this
>> error. I generally either delete the collection or create a new one when
>> changing the schema.
>>
>> On Jun 6, 2017 8:19 AM, "Varun Thacker" <va...@vthacker.in> wrote:
>>
>> Does this happen on a fresh Solr 6.3 (as mentioned on SOLR-10806), or did
>> the index exist under some other version and then get upgraded to 6.3?
>>
>> Is the problem reproducible for you?

Re: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

Posted by S G <sg...@gmail.com>.

Solr nodes were provisioned just 2 weeks back and it was a brand new Solr
cluster.
The nodes always had 6.3 indexes for the past few days but for this very
test, we had created a brand new collection.

We were using the data-driven schema, and our theory is that one of the shards
guessed some field to be a long while the other shard guessed the same
field to be an integer.

If that is true, then it's a pretty bad problem IMO, and one that is difficult to
reproduce (because the shards would have to simultaneously guess different types
for the same field). It is also a problem that may not show up in
several test runs but may show up directly in production, because it depends
on race conditions between the shards.

And it still does not explain why the Solr UI becomes unresponsive. Why
is the thread serving the Solr UI blocked by a low-level problem?
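
To make the theory concrete, here is a toy model of it. This is NOT Solr's
field-guessing code, and whether Solr can really end up in this state is
exactly the open question; the "smallest type that fits the first value"
heuristic, the shard names, and the field name "count" are all invented for
illustration. It only shows how a guess based on the first value seen can
depend on arrival order.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def guess_numeric_type(value):
    """Toy heuristic: pick the smallest integer type that fits the value."""
    return "int" if INT_MIN <= value <= INT_MAX else "long"


def guess_per_shard(docs_by_shard, field):
    """Return {shard: guessed type}, using only the first doc that has the field."""
    guesses = {}
    for shard, docs in docs_by_shard.items():
        for doc in docs:
            if field in doc:
                guesses[shard] = guess_numeric_type(doc[field])
                break
    return guesses


# Same two documents, delivered in a different order to each (hypothetical) shard.
docs_by_shard = {
    "shard1": [{"count": 42}, {"count": 5_000_000_000}],
    "shard2": [{"count": 5_000_000_000}, {"count": 42}],
}
guesses = guess_per_shard(docs_by_shard, "count")
print(guesses)
```

In this toy model the two shards disagree about the type of the same field
purely because of which document each one happened to see first.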

On Tue, Jun 6, 2017 at 8:58 AM, Erick Erickson <er...@gmail.com>
wrote:

> Uwe just posted a detailed explanation on that jira. Note in particular
> that you must delete the index from disk to be certain all remnants of the
> old metadata are gone if you change field definitions or you can get this
> error. I generally either delete the collection or create a new one when
> changing the schema.
>
> On Jun 6, 2017 8:19 AM, "Varun Thacker" <va...@vthacker.in> wrote:
>
> Does this happen on a fresh Solr 6.3 (as mentioned on SOLR-10806), or did
> the index exist under some other version and then get upgraded to 6.3?
>
> Is the problem reproducible for you?

Re: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

Posted by Erick Erickson <er...@gmail.com>.

Uwe just posted a detailed explanation on that JIRA. Note in particular
that if you change field definitions, you must delete the index from disk
to be certain all remnants of the old metadata are gone; otherwise you can
get this error. I generally either delete the collection or create a new
one when changing the schema.
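
A minimal sketch of the "delete the collection or create a new one" routine,
using the Collections API DELETE and CREATE actions. The host, the collection
names, and the configset name "curated_conf" are placeholders invented for this
example; the shard and replica counts are taken from the test setup described
in this thread. The sketch only builds the URLs and does not call Solr.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


solr = "http://localhost:8983/solr"  # placeholder host

# Drop the old collection so no index metadata from the old schema survives.
delete_url = collections_api_url(solr, "DELETE", {"name": "perf_test"})

# Or leave the old one alone and create a fresh collection on the new schema.
create_url = collections_api_url(
    solr,
    "CREATE",
    {
        "name": "perf_test_v2",
        "numShards": 2,
        "replicationFactor": 3,
        "collection.configName": "curated_conf",
    },
)
print(delete_url)
print(create_url)
```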

On Jun 6, 2017 8:19 AM, "Varun Thacker" <va...@vthacker.in> wrote:

Does this happen on a fresh Solr 6.3 (as mentioned on SOLR-10806), or did
the index exist under some other version and then get upgraded to 6.3?

Is the problem reproducible for you?


Re: Invalid shift value (64) in prefixCoded bytes (is encoded value really an INT?)

Posted by Varun Thacker <va...@vthacker.in>.

Does this happen on a fresh Solr 6.3 (as mentioned on SOLR-10806), or did
the index exist under some other version and then get upgraded to 6.3?

Is the problem reproducible for you?


