Posted to issues@solr.apache.org by "Nikolas Osvalds (Jira)" <ji...@apache.org> on 2022/12/15 13:36:00 UTC
[jira] [Updated] (SOLR-16589) Large fields with large="true" can be truncated in v9+
[ https://issues.apache.org/jira/browse/SOLR-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nikolas Osvalds updated SOLR-16589:
-----------------------------------
Description:
h3. Summary
For fields declared with `large="true"`, large values (exactly the case the option is intended for) can be truncated in v9+ of Lucene.
Example fieldtype definition:
```
<fieldtype name="string_large" class="solr.TextField" multiValued="false" indexed="false" stored="true" omitNorms="true" large="true" />
```
h3. Cause
This looks like a bug introduced along with LUCENE-8805 ([https://issues.apache.org/jira/browse/LUCENE-8805]) / [https://github.com/apache/lucene/issues/9849]:
[https://github.com/apache/lucene/blob/5a694ea26ff862ecc874ca798135073d300c2234/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L462-L465|https://github.com/apache/solr/blob/bc2d9623f7960f83636eb8416b11dd4e91ab4b22/solr/core/src/java/org/apache/solr/search/SolrDocumentFetcher.java#L508-L511]
```
public void stringField(FieldInfo fieldInfo, String value) throws IOException {
  Objects.requireNonNull(value, "String value should not be null");
  bytesRef.bytes = value.getBytes(StandardCharsets.UTF_8);
  // Bug: value.length() counts UTF-16 code units, not UTF-8 bytes
  bytesRef.length = value.length();
```
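This code path is specific to the handling of "large" fields.
`value.length()` counts UTF-16 code units, while `bytesRef.bytes` holds the UTF-8 encoding. Whenever the value contains non-ASCII characters, the UTF-8 byte count is greater than `value.length()`, so the trailing bytes are cut off. Hence the truncation.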
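The mismatch can be reproduced outside Solr. The following is a minimal sketch, not Solr code: the class `Utf8LengthDemo` and its methods are hypothetical names used only for illustration, and `truncatedLikeBug` merely imitates the effect of reading `value.length()` bytes from the UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Number of UTF-16 code units, which is what String.length() returns.
    static int charLength(String value) {
        return value.length();
    }

    // Number of bytes in the UTF-8 encoding of the value.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // Imitates the bug: keep only the first value.length() bytes of the
    // UTF-8 encoding and decode them again.
    static String truncatedLikeBug(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, value.length(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "héllo wörld": 11 UTF-16 code units, 13 UTF-8 bytes.
        String value = "h\u00e9llo w\u00f6rld";
        System.out.println(charLength(value));
        System.out.println(utf8Length(value));
        System.out.println(truncatedLikeBug(value));
    }
}
```

With two 2-byte characters in the value, two bytes are lost from the end, so the last two ASCII characters disappear. A pure-ASCII value would not be affected, which may explain why the problem is easy to miss in testing.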
h3. Fix
The fix would be:
`bytesRef.length = bytesRef.bytes.length`
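As a rough illustration of the proposed change in isolation, here is a sketch that does not depend on Lucene. `SimpleBytesRef` is a hypothetical stand-in reduced to the `bytes`, `offset` and `length` fields, and `fill` is an assumed helper name; the actual patch to `SolrDocumentFetcher` may look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class LargeFieldFixSketch {
    // Hypothetical stand-in for a byte-slice reference, reduced to the
    // fields used in this sketch.
    static final class SimpleBytesRef {
        byte[] bytes = new byte[0];
        int offset = 0;
        int length = 0;

        // Decodes the referenced slice as UTF-8.
        String utf8ToString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    // Fills the ref from a string, taking the length from the encoded
    // bytes rather than from value.length().
    static SimpleBytesRef fill(SimpleBytesRef ref, String value) {
        Objects.requireNonNull(value, "String value should not be null");
        ref.bytes = value.getBytes(StandardCharsets.UTF_8);
        ref.offset = 0;
        ref.length = ref.bytes.length; // UTF-8 byte count
        return ref;
    }

    public static void main(String[] args) {
        String value = "h\u00e9llo w\u00f6rld";
        SimpleBytesRef ref = fill(new SimpleBytesRef(), value);
        System.out.println(ref.length);
        System.out.println(ref.utf8ToString().equals(value));
    }
}
```

Because the length is derived from the byte array itself, the slice always covers the full encoded value, whatever mix of ASCII and non-ASCII characters it contains.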
> Large fields with large="true" can be truncated in v9+
> ------------------------------------------------------
>
> Key: SOLR-16589
> URL: https://issues.apache.org/jira/browse/SOLR-16589
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: search
> Affects Versions: 9.0, 9.1, 9.2
> Reporter: Nikolas Osvalds
> Priority: Major