You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2013/08/12 21:24:48 UTC

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

    [ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737260#comment-13737260 ] 

Yonik Seeley commented on LUCENE-4583:
--------------------------------------

I'm confused by the following comment:

{code}
+  /** Maximum length for each binary doc values field,
+   *  because we use PagedBytes with page size of 16 bits. */
+  public static final int MAX_BINARY_FIELD_LENGTH = (1 << 16) + 1;
{code}

But this patch removes the PagedBytes limitation, right?

After this patch, are there any remaining code limitations that prevent raising the limit, or is it really just self imposed via
{code}
+      if (v.length > Lucene42DocValuesFormat.MAX_BINARY_FIELD_LENGTH) {
+        throw new IllegalArgumentException("DocValuesField \"" + field.name + "\" is too large, must be <= " + Lucene42DocValuesFormat.MAX_BINARY_FIELD_LENGTH);
+      }
{code}

                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Assignee: Michael McCandless
>            Priority: Critical
>             Fix For: 5.0, 4.5
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field value in the docs.  It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit.  32k is kind of small IMO; I suspect this limit is unintended and as such is a bug.    The following test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org