Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2007/08/17 21:36:31 UTC

[jira] Created: (LUCENE-985) AAIOB thrown when length of termText is longer than 16384 characters

AAIOB thrown when length of termText is longer than 16384 characters
--------------------------------------------------------------------

                 Key: LUCENE-985
                 URL: https://issues.apache.org/jira/browse/LUCENE-985
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 2.3
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.3


DocumentsWriter has a max term length of 16384; if you cross that, you
get an unfriendly AIOOB.  We should fix this to raise a clearer exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520727 ] 

Michael McCandless commented on LUCENE-985:
-------------------------------------------

> As a clarification point for people who stumble upon this issue
> years from now after encountering whatever exception we put in place
> of the current one...why is there a max termText length?

This is because DocumentsWriter packs the text of each unique term it
sees into a pool of char[] blocks of 16384 chars each (to avoid the GC
overhead of allocating a separate String per term).  Every time a new
term is seen, its text is appended at the end of the current block;
when there isn't enough space left, another block is allocated from
the pool.  So a given term must fit entirely within a single block.
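To make the block-packing explanation concrete, here is a minimal Java sketch, assuming a simplified pool that stores raw chars with no per-term terminator or length prefix. The class CharBlockPoolSketch and its methods addTerm, term and blockCount are hypothetical names for illustration; this is not Lucene's actual DocumentsWriter code. It shows why a term has to fit inside one block, and how an up-front length check can raise a descriptive IllegalArgumentException instead of an opaque index-out-of-bounds failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: packs term text into fixed-size char[] blocks.
// Simplified; this is not Lucene's actual DocumentsWriter implementation.
class CharBlockPoolSketch {
    // Block size from the discussion above: 16384 chars per block.
    static final int BLOCK_SIZE = 16384;

    private final List<char[]> blocks = new ArrayList<>();
    private char[] current;
    private int upto; // next free slot in the current block

    CharBlockPoolSketch() {
        nextBlock();
    }

    private void nextBlock() {
        current = new char[BLOCK_SIZE];
        blocks.add(current);
        upto = 0;
    }

    /** Copies the term into the pool and returns its global start offset. */
    int addTerm(String term) {
        int len = term.length();
        // A term may never span two blocks, so anything longer than one block
        // can never fit. Reject it here with a descriptive message; without this
        // check the copy below would fail with an index-out-of-bounds error.
        if (len > BLOCK_SIZE) {
            throw new IllegalArgumentException("term is " + len
                + " chars long; the maximum is " + BLOCK_SIZE);
        }
        // Start a new block if this one is full or the term won't fit.
        if (upto == BLOCK_SIZE || upto + len > BLOCK_SIZE) {
            nextBlock();
        }
        term.getChars(0, len, current, upto);
        int offset = (blocks.size() - 1) * BLOCK_SIZE + upto;
        upto += len;
        return offset;
    }

    /** Reads back a stored term given its global offset and length. */
    String term(int offset, int len) {
        char[] block = blocks.get(offset / BLOCK_SIZE);
        return new String(block, offset % BLOCK_SIZE, len);
    }

    int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        CharBlockPoolSketch pool = new CharBlockPoolSketch();
        int off = pool.addTerm("lucene");
        System.out.println(pool.term(off, 6) + " stored at offset " + off);

        char[] huge = new char[BLOCK_SIZE + 1];
        Arrays.fill(huge, 'x');
        try {
            pool.addTerm(new String(huge));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the stored term text lives in a handful of large arrays rather than one String object per unique term, which is the GC trade-off described above; the price is the hard per-term limit of one block.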




[jira] Commented: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520693 ] 

Hoss Man commented on LUCENE-985:
---------------------------------

As a clarification point for people who stumble upon this issue years from now after encountering whatever exception we put in place of the current one...

Why is there a max termText length?



[jira] Updated: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-985:
--------------------------------------

    Summary: AIOOB thrown when length of termText is longer than 16384 characters  (was: AAIOB thrown when length of termText is longer than 16384 characters)



[jira] Updated: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-985:
--------------------------------------

    Attachment: LUCENE-985.patch

> I doubt anyone will have a problem with the limit. If they do hit
> the exception, it is probably due to bad end-user input of some
> kind. I always run a token filter that leaves out any token longer
> than 250 characters or so, depending on the application. (I hit
> this AIOOBE quite by accident.)

Agreed!

> I think that recommendation would also make sense in the
> documentation people will look up when they hit the exception.

I've added a note to the javadoc for IndexWriter.addDocument
explaining this limit.

Thanks for catching this, Karl!



[jira] Updated: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-985:
----------------------------

    Description: 
DocumentsWriter has a max term length of 16384; if you cross that, you
get an unfriendly ArrayIndexOutOfBoundsException.  We should fix this to raise a clearer exception.

  was:
DocumentsWriter has a max term length of 16384; if you cross that, you
get an unfriendly AIOOB.  We should fix this to raise a clearer exception.

        Summary: AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)  (was: AIOOB thrown when length of termText is longer than 16384 characters)

(making the summary longer so that other people who get bitten by this exception can find the issue by searching for it)





[jira] Resolved: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-985.
---------------------------------------

    Resolution: Fixed



[jira] Commented: (LUCENE-985) AIOOB thrown when length of termText is longer than 16384 characters (ArrayIndexOutOfBoundsException)

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520741 ] 

Karl Wettin commented on LUCENE-985:
------------------------------------

I doubt anyone will have a problem with the limit. If they do hit the exception, it is probably due to bad end-user input of some kind. I always run a token filter that leaves out any token longer than 250 characters or so, depending on the application. (I hit this AIOOBE quite by accident.)

I think that recommendation would also make sense in the documentation people will look up when they hit the exception.
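The practice of dropping oversized tokens before they reach the indexer can be sketched as follows. This is plain Java over a List<String>, assuming the tokens have already been produced by some tokenizer; in a real Lucene analysis chain the same check would live inside a TokenFilter. The class name MaxLengthTokenFilterSketch is hypothetical, and the 250-character default is only the rule of thumb from the comment above, not a Lucene constant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a "drop oversized tokens" step, written over plain
// strings rather than Lucene's TokenFilter API so it runs standalone.
class MaxLengthTokenFilterSketch {
    // Rule of thumb from the comment above; tune per application.
    static final int DEFAULT_MAX_LENGTH = 250;

    private final int maxLength;

    MaxLengthTokenFilterSketch(int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be at least 1");
        }
        this.maxLength = maxLength;
    }

    /** Returns the tokens no longer than maxLength, in their original order. */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxLength) {
                kept.add(token);
            }
            // Longer tokens (typically bad input) are dropped silently
            // instead of being passed on to the indexer.
        }
        return kept;
    }

    public static void main(String[] args) {
        char[] junk = new char[300];
        Arrays.fill(junk, 'a');
        MaxLengthTokenFilterSketch filter = new MaxLengthTokenFilterSketch(DEFAULT_MAX_LENGTH);
        List<String> out = filter.filter(Arrays.asList("hello", new String(junk), "world"));
        System.out.println(out); // prints [hello, world]
    }
}
```

Filtering at analysis time means an oversized token from bad input is discarded quietly rather than triggering an exception at index time; whether silently dropping such tokens is acceptable depends on the application.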
