Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/02/05 20:04:09 UTC

[jira] [Commented] (SOLR-5698) exceptionally long terms are silently ignored during indexing

    [ https://issues.apache.org/jira/browse/SOLR-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892429#comment-13892429 ] 

Hoss Man commented on SOLR-5698:
--------------------------------

Easy steps to reproduce using the example configs...

{noformat}
hossman@frisbee:~$ perl -le 'print "a,aaa"; print "z," . ("Z" x 32767);' | curl 'http://localhost:8983/solr/update?header=false&fieldnames=name,long_s&rowid=id&commit=true' -H 'Content-Type: application/csv' --data-binary @- 
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">572</int></lst>
</response>
hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=*:*&fl=id,name&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "fl":"id,name",
      "indent":"true",
      "q":"*:*",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "name":"a",
        "id":"0"},
      {
        "name":"z",
        "id":"1"}]
  }}
hossman@frisbee:~$ curl 'http://localhost:8983/solr/select?q=long_s:*&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "indent":"true",
      "q":"long_s:*",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "name":"a",
        "long_s":"aaa",
        "id":"0",
        "_version_":1459225819107819520}]
  }}
{noformat}

> exceptionally long terms are silently ignored during indexing
> -------------------------------------------------------------
>
>                 Key: SOLR-5698
>                 URL: https://issues.apache.org/jira/browse/SOLR-5698
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> As reported on the user list, when a term is greater than 2^15 bytes it is silently ignored at indexing time -- no error is given at all.
> We should investigate:
> * if there is a way to get the lower level Lucene code to propagate up an error we can return to the user instead of silently ignoring these terms
> * if there is no way to generate a low level error:
> ** is there at least a way to make this limit configurable so it's more obvious to users that this limit exists?
> ** should we make things like StrField do explicit size checking on the terms they produce and explicitly throw their own error?
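
To make the last option concrete, here is a rough sketch of what an explicit size check could look like. It is not Solr or Lucene code: the names check_term_length, TermTooLongError and MAX_TERM_BYTES are hypothetical, and the 32766-byte limit is an assumption (I believe it matches Lucene's IndexWriter.MAX_TERM_LENGTH, but verify that against the version in use). The point is only that the check measures the UTF-8 byte length of the term and fails loudly instead of dropping the term.

```python
# Assumed limit, in bytes of UTF-8. Believed to match Lucene's
# IndexWriter.MAX_TERM_LENGTH, but treat the exact value as an assumption.
MAX_TERM_BYTES = 32766


class TermTooLongError(ValueError):
    """Hypothetical error raised instead of silently dropping a term."""


def check_term_length(field_name, value, max_bytes=MAX_TERM_BYTES):
    """Return the UTF-8 byte length of value, or raise if it exceeds max_bytes."""
    # The limit applies to encoded bytes, not characters, so encode first.
    byte_len = len(value.encode("utf-8"))
    if byte_len > max_bytes:
        raise TermTooLongError(
            "field %r: term is %d bytes, limit is %d"
            % (field_name, byte_len, max_bytes)
        )
    return byte_len
```

With a check like this, the "aaa" value from the steps above would pass, while the value of 32767 "Z" characters would raise an error that could be returned to the client. Passing the limit as a parameter is one way the "configurable" question could be handled.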


