Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/08/27 19:06:59 UTC

[jira] Updated: (SOLR-1091) "phps" (serialized PHP) writer produces invalid output

     [ https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-1091:
-------------------------------

    Attachment: SOLR-1091.patch

Here's a patch that handles the modified UTF-8 that Jetty puts out, and also speeds up the normal UTF-8 case by using Lucene's UTF-8 encoding.

Modified UTF-8 support is switched on if the jetty.home system property is set (Jetty sets this by default).
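
To illustrate why the two encodings give different byte counts, here is a minimal Python sketch. It is not the code from the patch: the function name encode_surrogate_pairs_separately is hypothetical, and the sketch assumes that "modified UTF-8" here means each half of a UTF-16 surrogate pair is written as its own 3-byte sequence. It ignores the other modified UTF-8 quirk in Java (the special encoding of U+0000).

```python
def encode_surrogate_pairs_separately(text: str) -> bytes:
    """Encode text so that each UTF-16 code unit becomes its own UTF-8 sequence.

    Hypothetical helper for illustration only; it assumes the "modified UTF-8"
    behaviour is limited to how supplementary characters are written.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the supplementary code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for unit in (high, low):
                # "surrogatepass" lets Python encode a lone surrogate as 3 bytes.
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


sample = "\U00100078"  # the character from the issue report
standard = sample.encode("utf-8")
modified = encode_surrogate_pairs_separately(sample)

print(len(standard), standard.hex())  # 4 f48081b8
print(len(modified), modified.hex())  # 6 edaf80edb1b8
```

The 6-byte result matches the sequence quoted in the issue below, while standard UTF-8 needs only 4 bytes for the same character, which is consistent with the length of 4 reported there.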

> "phps" (serialized PHP) writer produces invalid output
> ------------------------------------------------------
>
>                 Key: SOLR-1091
>                 URL: https://issues.apache.org/jira/browse/SOLR-1091
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.3
>         Environment: Sun JRE 1.6.0 on Centos 5
>            Reporter: frank farmer
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1091.patch
>
>
> The serialized PHP output writer can output invalid string lengths for certain (unusual) input values.  Specifically, I had a document containing the following 6-byte character sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without issue; however, when fetching the document back out using the serialized PHP writer, it returns a string like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6 bytes long.
> When using PHP's native serialize() function, it correctly sets the length to 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't have any trouble with this string.
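
The mismatch described in the quoted report can be checked mechanically. The following Python sketch is only an illustration, not Solr's writer or PHP's parser: php_serialize_bytes and declared_length_matches are hypothetical helpers, and the sketch assumes the serialized string form is s:<byte length>:"<bytes>"; as shown in the report.

```python
def php_serialize_bytes(payload: bytes) -> bytes:
    """Build a PHP-serialized string whose declared length is the byte count."""
    return b's:' + str(len(payload)).encode("ascii") + b':"' + payload + b'";'


def declared_length_matches(serialized: bytes) -> bool:
    """Check that the declared length equals the number of payload bytes.

    Simplified check for a single serialized string; it assumes the payload
    itself does not contain the two-byte sequence :" (hypothetical helper).
    """
    head, sep, rest = serialized.partition(b':"')
    if not sep or not head.startswith(b"s:") or not rest.endswith(b'";'):
        return False
    declared = int(head[2:].decode("ascii"))
    return declared == len(rest[:-2])


raw = b"\xED\xAF\x80\xED\xB1\xB8"  # the 6 bytes from the report

good = php_serialize_bytes(raw)    # declared length 6, 13 bytes in total
bad = b's:4:"' + raw + b'";'       # declared length 4, as in the report

print(len(good), declared_length_matches(good))  # 13 True
print(declared_length_matches(bad))              # False
```

A consumer that trusts the declared length would read only 4 of the 6 payload bytes from the second form and then fail to find the closing quote where it expects it.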

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.