Posted to dev@chemistry.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2011/03/31 11:57:05 UTC
[jira] [Created] (CMIS-344) Query parser should not use UTF-8 encoding
Query parser should not use UTF-8 encoding
------------------------------------------
Key: CMIS-344
URL: https://issues.apache.org/jira/browse/CMIS-344
Project: Chemistry
Issue Type: Bug
Components: opencmis-server
Affects Versions: OpenCMIS 0.4.0
Reporter: Michael Dürig
QueryUtil converts the query statement to a UTF-8 encoded byte array and uses that as input to the lexer, instead of using the string directly.
Instead of
CharStream input = new ANTLRInputStream(new ByteArrayInputStream(statement.getBytes("UTF-8")));
the input stream should be obtained like this:
CharStream input = new ANTLRStringStream(statement);
With the former approach, the characters in the CONTAINS clause of the following query come out garbled:
SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
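A minimal JDK-only sketch of the failure mode described above. It rests on an assumption that the issue does not state: that ANTLR 3's ANTLRInputStream(InputStream) constructor, when given no encoding, reads through an InputStreamReader using the JVM's platform default charset. If that default is not UTF-8, the bytes produced by getBytes("UTF-8") are decoded with the wrong charset. The sketch simulates a non-UTF-8 default by decoding with ISO-8859-1; EncodingRoundTrip and viaBytes are hypothetical names, and ANTLR itself is not used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Reads every character from the reader, roughly as a stream-based
    // lexer input would before tokenizing.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // The pattern from the issue: encode the statement as UTF-8 bytes, then
    // decode them again. 'decodeCharset' stands in for the platform default
    // charset (assumption: that is what the reader uses when none is given).
    static String viaBytes(String statement, Charset decodeCharset) {
        byte[] bytes = statement.getBytes(StandardCharsets.UTF_8);
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), decodeCharset)) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String statement = "SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')";

        // Decoding with UTF-8 restores the original statement.
        String ifDefaultIsUtf8 = viaBytes(statement, StandardCharsets.UTF_8);
        // Decoding with ISO-8859-1 turns each 3-byte CJK character into 3 chars.
        String ifDefaultIsLatin1 = viaBytes(statement, StandardCharsets.ISO_8859_1);

        System.out.println(ifDefaultIsUtf8.equals(statement));               // true
        System.out.println(ifDefaultIsLatin1.equals(statement));             // false
        System.out.println(ifDefaultIsLatin1.length() - statement.length()); // 4
    }
}
```

Passing the String straight to the lexer, as new ANTLRStringStream(statement) does, skips the encode/decode step entirely, so the result no longer depends on the JVM's default charset. Decoding the bytes with an explicit UTF-8 charset would also avoid the corruption, but the round trip through bytes adds nothing.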
Re: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by Michael Dürig <mi...@gmail.com>.
> Note though that SELECT * FROM cmis:document WHERE CONTAINS
> ('\u4E2D\u6587') isn't actually legal CMISQL, as currently CMISQL has
> no notion of Unicode escaping. The query would have to contain actual
> Unicode characters.
But doesn't this query contain actual Unicode characters? \u4E2D and
\u6587 are Java Unicode Escapes [1].
Michael
[1] http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
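The two positions differ on when the escapes are interpreted, which a few lines of Java make visible (class and field names are illustrative only). Per JLS §3.3, the compiler translates Unicode escapes before anything else, so a query built from a Java literal such as '\u4E2D\u6587' already holds the two characters 中文 at runtime. Florent's point applies when the escape sequence itself reaches the server as query text, for example typed by a user: the parser then sees a backslash, a 'u' and four hex digits, which CMISQL does not interpret.

```java
public class UnicodeEscapes {

    // The compiler translates Unicode escapes in source code before the
    // program runs, so this constant holds exactly two characters.
    static final String COMPILED = "\u4E2D\u6587";

    // Doubling the backslash keeps the escape as ordinary text: twelve
    // characters, i.e. what a query would contain if a client sent the
    // escape sequence itself rather than the characters.
    static final String LITERAL_TEXT = "\\u4E2D\\u6587";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());                 // 2
        System.out.println(COMPILED.codePointAt(0) == 0x4E2D); // true
        System.out.println(LITERAL_TEXT.length());             // 12
    }
}
```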
> NB: Unicode escaping is only specified in SQL-2008, not SQL-92. See
> this for a summary:
> http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#N11E65
>
> Florent
RE: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by Jens Hübel <jh...@opentext.com>.
Interesting. So perhaps this is something we should take back to the TC (the OASIS CMIS Technical Committee). A future CMIS version should probably support Unicode escaping.
Jens
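If a future CMIS version adopted SQL-style Unicode escaping, the parser would need to decode literals of the form U&'...', where, with the default escape character, a backslash followed by four hex digits (or by a plus sign and six hex digits) denotes a code point, and a doubled backslash denotes a backslash. Below is a minimal sketch of decoding such a literal body under those assumptions. SqlUnicodeEscapes and decode are hypothetical names, the UESCAPE clause and doubled quotes are not handled, and nothing here is part of CMISQL or OpenCMIS.

```java
public class SqlUnicodeEscapes {

    // Decodes the body of a U&'...' literal (the text between the quotes)
    // using backslash as the escape character. Hypothetical helper: the
    // UESCAPE clause and doubled single quotes are not handled.
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);                            // ordinary character
                i += 1;
            } else if (i + 1 < body.length() && body.charAt(i + 1) == '\\') {
                out.append('\\');                         // escaped backslash
                i += 2;
            } else if (i + 1 < body.length() && body.charAt(i + 1) == '+') {
                out.appendCodePoint(hex(body, i + 2, 6)); // six-digit form
                i += 8;
            } else {
                out.appendCodePoint(hex(body, i + 1, 4)); // four-digit form
                i += 5;
            }
        }
        return out.toString();
    }

    // Parses exactly 'len' hex digits starting at 'start'.
    private static int hex(String s, int start, int len) {
        if (start + len > s.length()) {
            throw new IllegalArgumentException("truncated escape at index " + start);
        }
        int value = 0;
        for (int k = start; k < start + len; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("bad hex digit at index " + k);
            }
            value = value * 16 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        String decoded = decode("\\4E2D\\6587");
        System.out.println(decoded.equals("\u4E2D\u6587")); // true
    }
}
```

Keeping the decoding in a separate step like this would let the lexer continue to work on plain characters, which fits the ANTLRStringStream approach from the patch.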
-----Original Message-----
From: Florent Guillaume [mailto:fg@nuxeo.com]
Sent: Thursday, 31 March 2011 14:17
To: dev@chemistry.apache.org
Cc: Jens Hübel
Subject: Re: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Note though that SELECT * FROM cmis:document WHERE CONTAINS
('\u4E2D\u6587') isn't actually legal CMISQL, as currently CMISQL has
no notion of Unicode escaping. The query would have to contain actual
Unicode characters.
NB: Unicode escaping is only specified in SQL-2008, not SQL-92. See
this for a summary:
http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#N11E65
Florent
--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com http://www.nuxeo.org +33 1 40 33 79 87
Re: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by Florent Guillaume <fg...@nuxeo.com>.
Note though that SELECT * FROM cmis:document WHERE CONTAINS
('\u4E2D\u6587') isn't actually legal CMISQL, as currently CMISQL has
no notion of Unicode escaping. The query would have to contain actual
Unicode characters.
NB: Unicode escaping is only specified in SQL-2008, not SQL-92. See
this for a summary:
http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#N11E65
Florent
On Thu, Mar 31, 2011 at 2:00 PM, Florent Guillaume <fg...@nuxeo.com> wrote:
> No objection, I probably wasn't aware of ANTLRStringStream when I
> wrote that code.
>
> Florent
Re: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by Florent Guillaume <fg...@nuxeo.com>.
No objection, I probably wasn't aware of ANTLRStringStream when I
wrote that code.
Florent
On Thu, Mar 31, 2011 at 12:47 PM, Jens Hübel <jh...@opentext.com> wrote:
> Florent,
>
> as far as I remember this code came originally from your side. Would you have any objections to apply the proposed patch? Would this break something on your side?
>
> Jens
RE: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by Jens Hübel <jh...@opentext.com>.
Florent,
as far as I remember this code came originally from your side. Would you have any objections to apply the proposed patch? Would this break something on your side?
Jens
-----Original Message-----
From: Jens Hübel (JIRA) [mailto:jira@apache.org]
Sent: Thursday, 31 March 2011 12:42
To: dev@chemistry.apache.org
Subject: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
[ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Hübel reassigned CMIS-344:
-------------------------------
Assignee: Jens Hübel
[jira] [Closed] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by "Jens Hübel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Hübel closed CMIS-344.
---------------------------
[jira] [Resolved] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by "Jens Hübel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Hübel resolved CMIS-344.
-----------------------------
Resolution: Fixed
Fix Version/s: OpenCMIS 0.4.0
Patch applied. Thanks for providing it.
[jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by "Jens Hübel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Hübel reassigned CMIS-344:
-------------------------------
Assignee: Jens Hübel
[jira] [Updated] (CMIS-344) Query parser should not use UTF-8 encoding
Posted by "Michael Dürig (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dürig updated CMIS-344:
-------------------------------
Attachment: CMIS-344.patch
Proposed patch