Posted to dev@avro.apache.org by "Matt Massie (JIRA)" <ji...@apache.org> on 2009/12/30 00:06:29 UTC
[jira] Created: (AVRO-268) Replace lemon-generated JSON parser with
simpler recursive descent parser
Replace lemon-generated JSON parser with simpler recursive descent parser
-------------------------------------------------------------------------
Key: AVRO-268
URL: https://issues.apache.org/jira/browse/AVRO-268
Project: Avro
Issue Type: Improvement
Components: c
Reporter: Matt Massie
Fix For: 1.3.0
This is a drop-in replacement for the current JSON parser, which is based on Lemon (an LALR parser generator).
This parser
* reads and returns a single JSON_value and its nested children (using recursive descent parsing)
* allows you to process JSON from streams in addition to static memory buffers
* correctly processes Unicode \u escapes, including surrogate pairs
* distinguishes between integer and real number representations
* provides information about the line and character in JSON that failed to parse
* is much simpler to understand and maintain (fewer lines of code and fewer source files)
* is written to allow error recovery to be added later
This patch also adds more unit tests.
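To make two of the listed behaviors concrete, here is a minimal sketch that uses Python's standard json module as a stand-in. It is not the C parser from the patch, and the variable names are only illustrative. It shows a \u surrogate pair decoding to a single code point, and integer and real representations decoding to different types.

```python
import json

# Stand-in only: Python's json module, not the C parser from AVRO-268.

# Two \u escapes that form a surrogate pair decode to one code point.
text = json.loads('"\\ud83d\\ude00"')

# Integer and real number representations decode to different types.
whole = json.loads('42')
real = json.loads('42.0')

print(len(text), hex(ord(text)))
print(type(whole).__name__, type(real).__name__)
```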
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (AVRO-268) Replace lemon-generated JSON parser with
simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Massie updated AVRO-268:
-----------------------------
Resolution: Won't Fix
Status: Resolved (was: Patch Available)
No need for this work. I've finally found a high-quality C parser with a friendly license, called Jansson, which will serve as the JSON parser going forward. Sorry for the JIRA noise.
[jira] Assigned: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Massie reassigned AVRO-268:
--------------------------------
Assignee: Matt Massie
[jira] Updated: (AVRO-268) Replace lemon-generated JSON parser with
simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Massie updated AVRO-268:
-----------------------------
Status: Patch Available (was: Open)
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Jeff Hammerbacher (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795306#action_12795306 ]
Jeff Hammerbacher commented on AVRO-268:
----------------------------------------
Worth noting that the new behavior is not standard:
JavaScript:
{code}
js> var a = JSON.parse('{ "key": "value" } foo bar baz');
js: "/Users/hammer/codebox/narwhal/engines/default/lib/json.js", line 474: exception from uncaught JavaScript throw: SyntaxError: JSON.parse
{code}
Python:
{code}
>>> b = json.loads('{ "key": "value" } foo bar baz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.5-i386/egg/simplejson/__init__.py", line 307, in loads
  File "build/bdist.macosx-10.5-i386/egg/simplejson/decoder.py", line 338, in decode
ValueError: Extra data: line 1 column 19 - line 1 column 30 (char 19 - 30)
{code}
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795372#action_12795372 ]
Matt Massie commented on AVRO-268:
----------------------------------
Ryan-
We posted comments about two minutes apart.
I hope that my earlier comment clarifies that I didn't implement a non-standard JSON parser.
-Matt
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Ryan King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795359#action_12795359 ]
Ryan King commented on AVRO-268:
--------------------------------
Please don't use a non-standard JSON parser.
My understanding was that part of the reason for choosing JSON is that it is a standard format (http://www.ietf.org/rfc/rfc4627.txt) with parsers already available in many languages. If you use a non-standard JSON parser, you lose that benefit.
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795194#action_12795194 ]
Doug Cutting commented on AVRO-268:
-----------------------------------
Sounds like a fine approach.
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Ryan King (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795453#action_12795453 ]
Ryan King commented on AVRO-268:
--------------------------------
Alright, I now understand a bit better what you're going for.
Forgive me if this is a naive question: whenever we need to read JSON for Avro, do we know ahead of time how long the JSON blob will be?
Most JSON parsers don't return as soon as a full object has been parsed, so we would need to read exactly the right number of bytes and then parse them as JSON.
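For illustration, the "read just the appropriate length, then parse" approach might look like the sketch below. The 4-byte big-endian length prefix and the read_json_blob helper are hypothetical, invented for this example. Nothing in this thread says Avro frames its JSON this way.

```python
import io
import json
import struct


def read_json_blob(stream):
    """Read one length-prefixed JSON value (hypothetical framing)."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("missing length prefix")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated JSON payload")
    # Only the framed bytes reach the parser, so a greedy parser is fine.
    return json.loads(payload.decode("utf-8"))


body = b'{"type": "string"}'
stream = io.BytesIO(struct.pack(">I", len(body)) + body + b"binary data follows")
schema = read_json_blob(stream)
remaining = stream.read()  # bytes after the JSON blob are left untouched
```

With framing like this, any standard JSON parser can be used, at the cost of needing the length up front.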
[jira] Updated: (AVRO-268) Replace lemon-generated JSON parser with
simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Massie updated AVRO-268:
-----------------------------
Attachment: AVRO-268.patch
I noticed that two old directories still exist in svn. The following commands will also be run when this patch is committed:
$ rm -rf src/c/json/fail
$ rm -rf src/c/json/pass
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795193#action_12795193 ]
Matt Massie commented on AVRO-268:
----------------------------------
I should also mention that a few unit tests are *removed* with this patch as well. Tests with trailing characters that used to fail now succeed with the new parser.
For example, the following JSON used to fail to parse:
{code}
{ "key": "value" } foo bar baz
{code}
whereas the new parser returns as soon as it reaches the final '}', ignoring the trailing junk characters.
[jira] Commented: (AVRO-268) Replace lemon-generated JSON parser
with simpler recursive descent parser
Posted by "Matt Massie (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/AVRO-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795357#action_12795357 ]
Matt Massie commented on AVRO-268:
----------------------------------
I should have been clearer.
I'm not saying that
{code}
{ "key" : "value" } foo bar baz
{code}
is valid JSON. It's not.
I was only pointing out that the parser isn't greedy: it returns as soon as it has completed a JSON value. However, if the stream in this example remained positioned at 'foo', the parser would throw an error the next time it's called.
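The same non-greedy behavior can be sketched with Python's standard json.JSONDecoder.raw_decode, used here only as an analogy for the C parser in the patch. The first call returns the completed value and the position where it stopped. A second call starting at the leftover text fails.

```python
import json

# Analogy only: Python's raw_decode, not the C parser from AVRO-268.
decoder = json.JSONDecoder()
stream = '{ "key" : "value" } foo bar baz'

# First call: stops right after the closing '}' and reports that offset.
value, end = decoder.raw_decode(stream)

# Second call: resume at the leftover text, which is not a JSON value.
rest = stream[end:].lstrip()
try:
    decoder.raw_decode(rest)
    second_call_failed = False
except json.JSONDecodeError:
    second_call_failed = True

print(value, end, second_call_failed)
```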