
Posted to user@couchdb.apache.org by John Evans <jo...@jpevans.com> on 2008/06/29 00:18:05 UTC

Unnecessary escaping in URLs?

Hi All,

I am storing normal URLs (e.g. "http://www.example.com/foo/bar/bar.html") in
a field in CouchDB, which I access from Python.  Initially I used
simplejson as my parser since there was example code based on it.  As I
started working with larger data sets and larger records, I found that
simplejson is very slow, so I looked into faster alternatives.  I
discovered cjson, which is very fast but less tolerant, so now in my CouchDB
client class, every place that used to call simplejson tries cjson first
and, if cjson raises an exception, falls back to simplejson.  Something
like this:

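    import cjson
    import simplejson

    try:
        native_data = cjson.decode(raw_data)
    except cjson.DecodeError:
        # cjson rejected the input; fall back to the slower, more tolerant parser
        native_data = simplejson.loads(raw_data)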

What I have discovered is that these two implementations treat the URLs I
have stored differently.  Everything works as I would expect with
simplejson, but with cjson the URLs show up with backslashes in them, so the
example above comes out as
"http:\/\/www.example.com\/foo\/bar\/bar.html".  I thought at first that
this must be a bug in cjson, but then I pulled up a record in the browser
(and again with curl) and saw that the extra backslashes are actually
returned by CouchDB.  I double-checked that my POST/PUT requests were not
including them, so CouchDB seems to be adding them.  Since it works with
simplejson, my guess is that this may still be a bug in how cjson
parses strings, but I'm curious: is this additional escaping
intentional?  If so, is it necessary, and why?

(and if anyone has any recommendations on how to get cjson to parse it
correctly, that would be great too :))
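
One possible workaround, offered only as a sketch: strip the redundant
backslash from the raw JSON text before handing it to a decoder that leaves
"\/" untouched.  The helper name unescape_solidus is hypothetical, and the
standard-library json module stands in for the decoder here so that the
example runs without cjson installed.  A backslash is removed only when it
really escapes the slash, i.e. when the run of backslashes before "/" has an
odd length.

```python
import json
import re

# A run of one or more backslashes immediately followed by "/".
_BACKSLASHES_THEN_SLASH = re.compile(r'(\\+)/')


def unescape_solidus(raw_json):
    """Rewrite the JSON escape \\/ as a plain / in raw JSON text.

    Hypothetical helper, not part of cjson or simplejson.
    """
    def fix(match):
        run = match.group(1)
        if len(run) % 2 == 1:
            # Odd run: the last backslash escapes the "/", so drop it.
            return run[:-1] + '/'
        # Even run: these are escaped backslashes; leave the text alone.
        return match.group(0)

    return _BACKSLASHES_THEN_SLASH.sub(fix, raw_json)


raw = r'"http:\/\/www.example.com\/foo\/bar\/bar.html"'
cleaned = unescape_solidus(raw)

# A literal backslash before a slash (JSON text "a\\/b") must survive.
tricky = r'"a\\/b"'
tricky_cleaned = unescape_solidus(tricky)
```

The cleaned text decodes to the same value as the original, so it should be
safe to run this in front of any parser, whether or not it handles "\/".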

Thanks,
-
John

Re: Unnecessary escaping in URLs?

Posted by John Evans <jo...@jpevans.com>.

I discovered this blog post after I sent my initial email:

http://garybernhardt.blogspot.com/2007/07/when-json-isnt-json.html

It sheds some light on the problem, and as you suspect, it looks like cjson
is not handling this valid JSON.

(It turns out the updated code was using the same failover logic so of
course my tests were showing the data going in cleanly because it was
succeeding with cjson and never failing over to simplejson, whereas the data
I was running into problems with was originally put there by simplejson...
my fault).

Thanks,
-
John

On Sat, Jun 28, 2008 at 3:30 PM, Chris Anderson <jc...@grabb.it> wrote:

> John,
>
> "http:\/\/www.example.com\/foo\/bar\/bar.html"
>
> Is a valid representation of
>
> "http://www.example.com/foo/bar/bar.html"
>
> in JSON.
>
> I'm not sure about Python, but by pasting the string into Firebug, I
> see that it comes out clean.
>
> The JSON spec is extremely simple: http://www.json.org/
>
> It seems like the cjson Python library might be failing to unescape.
>
> Chris
>
>
>
>
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>

Re: Unnecessary escaping in URLs?

Posted by Chris Anderson <jc...@grabb.it>.

John,

"http:\/\/www.example.com\/foo\/bar\/bar.html"

Is a valid representation of

"http://www.example.com/foo/bar/bar.html"

in JSON.

I'm not sure about Python, but by pasting the string into Firebug, I
see that it comes out clean.

The JSON spec is extremely simple: http://www.json.org/

It seems like the cjson Python library might be failing to unescape.
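
For anyone who wants to check this from Python rather than Firebug, here is
a small sketch.  It assumes the standard-library json module (neither cjson
nor simplejson), and the variable names are only illustrative.  A conforming
decoder should treat the escaped and unescaped forms as the same string.

```python
import json

# The same URL as JSON text, with and without the optional "\/" escape.
escaped = r'"http:\/\/www.example.com\/foo\/bar\/bar.html"'
plain = '"http://www.example.com/foo/bar/bar.html"'

decoded_escaped = json.loads(escaped)
decoded_plain = json.loads(plain)

# Both forms decode to the identical Python string.
print(decoded_escaped == decoded_plain)
print(decoded_escaped)
```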

Chris





-- 
Chris Anderson
http://jchris.mfdz.com