Posted to user@couchdb.apache.org by Bill Stephenson <bi...@cherrypc.com> on 2018/04/07 05:24:50 UTC

Perl and bad characters

I’ve been working on a “comments” feature for my “CherryPC blog”. 

I don’t want readers to have to make a user account to comment so I’m wanting to use a perl script on the server side that has the user credentials in the $url variable below.

This is the code I’m using to update the document with the comment.

# Convert the JSON to a perl object

my $data_structure = decode_json(`curl -X GET $url`);

my $_id = $data_structure->{'_id'};
my $_rev = $data_structure->{'_rev'};
my $title = $data_structure->{'title'};
my $subtitle = $data_structure->{'subtitle'};
my $content = $data_structure->{'content'};
my $Text_publish = $data_structure->{'Text_publish'};
my $publishDate = $data_structure->{'publishDate'};


my $returnJSON = qq`{"_id": "$_id", "_rev": "$_rev", "title": "$title", "subtitle": "$subtitle", "content": "$content", "docType": "text", "Text_publish": "yes", "publishDate": "$publishDate",$newCommentsList}`;

my $Post = `curl -X PUT $url -d '$returnJSON'`;

This works fine with plain text, but the blog posts are made with TinyMCE and use HTML.  I can update them fine with Javascript and PouchDB, but Perl is dying on double quotes, single quotes, and backslashes:

' " \

I’ve narrowed it down to just those 3 characters. If I strip those from the html and comments it will all post fine, but html doesn’t work without those so that’s not an option.

I’m using these modules:

use strict;
use warnings;
use utf8;
use JSON::XS;
use Data::Dumper;
use CGI;

From what I understand, "use utf8" forces all the data to be UTF-8 encoded. I’ve used several different modules to encode the data, and I’ve also built the entire document as a perl object and converted that to JSON instead of building a simple string like above, but it still dies on those three characters.

This is what the curl error tells me:

PUT Error: bad_request
reason: invalid UTF-8 JSON

So, it’s those 3 characters that are not being encoded correctly.

If anyone has any ideas and/or advice on how to deal with this I’d sure appreciate them. I’ve pretty much run out of them at this point.

Kindest Regards,

Bill Stephenson




RE: Perl and bad characters

Posted by Keith Gable <zi...@ignition-project.com>.
The last line is you taking user input and putting it into a shell argument without escaping. Use a real HTTP library so that you don’t get exploited by a robot. Does this still persist if you use a real HTTP library?
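
To illustrate the shell-argument point, here is a minimal sketch; it is not taken from the original script. A single quote inside the comment text would end the single-quoted argument in `curl -X PUT $url -d '$returnJSON'`, and whatever follows is then parsed by the shell. Passing arguments as a list avoids the shell entirely. The payload below is made up, and $^X (the running perl binary) only stands in for an external command such as curl.

```perl
use strict;
use warnings;

# Hypothetical comment text containing the three troublesome characters.
my $payload = q{It's a "test" with a \ backslash};

# List-form pipe open: each argument is handed to the child process as-is,
# so no shell ever gets to interpret the quotes or the backslash.
open(my $fh, '-|', $^X, '-e', 'print $ARGV[0]', $payload)
    or die "Cannot start child process: $!";
my $echoed = do { local $/; <$fh> };   # slurp everything the child printed
close($fh);

print $echoed eq $payload ? "payload arrived intact\n" : "payload was altered\n";
```

An HTTP library goes one step further, since no external process is involved at all.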







Re: Perl and bad characters

Posted by Raimund Riedel <ra...@gmail.com>.
The internal representation of character strings in Perl-5 is not 
identical to UTF-8 or UTF-X, although they both may occur in the same 
string variable. There is no automatic conversion; the "use utf8;" 
pragma is only to enable Perl-5 source code written in UTF-8 (see 
"perldoc utf8"). Therefore each UTF-8 text coming from outside the 
program must be decoded, and all data leaving the program as UTF-8 
text must be encoded.
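
A small sketch of that decode-on-input, encode-on-output rule, using only the core Encode module. The sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from outside the program:
# the letter e-acute takes two bytes in UTF-8.
my $bytes = "caf\xC3\xA9";

# Decode on input: the result is a string of four characters.
my $chars = decode('UTF-8', $bytes);

# Encode on output: back to the same five bytes.
my $out = encode('UTF-8', $chars);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($bytes), length($chars), length($out);
```

The differing lengths show that the two forms are not interchangeable, even when they live in ordinary scalar variables.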

So please, after loading the Encode module with "use Encode;", replace your line
     my $data_structure = decode_json(`curl -X GET $url`);
by something like
     my $data_structure = JSON::XS->new->decode(decode('UTF-8', `curl -X GET $url`));
and replace analogously
     my $Post = `curl -X PUT $url -d '$returnJSON'`;
by
     my $JSONutf8 = encode('utf8', $returnJSON);
     my $Post = `curl -X PUT $url -d '$JSONutf8'`;

This method helped me a lot to build and use a couch database with many 
international names in its texts. Since the error message you included 
is related to UTF-8, it should be worthwhile to try in your case.

Kind regards,
Raimund Riedel



-- 
Raimund Riedel
______________ rajmundrd@gmail.com
______________ Mi parolas Esperanton


Re: Perl and bad characters

Posted by Michael Zedeler <mi...@zedeler.dk>.
Hi Bill.

You need to escape the characters and unescape them again when you 
retrieve them from the server. What you should do is something like this:

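use JSON;
use HTTP::Request::Common;
use LWP::UserAgent;

# PUT() only builds the request; a user agent has to send it.
my $request = PUT $url, Content_Type => 'application/json', Content => 
encode_json($data_structure);
my $result = LWP::UserAgent->new->request($request);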

My Perl knowledge is a little rusty, but something along the lines above 
should work. If you use curl externally (with backticks as in your 
example), you're likely to run into a lot of escaping problems that 
really aren't worth working around.
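
As a rough illustration of why the hand-built JSON string breaks while encode_json does not, here is a sketch using the core JSON::PP module, whose encode_json and decode_json behave like the JSON::XS ones for this purpose. The HTML fragment is made up.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# Hypothetical HTML with a double quote, a single quote and a backslash.
my $content = q{<p class="note">It's a \ test</p>};

# Naive interpolation: the double quotes inside $content end the JSON
# string early, so the result is no longer valid JSON.
my $naive    = qq|{"content": "$content"}|;
my $naive_ok = eval { decode_json($naive); 1 } ? 'valid' : 'invalid';

# Serializing a data structure lets the encoder escape " and \ itself.
my $safe       = encode_json({ content => $content });
my $round_trip = decode_json($safe)->{content};

print "naive JSON is $naive_ok\n";
print "encoded JSON: $safe\n";
print "round trip ", ($round_trip eq $content ? "matches" : "differs"), "\n";
```

The same reasoning applies to any field that can contain user-supplied text, such as the comments.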

Regards,

Michael.


-- 
Michael Zedeler
70 25 19 99
michael@zedeler.dk

dk.linkedin.com/in/mzedeler | twitter.com/mzedeler | github.com/mzedeler


Re: Perl and bad characters - Solved!

Posted by Bill Stephenson <bi...@cherrypc.com>.
I found the simple “append” solution I was looking for on stackoverflow.com:

https://stackoverflow.com/questions/21718486/need-to-add-new-data-to-json-array-in-perl

Thank you all again for the help!

I’ve included the working script below. If anyone has suggestions for improvements please let me know.

Kindest Regards,

Bill Stephenson


—————————————————————————
#!/usr/bin/perl

use strict;
use warnings;
use JSON::XS;
use Data::Dumper;
use CGI;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;

my $cgi = CGI->new;
my $msg;

# Get and print the post doc ID:
my $id = $cgi->param('postID');    
# $msg = "id:  $id";
# &print_message($msg);  

# Get the blog post from CouchDB
my $url = "https://user:pass\@cherrypc.com:6984/cherrypc/$id";

my $blogdoc = get $url;
die "Couldn't get $url" unless defined $blogdoc;

# convert json to perl object
my $data_structure = decode_json($blogdoc);
 
# $msg = "Doc Title: ". $data_structure->{'title'};
# &print_message;

# $msg = "data_structure: \n" . Dumper($data_structure);
# &print_message($msg);

# -----------------------------------------
#Append the list of comments:

my $newCommentDate = $cgi->param('commentDate');     
my $newComment = $cgi->param('comment');  

my $newdata = {commentDate=>"$newCommentDate",comment=>"$newComment"};
push @{ $data_structure->{'comments'}  }, $newdata;
# -----------------------------------------

# convert perl object back to json 
my $updatedDoc = encode_json $data_structure;

# Update the document
my $req = HTTP::Request->new(PUT => $url);
$req->content_type('application/json');
$req->content($updatedDoc);

my $ua = LWP::UserAgent->new; 
my $res = $ua->request($req);
# $res is an HTTP::Response.

if ($res->is_success) {
    $msg = "Success: ". $res->as_string;
    &print_message($msg);
  }
  else {
    $msg =  "Failed: ". $res->status_line;
    &print_message($msg);
  }

# $msg = Dumper($res);
# &print_message($msg);

print $cgi->header('text/plain;charset=UTF-8');  
  
exit;

#--------------------------------------------------------------------------------------------------------
sub print_message {
#--------------------------------------------------------------------------------------------------------
	my ($text) = @_;
	$text = $msg unless defined $text;   # fall back to the global $msg
	open(my $dumpfile, '>>', '/usr/lib/cgi-bin/debug.txt') or die "Unable to open /usr/lib/cgi-bin/debug.txt: $!\n";
	print $dumpfile "\n------------------- message ------------------- \n";
	print $dumpfile $text;
	print $dumpfile "\n--------------------- end --------------------- \n";
	close($dumpfile);
	$msg = "";
	return;
}


Re: Perl and bad characters - Progress!

Posted by Bill Stephenson <bi...@cherrypc.com>.
As per the advice given, I took a different route and used LWP to get and put the document.

And I’m going to ditch building the JSON like so:

my $returnJSON = qq`{"_id": "$_id", "_rev": "$_rev", "title": "$title", "subtitle": "$subtitle", "content": "$content", "docType": "text", "Text_publish": "yes", "publishDate": "$publishDate",$newCommentsList}`;

That just doesn’t work.  I don’t know why, but all I really need to do is add the comment to the Perl object so I’m going to work on that. Perl Objects do baffle me a bit but I’m going to try and trudge my way through figuring it out. 

Thank you all for the advice and pointers.

Below is what I’ve got so far and this does update the doc successfully so I have made progress with your help:
----------------------------------------------------------------------

#!/usr/bin/perl

use strict;
use warnings;
use JSON::XS;
use Data::Dumper;
use CGI;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;

my $cgi = CGI->new;
my $msg;

# Get and print the post doc ID:
my $id = $cgi->param('postID');    
# $msg = "id:  $id";
# &print_message($msg);  

# Get the blog post from CouchDB
my $url = "https://user:pass\@cherrypc.com:6984/cherrypc/$id";

my $blogdoc = get $url;
die "Couldn't get $url" unless defined $blogdoc;

# convert json to perl object
my $data_structure = decode_json($blogdoc);

$msg = "Doc Title: ". $data_structure->{'title'};
&print_message;

# -----------------------------------------
#Get the list of comments:

my %comments;
my $comments;
my @comments;
my $comment;
my $commentDate;
my $msgCounter = 1;
my $commentList;

$comments{ $_->{'comment'} } = $_->{'commentDate'} for @{ $data_structure->{'comments'} };

$msg = "Number of Comments: " . scalar keys %comments;
&print_message;

# Get and print old comments:
while ( ( $comment, $commentDate ) = each %comments ) { 
	$commentList .= "Comment: $comment :: Comment Date : $commentDate \n";
}

$msg = "Old Comments: \n" .$commentList; 
&print_message;

# Need to add the new comment to the $data_structure perl object here
my $newCommentDate = $cgi->param('commentDate');     
my $newComment = $cgi->param('comment');  

# Print new comment:
$msg = "New Comment: \nComment $newComment :: Comment Date : $newCommentDate";
&print_message($msg);

# I have no idea yet how to append the new comment to the Perl Object.
# Seems like this should be easy but I've yet to figure that out.

# -----------------------------------------
# convert perl object back to json 
my $updatedDoc = encode_json $data_structure;
# -----------------------------------------

# Update the document
my $req = HTTP::Request->new(PUT => $url);
$req->content_type('application/json');
$req->content($updatedDoc);

my $ua = LWP::UserAgent->new; 
my $res = $ua->request($req);
# $res is an HTTP::Response.

if ($res->is_success) {
    $msg = "Success: ". $res->as_string;
	&print_message($msg);
  }
  else {
    $msg =  "Failed: ". $res->status_line;
	&print_message($msg);
  }

# $msg = Dumper($res);
# &print_message($msg);

print $cgi->header('text/plain;charset=UTF-8');  
  
exit;

#--------------------------------------------------------------------------------------------------------
sub print_message {
#--------------------------------------------------------------------------------------------------------
	my ($text) = @_;
	$text = $msg unless defined $text;   # some callers rely on the global $msg
	open(my $dumpfile, '>>', '/usr/lib/cgi-bin/debug.txt') or die "Unable to open /usr/lib/cgi-bin/debug.txt: $!\n";
	print $dumpfile "\n------------------- message ------------------- \n";
	print $dumpfile $text;
	print $dumpfile "\n--------------------- end --------------------- \n";
	close($dumpfile);
	$msg = "";
	return;
}


Re: Perl and bad characters

Posted by Dave Cottlehuber <dc...@skunkwerks.at>.
On Sat, 7 Apr 2018, at 07:24, Bill Stephenson wrote:
> I’ve been working on a “comments” feature for my “CherryPC blog”.
>
> I don’t want readers to have to make a user account to comment so I’m
> wanting to use a perl script on the server side that has the user
> credentials in the $url variable below.
>
> This is the code I’m using to update the document with the comment.
>
> # Convert the JSON to a perl object
>
> my $data_structure = decode_json(`curl -X GET $url`);

Hi Bill,

I think all the responses so far have given you a piece of the puzzle, but not the whole. Hopefully this completes the picture.

TLDR: just use Store::CouchDB, and all of the conversion will "just happen" directly.

https://metacpan.org/release/Store-CouchDB

[snip]

Given this JSON document as an example of what gets converted to/from perl objects:

GET /testy/data HTTP/1.1
Host: localhost:5984
Accept: application/json
Accept-Encoding: gzip, deflate
User-Agent: bat/0.1.0

HTTP/1.1 200 OK
Content-Length : 111
Cache-Control : must-revalidate
Server : CouchDB/1.7.1 (Erlang OTP/19)
Etag : "1-75710f70ac64e43a12fbc76ca3305852"
Date : Sat, 07 Apr 2018 21:58:56 GMT
Content-Type : application/json

{
  "_id": "data",
  "_rev": "1-75710f70ac64e43a12fbc76ca3305852",
  "object": {
    "nested": true
  },
  "array": [
    1,
    "2",
    false,
    null
  ]
}

The JSON should be self-explanatory but it's useful to see how the same data is represented in perl.

So here's a very short perl program to show you how much easier this is using Store::CouchDB and the automatic perl<->JSON conversions.

#!/usr/bin/env perl

use Modern::Perl;
use Store::CouchDB;
use Data::Printer caller_info => 1;

# connect to our couch and existing db
my $c = Store::CouchDB->new({host => 'localhost', db => 'testy'});

# fetch a JSON doc and auto-convert it to a perl object
my $doc = $c->get_doc('data');

# dump out the perl object we got back
# it's not so scary
p $doc;

# look up a simple plain text value
say $doc->{'_id'};

# nested objects can be traversed by name
# true in JSON is mapped to a boolean object that prints as 1 in perl (truthy)
# and will round-trip correctly
say $doc->{'object'}->{'nested'};
exit 0;


The output follows; in your terminal it will be in glorious colour:

 ./relax.pl 
Printing in line 15 of ./relax.pl:
\ {
    _id      "data",
    _rev     "1-75710f70ac64e43a12fbc76ca3305852",
    array    [
        [0] 1,
        [1] 2,
        [2] JSON::PP::Boolean  {
            Parents       Types::Serialiser::BooleanBase
            public methods (0)
            private methods (0)
            internals: 0
        },
        [3] undef
    ],
    object   {
        nested   JSON::PP::Boolean  {
            Parents       Types::Serialiser::BooleanBase
            public methods (0)
            private methods (0)
            internals: 1
        }
    }
}
data
1

Note how the doc id "data" is a simple string, while JSON::PP::Boolean objects and undef are used so that the JSON true, false, and null values each have an equivalent in perl that can be safely round-tripped back to JSON.
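
A sketch of that round trip without a running CouchDB, assuming only the core JSON::PP module. The document body mirrors the example above minus _id and _rev.

```perl
use strict;
use warnings;
use JSON::PP;

# canonical() sorts hash keys so the encoded output is stable.
my $json = JSON::PP->new->canonical;
my $doc  = $json->decode('{"array":[1,"2",false,null],"object":{"nested":true}}');

my $nested = $doc->{object}{nested};
print "nested is a JSON boolean\n" if JSON::PP::is_bool($nested);
print "nested is true\n"           if $nested;
print "array[3] is undef\n"        unless defined $doc->{array}[3];

# Encoding again yields true/false/null rather than 1, 0 or an empty string.
my $again = $json->encode($doc);
print "$again\n";
```

Because the booleans are objects rather than plain 1 and 0, the encoder can tell them apart from ordinary numbers when writing the document back.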

> From what I understand "use utf8” forces the all data to be utf-8...

I highly recommend getting chromatic's book https://pragprog.com/book/swperl/modern-perl-fourth-edition and reading it immediately. There's a free online version http://www.onyxneon.com/books/modern_perl/index.html but it's well worth the pennies. A few hours reading this will power up your perl skills enormously I think.

> If anyone has any ideas and/or advice on how to deal with this I’d sure
> appreciate them. I’ve pretty much ran out of them at this point.

TLDR: use a library, and then you can skip all the pain of shelling out to external programs. Shelling out is also a very risky strategy: sanitising user input that is passed to/from curl and so forth is dangerous, especially on the internet. The same goes for date/time handling; look on metacpan.org and find a library to do the heavy lifting for you.
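
On the date/time point, one possible starting place is Time::Piece, which ships with perl. The sketch below builds a timestamp that could go into a field like commentDate. The choice of ISO 8601 in UTC is an assumption, not something the original script prescribes.

```perl
use strict;
use warnings;
use Time::Piece;

# Current time in UTC as an ISO 8601 string, e.g. 2018-04-07T21:58:56Z.
my $comment_date = gmtime->datetime . 'Z';

# Parse it back to check that the format can be read again.
my $parsed = Time::Piece->strptime($comment_date, '%Y-%m-%dT%H:%M:%SZ');

print "$comment_date (epoch ", $parsed->epoch, ")\n";
```

Generating the date on the server this way also means the value does not have to be trusted from the browser.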

Finally, http://mojolicious.org/ is a very good modern perl web framework. In a future incarnation of your site/app it may be very helpful, but I suspect you've already done enough not to want to start from scratch.

Good luck!

A+
Dave