Posted to dev@whimsical.apache.org by Craig Russell <cr...@oracle.com> on 2016/08/27 18:41:54 UTC

Fwd: ICLA — Gosha Arinich aka goshakkk

This email (still pending) causes an error when sending mail.

I suspect it is because of the em-dash in the subject.

I don’t know how to look at or edit the pending.yml on the server.

From: Gosha Arinich <me...@goshakkk.name>
Date: Sat, 27 Aug 2016 03:03:00 +0300
Message-ID: <CA...@mail.gmail.com>
Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
To: secretary@apache.org

So, two issues: the pending mail needs to be sent; the bug needs to be fixed.

Thanks,

Craig

> Begin forwarded message:
> 
> From: Gosha Arinich <me...@goshakkk.name>
> Subject: ICLA — Gosha Arinich aka goshakkk
> Date: August 26, 2016 at 5:03:00 PM PDT
> To: secretary@apache.org
> 
> 
> 
> -- 
> Cheers,
> Gosha
> 

Craig L Russell
Secretary, Apache Software Foundation
clr@apache.org http://db.apache.org/jdo

Re: ICLA — Gosha Arinich aka goshakkk

Posted by Sam Ruby <ru...@intertwingly.net>.
On Sun, Aug 28, 2016 at 9:04 AM, Sam Ruby <ru...@intertwingly.net> wrote:
>
> Not surprising given the torturous path that the subject goes through
> in the current workbench implementation.

For contrast, see:

https://whimsy.apache.org/secmail/201608/4b6a164b35/

Clicking on raw produces:

Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=

Clicking on headers produces:

Subject: ICLA — Gosha Arinich aka goshakkk
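
The difference between the raw and headers views comes down to decoding the RFC 2047 encoded word in the Subject header. A minimal sketch of that step in Ruby follows. The helper name decode_q_word is hypothetical, it only handles a single UTF-8 Q-encoded word, and it is not the code secmail actually uses:

```ruby
# Hypothetical helper for illustration only: decodes one RFC 2047
# "Q"-encoded word that declares UTF-8. Anything else is returned as-is.
def decode_q_word(header_value)
  match = header_value.match(/\A=\?UTF-8\?Q\?(.*)\?=\z/i)
  return header_value unless match

  # In Q-encoding "_" stands for a space and "=XX" is a hex-escaped byte.
  bytes = match[1].tr('_', ' ').unpack('M').first

  # unpack('M') returns a binary (ASCII-8BIT) string; relabel it as UTF-8.
  bytes.force_encoding(Encoding::UTF_8)
end

raw_subject = '=?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?='
decoded_subject = decode_q_word(raw_subject)
puts decoded_subject
```

Because the decoding is done against the retained raw message, it can be repeated whenever the parser changes.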

- Sam Ruby

Re: ICLA — Gosha Arinich aka goshakkk

Posted by Sam Ruby <ru...@intertwingly.net>.
On Tue, Aug 30, 2016 at 1:11 PM, Craig Russell <cr...@oracle.com> wrote:
> This morning I had a similar issue, but with the email:cc: !binary encoding.
>
> Patching the subject line handling doesn’t fix the cc line.
>
> I’m concerned that *any* UTF8 encoding of email fields will cause the same issue.
>
> Can we find the place where the !binary encoding is chosen instead of “normal” UTF8?

It presumably would be here:

https://github.com/apache/whimsy/blob/5b3fff5fffeb98027a3e6201e38c2a0dc725d19b/www/secretary/workbench/file.cgi#L55

What bothers me most is that this generally worked in the past and I
don't know of anything that changed recently that could have affected
this.

I continue to maintain that the overall process (extracted from the
email with one program, results saved into a number of svn properties,
to later be extracted by parsing the results of the svn command line,
with the results inserted into an outbound email) is fragile.

Patching the secretary workbench until it works again (for a while?)
is possible, but I encourage you to find the exact email that you had
a problem with here: https://whimsy.apache.org/secmail/, click on both
raw and then headers and see the results.

With secmail, the raw email is retained on the server, and the process
is: parse the raw email, then insert the results into the outbound
email.  Not only are there fewer moving parts, but (a) if there is a
problem you can see the intermediate results on the server, (b) any
fixes that are made will effectively be applied retroactively, as the
new parsing will be done against the original (raw) message, and (c) I
have a script (parsemail.rb) that will download all emails that have
ever been archived for secretary@ and run the parser against them.

> Thanks,
>
> Craig

- Sam Ruby

>> On Aug 28, 2016, at 5:37 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>
>> On Sun, Aug 28, 2016 at 8:04 PM, Craig Russell <cr...@oracle.com> wrote:
>>> Can you please take a look and see why the rescue didn’t work?
>>
>> Logs can be found here:
>>
>> https://whimsy.apache.org/members/log/
>>
>> In particular, https://whimsy.apache.org/members/log/whimsy_error.log
>>
>> What I am still seeing is:
>>
>> _ERROR #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT
>> to UTF-8>, referer:
>> https://whimsy.apache.org/secretary/workbench/file.cgi
>>
>> And further up the stack traceback:
>>
>> _WARN   /usr/local/rvm/gems/ruby-2.3.1/gems/mail-2.6.4/lib/mail/message.rb:1887:in
>> `to_s', referer:
>> https://whimsy.apache.org/secretary/workbench/file.cgi
>> _WARN   /x1/srv/whimsy/www/secretary/workbench/file.cgi:318:in `block
>> in send_email', referer:
>> https://whimsy.apache.org/secretary/workbench/file.cgi
>>
>> So, you are not hitting the exception handler, and you are dying later
>> when trying to convert the message (which includes a binary subject)
>> into a string.
>>
>> The reason why you are not hitting the exception handler is that you
>> are not calling force_encoding.  A second problem is that if an
>> exception were to be raised, you wouldn't be catching it as the
>> exception needs to be qualified: Encoding::UndefinedConversionError
>>
>>> Thanks,
>>>
>>> Craig
>>
>> - Sam Ruby
>>
>>>> On Aug 28, 2016, at 4:30 PM, Sam Ruby <ru...@intertwingly.net> wrote:
>>>>
>>>> On Sun, Aug 28, 2016 at 6:15 PM, Craig Russell <cr...@oracle.com> wrote:
>>>>> I’m blind here. I can’t see the pending.yml. I can’t see the error console. I don’t even know if my change was pushed to production.
>>>>>
>>>>> What tools do I need to see what’s going on?
>>>>
>>>> What code is actually deployed can be seen on the last two lines of
>>>> the status page: https://whimsy.apache.org/status/
>>>>
>>>> Nothing in the (current) workbench shows the raw contents of
>>>> pending.yml.  It would be easy to add as a new CGI script.  It could
>>>> even be added as a new action in file.cgi.
>>>>
>>>> Alternatively, we could ask for you to be given shell access to
>>>> whimsy-vm3.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Craig
>>>>
>>>> - Sam Ruby
>>>>
>>>>>> On Aug 28, 2016, at 2:30 PM, Craig Russell <cr...@oracle.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Aug 28, 2016, at 6:04 AM, Sam Ruby <ru...@intertwingly.net> wrote:
>>>>>>>
>>>>>>> On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
>>>>>>> <cr...@oracle.com> wrote:
>>>>>>>> The processing of email:subject seems to be localized to file.cgi ca. line 261
>>>>>>>>
>>>>>>>>       # override subject?
>>>>>>>>       if vars.email_subject and !vars.email_subject.empty?
>>>>>>>>         if vars.email_subject =~ /^re:\s/i
>>>>>>>>           subject vars.email_subject
>>>>>>>>         else
>>>>>>>>           subject 'Re: ' + vars.email_subject
>>>>>>>>         end
>>>>>>>>       end
>>>>>>>>
>>>>>>>> I can’t see where the actual problem is, but is there a way to either:
>>>>>>>>
>>>>>>>> 1. have whichever component created vars.email_subject recognize UTF-8 characters and pass them as characters instead of binary
>>>>>>>>
>>>>>>>> 2. recognize that this has happened here and replace the subject with an innocuous subject based on the document type.
>>>>>>>
>>>>>>> All of your analysis seems to be on target.
>>>>>>>
>>>>>>> This is from the log:
>>>>>>>
>>>>>>> [Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
>>>>>>> 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
>>>>>>> #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
>>>>>>> UTF-8>, referer:
>>>>>>> https://whimsy.apache.org/secretary/workbench/file.cgi
>>>>>>>
>>>>>>> Looking at pending.yml with the interactive ruby shell:
>>>>>>>
>>>>>>> $ irb
>>>>>>> irb(main):001:0> require 'yaml'
>>>>>>> => true
>>>>>>> irb(main):002:0> pending = YAML.load_file('pending.yml')
>>>>>>> => [{"doctype"=>"icla",
>>>>>>> "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
>>>>>>> "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
>>>>>>> "email"=>"me@goshakkk.name", "filename"=>"heorhi-arynich.pdf",
>>>>>>> "nname"=>"Gosha Arinich", "nemail"=>"me@goshakkk.name",
>>>>>>> "iname"=>"Gosha Arinich", "iemail"=>"me@goshakkk.name",
>>>>>>> "uname"=>"Gosha Arinich", "uemail"=>"me@goshakkk.name",
>>>>>>> "pname"=>"Gosha Arinich", "pemail"=>"me@goshakkk.name",
>>>>>>> "memail"=>"me@goshakkk.name", "gname"=>"Gosha Arinich",
>>>>>>> "gemail"=>"me@goshakkk.name", "contact"=>"Gosha Arinich",
>>>>>>> "cemail"=>"me@goshakkk.name", "ipodling"=>" ",
>>>>>>> "email:addr"=>"me@goshakkk.name",
>>>>>>> "email:id"=>"<CA...@mail.gmail.com>",
>>>>>>> "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
>>>>>>> Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
>>>>>>> irb(main):003:0> pending.first['email:subject']
>>>>>>> => "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
>>>>>>> irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
>>>>>>> => "ICLA — Gosha Arinich aka goshakkk"
>>>>>>>
>>>>>>> Not surprising given the torturous path that the subject goes through
>>>>>>> in the current workbench implementation.  A cron job extracts the
>>>>>>> subject line from the email using python libraries and puts it into a
>>>>>>> svn property associated with the file.  The workbench then uses the
>>>>>>> command line to extract that property and parses the output from the
>>>>>>> command.  What is surprising is that, if there is an error in
>>>>>>> handling non-ASCII characters, it hasn't shown up before and more
>>>>>>> frequently.  I'm pretty sure that non-ASCII characters have been seen
>>>>>>> before, and I'm not sure what is different about this email.
>>>>>>
>>>>>> I’ve seen plenty of non-ASCII characters but this is the first I’ve seen one in the three-byte UTF-8 representation.
>>>>>>>
>>>>>>> In any case, suggested fixes:
>>>>>>>
>>>>>>> 1) add "vars.email_subject.force_encoding('utf-8') if
>>>>>>> vars.email_subject.encoding == Encoding::BINARY" before the inner if
>>>>>>> statement.  It should be harmless in cases that currently work, and
>>>>>>> should fix this case.  In cases where the data is binary data that
>>>>>>> can't be interpreted as utf-8, it will continue to blow up.
>>>>>>>
>>>>>>> 2) add 'begin...rescue...end' around the inner if statement.  Note:
>>>>>>> you don't need to set subject in the rescue clause as it was set by
>>>>>>> the relevant erb file (e.g. icla.erb).  More information on rescue
>>>>>>> statements: http://phrogz.net/programmingruby/tut_exceptions.html
>>>>>>>
>>>>>>> These changes should enable you to process the currently pending action.
>>>>>>
>>>>>> Now waiting for deployment…
>>>>>>
>>>>>> Craig
>>>>>>
>>>>>>>
>>>>>>>> Craig
>>>>>>>
>>>>>>> - Sam Ruby
>>>>>>>
>>>>>>>>> On Aug 27, 2016, at 12:11 PM, Craig Russell <cr...@oracle.com> wrote:
>>>>>>>>>
>>>>>>>>> Here’s what happens to the em-dash in whimsy pending.yml:
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> - doctype: icla
>>>>>>>>>   source: craig-russell-copy.pdf
>>>>>>>>>   realname: Craig Russell Emdash
>>>>>>>>>   pubname: Craig Russell Emdash
>>>>>>>>>   email: craig.russell@oracle.com
>>>>>>>>>   filename: craig-russell-emdash.pdf
>>>>>>>>>   nname: Craig Russell
>>>>>>>>>   nemail: craig.russell@oracle.com
>>>>>>>>>   iname: Craig Russell
>>>>>>>>>   iemail: craig.russell@oracle.com
>>>>>>>>>   uname: Craig Russell
>>>>>>>>>   uemail: craig.russell@oracle.com
>>>>>>>>>   pname: Craig Russell
>>>>>>>>>   pemail: craig.russell@oracle.com
>>>>>>>>>   memail: craig.russell@oracle.com
>>>>>>>>>   gname: Craig Russell
>>>>>>>>>   gemail: craig.russell@oracle.com
>>>>>>>>>   contact: Craig Russell
>>>>>>>>>   cemail: craig.russell@oracle.com
>>>>>>>>>   ipodling: " "
>>>>>>>>>   email:addr: craig.russell@oracle.com
>>>>>>>>>   email:id: "<02...@oracle.com>"
>>>>>>>>>   email:name: Craig Russell
>>>>>>>>>   email:subject: !binary |-
>>>>>>>>>     RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>>>>>>>>   svn:mime-type: application/pdf
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 27, 2016, at 11:41 AM, Craig Russell <cr...@oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> This email causes (still pending email) an error sending mail.
>>>>>>>>>>
>>>>>>>>>> I suspect it is because of the em-dash in the subject.
>>>>>>>>>>
>>>>>>>>>> I don’t know how to look at or edit the pending.yml on the server.
>>>>>>>>>>
>>>>>>>>>> From: Gosha Arinich <me@goshakkk.name>
>>>>>>>>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>>>>>>>>> Message-ID: <CA+TtpJt-+D5_O4uQKsV+1dbS_FaFWfY4ZRMtjSpxeY48ae3eiQ@mail.gmail.com>
>>>>>>>>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>>>>>>>>> To: secretary@apache.org
>>>>>>>>>>
>>>>>>>>>> So, two issues: the pending mail needs to be sent; the bug needs to be fixed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Craig
>>>>>>>>>>
>>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>>
>>>>>>>>>>> From: Gosha Arinich <me@goshakkk.name>
>>>>>>>>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>>>>>>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>>>>>>>>> To: secretary@apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Gosha
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Craig L Russell
>>>>>>>>>> Secretary, Apache Software Foundation
>>>>>>>>>> clr@apache.org http://db.apache.org/jdo
>>>>>>>>>
>>>>>>>>> Craig L Russell
>>>>>>>>> Architect
>>>>>>>>> craig.russell@oracle.com
>>>>>>>>> P.S. A good JDO? O, Gasp!

Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
This morning I had a similar issue, but with the email:cc: !binary encoding. 

Patching the subject line handling doesn’t fix the cc line.

I’m concerned that *any* UTF8 encoding of email fields will cause the same issue. 

Can we find the place where the !binary encoding is chosen instead of “normal” UTF8?
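
As an illustration of when the !binary tag appears (a sketch using only Ruby's standard yaml library, not the workbench code itself): the YAML emitter chooses !binary when a string holding non-ASCII bytes is tagged as ASCII-8BIT, and writes readable text when the same bytes are tagged as UTF-8. The variable names below are made up for the example; the base64 payload is the one from the pending.yml quoted in this thread.

```ruby
require 'yaml'

# Same bytes, two different encoding tags.
utf8_subject   = "EM dash causes trouble \u2014 "
binary_subject = utf8_subject.b # copy tagged ASCII-8BIT (binary)

utf8_yaml   = YAML.dump('email:subject' => utf8_subject)
binary_yaml = YAML.dump('email:subject' => binary_subject)

puts utf8_yaml   # readable text
puts binary_yaml # !binary with a base64 payload

# The payload seen in pending.yml decodes to the same bytes.
payload_bytes = 'RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg'.unpack('m').first

# Loading the !binary value gives back a binary string, which is what
# later fails to convert; relabeling it as UTF-8 repairs it.
reloaded = YAML.load(binary_yaml)['email:subject']
repaired = reloaded.dup.force_encoding(Encoding::UTF_8)
puts reloaded.encoding
puts repaired
```

If this is what is happening, the fix belongs wherever the property value is read: tagging the string as UTF-8 before it is written to pending.yml would cover subject, cc, and any other field at once.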

Thanks,

Craig


Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!






Re: ICLA — Gosha Arinich aka goshakkk

Posted by Sam Ruby <ru...@intertwingly.net>.
On Sun, Aug 28, 2016 at 8:04 PM, Craig Russell <cr...@oracle.com> wrote:
> Can you please take a look and see why the rescue didn’t work?

Logs can be found here:

https://whimsy.apache.org/members/log/

In particular, https://whimsy.apache.org/members/log/whimsy_error.log

What I am still seeing is:

_ERROR #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT
to UTF-8>, referer:
https://whimsy.apache.org/secretary/workbench/file.cgi

And further up the stack traceback:

_WARN   /usr/local/rvm/gems/ruby-2.3.1/gems/mail-2.6.4/lib/mail/message.rb:1887:in
`to_s', referer:
https://whimsy.apache.org/secretary/workbench/file.cgi
_WARN   /x1/srv/whimsy/www/secretary/workbench/file.cgi:318:in `block
in send_email', referer:
https://whimsy.apache.org/secretary/workbench/file.cgi

So, you are not hitting the exception handler, and you are dying later
when trying to convert the message (which includes a binary subject)
into a string.

The reason why you are not hitting the exception handler is that you
are not calling force_encoding.  A second problem is that if an
exception were to be raised, you wouldn't be catching it as the
exception needs to be qualified: Encoding::UndefinedConversionError
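
Putting the two points together, a minimal sketch of the subject handling might look like the following. The method name reply_subject and its fallback argument are hypothetical, and this is not the actual file.cgi code, where the fallback subject is already set by the erb template:

```ruby
# Hypothetical helper: returns the subject to use for the reply, or the
# fallback when the stored subject is missing or is not valid UTF-8.
def reply_subject(email_subject, fallback)
  return fallback if email_subject.nil? || email_subject.empty?

  subject = email_subject.dup
  # Relabel binary data as UTF-8; the bytes themselves are unchanged.
  subject.force_encoding(Encoding::UTF_8) if subject.encoding == Encoding::BINARY
  return fallback unless subject.valid_encoding?

  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
end

binary_subject = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

# Without the relabel, converting the binary string fails just as in the
# log. Note the fully qualified exception class in the rescue clause.
conversion_error = nil
begin
  binary_subject.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  conversion_error = e
end

fixed_subject = reply_subject(binary_subject, 'ICLA')
puts fixed_subject
```

Checking valid_encoding? up front keeps bytes that are not UTF-8 from reaching the mail library at all, so the fallback subject is used instead of failing later in to_s.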

> Thanks,
>
> Craig

- Sam Ruby


Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
Can you please take a look and see why the rescue didn’t work?

Thanks,

Craig

> On Aug 28, 2016, at 4:30 PM, Sam Ruby <ru...@intertwingly.net> wrote:
> 
> On Sun, Aug 28, 2016 at 6:15 PM, Craig Russell <cr...@oracle.com> wrote:
>> I’m blind here. I can’t see the pending.yml. I can’t see the error console. I don’t even know if my change was pushed to production.
>> 
>> What tools do I need to see what’s going on?
> 
> What code is actually deployed can be seen on the last two lines of
> the status page: https://whimsy.apache.org/status/
> 
> Nothing in the (current) workbench shows the raw contents of
> pending.yml.  It would be easy to add as a new CGI script.  It could
> even be added as a new action in file.cgi.
> 
> Alternatively, we could ask for you to be given shell access to
> whimsy-vm3.
> 
>> Thanks,
>> 
>> Craig
> 
> - Sam Ruby
> 

Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!






Re: ICLA — Gosha Arinich aka goshakkk

Posted by Sam Ruby <ru...@intertwingly.net>.
On Sun, Aug 28, 2016 at 6:15 PM, Craig Russell <cr...@oracle.com> wrote:
> I’m blind here. I can’t see the pending.yml. I can’t see the error console. I don’t even know if my change was pushed to production.
>
> What tools do I need to see what’s going on?

What code is actually deployed can be seen on the last two lines of
the status page: https://whimsy.apache.org/status/

Nothing in the (current) workbench shows the raw contents of
pending.yml.  It would be easy to add as a new CGI script.  It could
even be added as a new action in file.cgi.

Alternatively, we could ask for you to be given shell access to
whimsy-vm3.
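
A minimal sketch of what such a CGI script could look like. It is not the workbench's actual code: the path to pending.yml and the PENDING_YML environment variable are assumptions made for illustration, and access control is assumed to be handled by the web server configuration.

```ruby
#!/usr/bin/env ruby
# Hypothetical location; the thread does not say where pending.yml lives.
PENDING_PATH = ENV.fetch('PENDING_YML', 'pending.yml')

# Build a complete CGI response (headers, blank line, body) for the file.
def raw_pending_response(path)
  unless File.file?(path)
    return "Status: 404 Not Found\r\nContent-Type: text/plain\r\n\r\n#{path} not found\n"
  end

  body = File.binread(path) # read the bytes as-is, with no transcoding
  "Content-Type: text/plain; charset=utf-8\r\n\r\n".b + body
end

$stdout.binmode
$stdout.write raw_pending_response(PENDING_PATH)
```

Reading with File.binread and writing in binmode means a `!binary` entry is shown exactly as stored, rather than being reinterpreted on the way out.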

> Thanks,
>
> Craig

- Sam Ruby


Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
I’m blind here. I can’t see the pending.yml. I can’t see the error console. I don’t even know if my change was pushed to production.

What tools do I need to see what’s going on?

Thanks,

Craig

> On Aug 28, 2016, at 2:30 PM, Craig Russell <cr...@oracle.com> wrote:
> 
> Now waiting for deployment…
> 
> Craig

Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!






Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
> On Aug 28, 2016, at 6:04 AM, Sam Ruby <ru...@intertwingly.net> wrote:
> 
> On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
> <cr...@oracle.com> wrote:
>> The processing of email:subject seems to be localized to file.cgi, around line 261:
>> 
>>          # override subject?
>>          if vars.email_subject and !vars.email_subject.empty?
>>            if vars.email_subject =~ /^re:\s/i
>>              subject vars.email_subject
>>            else
>>              subject 'Re: ' + vars.email_subject
>>            end
>>          end
>> 
>> I can’t see where the actual problem is, but is there a way to either:
>> 
>> 1. have whichever component created vars.email_subject recognize UTF-8 characters and pass them as characters instead of binary
>> 
>> 2. recognize that this has happened here and replace the subject with an innocuous subject based on the document type.
> 
> All of your analysis seems to be on target.
> 
> This is from the log:
> 
> [Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
> 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
> #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
> UTF-8>, referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
> 
> Looking at pending.yml with the interactive ruby shell:
> 
> $ irb
> irb(main):001:0> require 'yaml'
> => true
> irb(main):002:0> pending = YAML.load_file('pending.yml')
> => [{"doctype"=>"icla",
> "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
> "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
> "email"=>"me@goshakkk.name", "filename"=>"heorhi-arynich.pdf",
> "nname"=>"Gosha Arinich", "nemail"=>"me@goshakkk.name",
> "iname"=>"Gosha Arinich", "iemail"=>"me@goshakkk.name",
> "uname"=>"Gosha Arinich", "uemail"=>"me@goshakkk.name",
> "pname"=>"Gosha Arinich", "pemail"=>"me@goshakkk.name",
> "memail"=>"me@goshakkk.name", "gname"=>"Gosha Arinich",
> "gemail"=>"me@goshakkk.name", "contact"=>"Gosha Arinich",
> "cemail"=>"me@goshakkk.name", "ipodling"=>" ",
> "email:addr"=>"me@goshakkk.name",
> "email:id"=>"<CA...@mail.gmail.com>",
> "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
> Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
> irb(main):003:0> pending.first['email:subject']
> => "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
> irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
> => "ICLA — Gosha Arinich aka goshakkk"
> 
> Not surprising given the tortuous path that the subject goes through
> in the current workbench implementation.  A cron job extracts the
> subject line from the email using Python libraries and puts it into an
> svn property associated with the file.  The workbench then uses the
> command line to extract that property and parses the output from the
> command.  What is surprising is that an error in handling non-ASCII
> characters hasn't shown up before, and more frequently.  I'm pretty
> sure that non-ASCII characters have been seen before, and I'm not sure
> what is different about this email.

I’ve seen plenty of non-ASCII characters, but this is the first one I’ve seen in the three-byte UTF-8 representation.
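
For illustration, a small sketch of the byte counts involved. The em-dash (U+2014) takes three bytes in UTF-8, while an accented Latin letter such as U+00E9 takes two. Whether the byte count is really what distinguishes this email is not established in the thread; the comparison character is just an example.

```ruby
em_dash = "\u2014" # the character in the failing subject line
e_acute = "\u00E9" # an arbitrary two-byte character, for comparison

# Show each character's UTF-8 bytes in hex.
[em_dash, e_acute].each do |char|
  hex = char.bytes.map { |b| format('%02X', b) }.join(' ')
  puts "U+#{format('%04X', char.ord)}: #{char.bytesize} bytes (#{hex})"
end
```

The three bytes E2 80 94 are the same ones that show up as \xE2\x80\x94 in the irb output and as =E2=80=94 in the encoded Subject header.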
> 
> In any case, suggested fixes:
> 
> 1) add "vars.email_subject.force_encoding('utf-8') if
> vars.email_subject.encoding == Encoding::BINARY" before the inner if
> statement.  It should be harmless in cases that currently work, and
> should fix this case.  In cases where the data is binary data that
> can't be interpreted as utf-8, it will continue to blow up.
> 
> 2) add 'begin...rescue...end' around the inner if statement.  Note:
> you don't need to set subject in the rescue clause as it was set by
> the relevant erb file (e.g. icla.erb).  More information on rescue
> statements: http://phrogz.net/programmingruby/tut_exceptions.html
> 
> These changes should enable you to process the currently pending action.
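
The two suggestions above can be sketched together as follows. This is a self-contained illustration, not the file.cgi code: reply_subject is a hypothetical helper, `raw` stands in for vars.email_subject, and `default` stands in for the subject already set by the erb template. The explicit valid_encoding? check is an addition, not something proposed in the thread.

```ruby
# Hypothetical stand-in for the subject handling in file.cgi.
# `raw` plays the role of vars.email_subject; `default` is the subject
# that the erb template (e.g. icla.erb) has already set.
def reply_subject(raw, default)
  return default if raw.nil? || raw.empty?

  text = raw.dup
  # Suggestion 1: relabel binary-tagged bytes as UTF-8 (the bytes are unchanged).
  text.force_encoding(Encoding::UTF_8) if text.encoding == Encoding::BINARY
  # Bytes that are not valid UTF-8 cannot be used safely; keep the default.
  return default unless text.valid_encoding?

  text =~ /^re:\s/i ? text : 'Re: ' + text
rescue StandardError
  # Suggestion 2: on any failure, leave the template's subject in place.
  default
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b, 'ICLA')
```

Note that a rescue placed here only covers errors raised while the subject is being built; an exception raised later, for example while the message is serialized or sent, would not be caught by it.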

Now waiting for deployment…

Craig

> 
>> Craig
> 
> - Sam Ruby
> 
>>> On Aug 27, 2016, at 12:11 PM, Craig Russell <cr...@oracle.com> wrote:
>>> 
>>> Here’s what happens to the em-dash in whimsy pending.yml:
>>> 
>>> ---
>>> - doctype: icla
>>> source: craig-russell-copy.pdf
>>> realname: Craig Russell Emdash
>>> pubname: Craig Russell Emdash
>>> email: craig.russell@oracle.com
>>> filename: craig-russell-emdash.pdf
>>> nname: Craig Russell
>>> nemail: craig.russell@oracle.com
>>> iname: Craig Russell
>>> iemail: craig.russell@oracle.com
>>> uname: Craig Russell
>>> uemail: craig.russell@oracle.com
>>> pname: Craig Russell
>>> pemail: craig.russell@oracle.com
>>> memail: craig.russell@oracle.com
>>> gname: Craig Russell
>>> gemail: craig.russell@oracle.com
>>> contact: Craig Russell
>>> cemail: craig.russell@oracle.com
>>> ipodling: " "
>>> email:addr: craig.russell@oracle.com
>>> email:id: "<02...@oracle.com>"
>>> email:name: Craig Russell
>>> email:subject: !binary |-
>>>   RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>> svn:mime-type: application/pdf
>>> 

Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!






Re: ICLA — Gosha Arinich aka goshakkk

Posted by Sam Ruby <ru...@intertwingly.net>.
On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
<cr...@oracle.com> wrote:
> The processing of email:subject seems to be localized to file.cgi, around line 261:
>
>           # override subject?
>           if vars.email_subject and !vars.email_subject.empty?
>             if vars.email_subject =~ /^re:\s/i
>               subject vars.email_subject
>             else
>               subject 'Re: ' + vars.email_subject
>             end
>           end
>
> I can’t see where the actual problem is, but is there a way to either:
>
> 1. have whichever component created vars.email_subject recognize UTF-8 characters and pass them as characters instead of binary
>
> 2. recognize that this has happened here and replace the subject with an innocuous subject based on the document type.

All of your analysis seems to be on target.

This is from the log:

[Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
#<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
UTF-8>, referer:
https://whimsy.apache.org/secretary/workbench/file.cgi

Looking at pending.yml with the interactive ruby shell:

$ irb
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> pending = YAML.load_file('pending.yml')
=> [{"doctype"=>"icla",
"source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
"realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
"email"=>"me@goshakkk.name", "filename"=>"heorhi-arynich.pdf",
"nname"=>"Gosha Arinich", "nemail"=>"me@goshakkk.name",
"iname"=>"Gosha Arinich", "iemail"=>"me@goshakkk.name",
"uname"=>"Gosha Arinich", "uemail"=>"me@goshakkk.name",
"pname"=>"Gosha Arinich", "pemail"=>"me@goshakkk.name",
"memail"=>"me@goshakkk.name", "gname"=>"Gosha Arinich",
"gemail"=>"me@goshakkk.name", "contact"=>"Gosha Arinich",
"cemail"=>"me@goshakkk.name", "ipodling"=>" ",
"email:addr"=>"me@goshakkk.name",
"email:id"=>"<CA...@mail.gmail.com>",
"email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
irb(main):003:0> pending.first['email:subject']
=> "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
=> "ICLA — Gosha Arinich aka goshakkk"
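
The log entry and the irb session above can be reproduced in isolation.
The following is a minimal sketch, not code from the workbench.  It
assumes the subject arrives tagged as binary (ASCII-8BIT), as it does
after loading pending.yml; the variable names are made up for
illustration.

```ruby
# Subject bytes as loaded from pending.yml: "ICLA", an em-dash encoded as
# three UTF-8 bytes, then the name.  String#b returns a copy tagged ASCII-8BIT.
raw = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

# Transcoding from binary to UTF-8 fails on the first non-ASCII byte,
# which is the kind of error reported in the log.
conversion_error = nil
begin
  raw.encode('utf-8')
rescue Encoding::UndefinedConversionError => e
  conversion_error = e.message
end

# force_encoding only relabels the existing bytes; it converts nothing.
fixed = raw.dup.force_encoding('utf-8')

puts conversion_error
puts fixed
puts fixed.valid_encoding?
```

The bytes were valid UTF-8 all along; only the encoding label on the
string was wrong, which is why relabeling is enough here.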

Not surprising given the tortuous path that the subject goes through
in the current workbench implementation.  A cron job extracts the
subject line from the email using Python libraries and puts it into an
svn property associated with the file.  The workbench then uses the
command line to extract that property and parses the output from the
command.  What is surprising is that, if non-ASCII characters are
being mishandled, the error hasn't shown up before, and more
frequently.  I'm pretty sure that non-ASCII characters have been seen
before, and I'm not sure what is different about this email.

In any case, suggested fixes:

1) add the following line before the inner if statement:

    vars.email_subject.force_encoding('utf-8') if vars.email_subject.encoding == Encoding::BINARY

It should be harmless in cases that currently work, and should fix
this case.  In cases where the data is binary data that can't be
interpreted as UTF-8, it will continue to blow up.

2) add 'begin...rescue...end' around the inner if statement.  Note:
you don't need to set subject in the rescue clause as it was set by
the relevant erb file (e.g. icla.erb).  More information on rescue
statements: http://phrogz.net/programmingruby/tut_exceptions.html

These changes should enable you to process the currently pending action.
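
The two suggestions can be sketched together outside of file.cgi.  This
is an illustration under assumptions, not the actual patch: the helper
name reply_subject and its fallback argument are hypothetical, and a
valid_encoding? check stands in for the begin...rescue...end of
suggestion 2.  In file.cgi the result would be passed to subject, and
the fallback would be whatever the erb template already set.

```ruby
# Hypothetical helper: builds the reply subject from an email subject that
# may be tagged as binary, falling back to a template-provided subject.
def reply_subject(email_subject, fallback)
  return fallback if email_subject.nil? || email_subject.empty?

  subject = email_subject.dup

  # Suggestion 1: relabel binary bytes as UTF-8 (no bytes are changed).
  subject.force_encoding('utf-8') if subject.encoding == Encoding::BINARY

  # Stand-in for suggestion 2: if the bytes are not valid UTF-8, keep the
  # fallback subject instead of blowing up later.
  return fallback unless subject.valid_encoding?

  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b, 'Your ICLA')
puts reply_subject('Re: ICLA', 'Your ICLA')
puts reply_subject("\xFF\xFE".b, 'Your ICLA')
```

Checking validity before the regular expression match matters, because
matching against a string with an invalid byte sequence raises an error
of its own.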

> Craig

- Sam Ruby

>> On Aug 27, 2016, at 12:11 PM, Craig Russell <cr...@oracle.com> wrote:
>>
>> Here’s what happens to the em-dash in whimsy pending.yml:
>>
>> ---
>> - doctype: icla
>>  source: craig-russell-copy.pdf
>>  realname: Craig Russell Emdash
>>  pubname: Craig Russell Emdash
>>  email: craig.russell@oracle.com
>>  filename: craig-russell-emdash.pdf
>>  nname: Craig Russell
>>  nemail: craig.russell@oracle.com
>>  iname: Craig Russell
>>  iemail: craig.russell@oracle.com
>>  uname: Craig Russell
>>  uemail: craig.russell@oracle.com
>>  pname: Craig Russell
>>  pemail: craig.russell@oracle.com
>>  memail: craig.russell@oracle.com
>>  gname: Craig Russell
>>  gemail: craig.russell@oracle.com
>>  contact: Craig Russell
>>  cemail: craig.russell@oracle.com
>>  ipodling: " "
>>  email:addr: craig.russell@oracle.com
>>  email:id: "<02...@oracle.com>"
>>  email:name: Craig Russell
>>  email:subject: !binary |-
>>    RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>  svn:mime-type: application/pdf
>>
>>
>>> On Aug 27, 2016, at 11:41 AM, Craig Russell <cr...@oracle.com> wrote:
>>>
>>> This email causes (still pending email) an error sending mail.
>>>
>>> I suspect it is because of the em-dash in the subject.
>>>
>>> I don’t know how to look at or edit the pending.yml on the server.
>>>
>>> From: Gosha Arinich <me@goshakkk.name>
>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>> Message-ID: <CA+TtpJt-+D5_O4uQKsV+1dbS_FaFWfY4ZRMtjSpxeY48ae3eiQ@mail.gmail.com>
>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>> To: secretary@apache.org
>>>
>>> So, two issues: the pending mail needs to be sent; the bug needs to be fixed.
>>>
>>> Thanks,
>>>
>>> Craig
>>>
>>>> Begin forwarded message:
>>>>
>>>> From: Gosha Arinich <me@goshakkk.name>
>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>> To: secretary@apache.org
>>>>
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>> Gosha
>>>>
>>>
>>> Craig L Russell
>>> Secretary, Apache Software Foundation
>>> clr@apache.org http://db.apache.org/jdo
>>
>> Craig L Russell
>> Architect
>> craig.russell@oracle.com
>> P.S. A good JDO? O, Gasp!
>>
>>
>>
>>
>>
>
> Craig L Russell
> Architect
> craig.russell@oracle.com
> P.S. A good JDO? O, Gasp!
>
>
>
>
>

Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
The processing of email:subject seems to be localized to file.cgi, ca. line 261:

          # override subject?
          if vars.email_subject and !vars.email_subject.empty?
            if vars.email_subject =~ /^re:\s/i
              subject vars.email_subject
            else
              subject 'Re: ' + vars.email_subject
            end
          end

I can’t see where the actual problem is, but is there a way to either:

1. have whichever component created vars.email_subject recognize UTF-8 characters and pass them as characters instead of binary

2. recognize that this has happened here and replace the subject with an innocuous subject based on the document type.

Craig

> On Aug 27, 2016, at 12:11 PM, Craig Russell <cr...@oracle.com> wrote:
> 
> Here’s what happens to the em-dash in whimsy pending.yml:
> 
> ---
> - doctype: icla
>  source: craig-russell-copy.pdf
>  realname: Craig Russell Emdash
>  pubname: Craig Russell Emdash
>  email: craig.russell@oracle.com
>  filename: craig-russell-emdash.pdf
>  nname: Craig Russell
>  nemail: craig.russell@oracle.com
>  iname: Craig Russell
>  iemail: craig.russell@oracle.com
>  uname: Craig Russell
>  uemail: craig.russell@oracle.com
>  pname: Craig Russell
>  pemail: craig.russell@oracle.com
>  memail: craig.russell@oracle.com
>  gname: Craig Russell
>  gemail: craig.russell@oracle.com
>  contact: Craig Russell
>  cemail: craig.russell@oracle.com
>  ipodling: " "
>  email:addr: craig.russell@oracle.com
>  email:id: "<02...@oracle.com>"
>  email:name: Craig Russell
>  email:subject: !binary |-
>    RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>  svn:mime-type: application/pdf
> 
> 
>> On Aug 27, 2016, at 11:41 AM, Craig Russell <cr...@oracle.com> wrote:
>> 
>> This email causes (still pending email) an error sending mail.
>> 
>> I suspect it is because of the em-dash in the subject.
>> 
>> I don’t know how to look at or edit the pending.yml on the server.
>> 
>> From: Gosha Arinich <me@goshakkk.name>
>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>> Message-ID: <CA+TtpJt-+D5_O4uQKsV+1dbS_FaFWfY4ZRMtjSpxeY48ae3eiQ@mail.gmail.com>
>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>> To: secretary@apache.org
>> 
>> So, two issues: the pending mail needs to be sent; the bug needs to be fixed.
>> 
>> Thanks,
>> 
>> Craig
>> 
>>> Begin forwarded message:
>>> 
>>> From: Gosha Arinich <me@goshakkk.name>
>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>> To: secretary@apache.org
>>> 
>>> 
>>> 
>>> -- 
>>> Cheers,
>>> Gosha
>>> 
>> 
>> Craig L Russell
>> Secretary, Apache Software Foundation
>> clr@apache.org http://db.apache.org/jdo
> 
> Craig L Russell
> Architect
> craig.russell@oracle.com
> P.S. A good JDO? O, Gasp!
> 
> 
> 
> 
> 

Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!






Re: ICLA — Gosha Arinich aka goshakkk

Posted by Craig Russell <cr...@oracle.com>.
Here’s what happens to the em-dash in whimsy pending.yml:

---
- doctype: icla
  source: craig-russell-copy.pdf
  realname: Craig Russell Emdash
  pubname: Craig Russell Emdash
  email: craig.russell@oracle.com
  filename: craig-russell-emdash.pdf
  nname: Craig Russell
  nemail: craig.russell@oracle.com
  iname: Craig Russell
  iemail: craig.russell@oracle.com
  uname: Craig Russell
  uemail: craig.russell@oracle.com
  pname: Craig Russell
  pemail: craig.russell@oracle.com
  memail: craig.russell@oracle.com
  gname: Craig Russell
  gemail: craig.russell@oracle.com
  contact: Craig Russell
  cemail: craig.russell@oracle.com
  ipodling: " "
  email:addr: craig.russell@oracle.com
  email:id: "<02...@oracle.com>"
  email:name: Craig Russell
  email:subject: !binary |-
    RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
  svn:mime-type: application/pdf
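
For reference, the !binary value above is plain base64.  A minimal Ruby
sketch (assuming Ruby's standard yaml library; the variable names are
made up for illustration) decodes it and shows that the same text is
dumped as !binary only while the string is tagged as binary:

```ruby
require 'yaml'

# The base64 payload from the email:subject entry above.
encoded = 'RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg'

# unpack1('m') decodes base64 and returns a string tagged ASCII-8BIT.
bytes = encoded.unpack1('m')

# Relabel the same bytes as UTF-8 to see the original subject text.
text = bytes.dup.force_encoding('utf-8')

# A binary string with non-ASCII bytes is dumped with the !binary tag;
# the same bytes labeled UTF-8 are dumped as ordinary text.
as_binary = YAML.dump('email:subject' => bytes)
as_utf8 = YAML.dump('email:subject' => text)

puts text
puts as_binary
puts as_utf8
```

So the choice of !binary appears to follow from the encoding label on
the string at the time it is written, not from the characters
themselves.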


> On Aug 27, 2016, at 11:41 AM, Craig Russell <cr...@oracle.com> wrote:
> 
> This email causes (still pending email) an error sending mail.
> 
> I suspect it is because of the em-dash in the subject.
> 
> I don’t know how to look at or edit the pending.yml on the server.
> 
> From: Gosha Arinich <me@goshakkk.name>
> Date: Sat, 27 Aug 2016 03:03:00 +0300
> Message-ID: <CA+TtpJt-+D5_O4uQKsV+1dbS_FaFWfY4ZRMtjSpxeY48ae3eiQ@mail.gmail.com>
> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
> To: secretary@apache.org
> 
> So, two issues: the pending mail needs to be sent; the bug needs to be fixed.
> 
> Thanks,
> 
> Craig
> 
>> Begin forwarded message:
>> 
>> From: Gosha Arinich <me@goshakkk.name>
>> Subject: ICLA — Gosha Arinich aka goshakkk
>> Date: August 26, 2016 at 5:03:00 PM PDT
>> To: secretary@apache.org
>> 
>> 
>> 
>> -- 
>> Cheers,
>> Gosha
>> 
> 
> Craig L Russell
> Secretary, Apache Software Foundation
> clr@apache.org http://db.apache.org/jdo

Craig L Russell
Architect
craig.russell@oracle.com
P.S. A good JDO? O, Gasp!