Posted to modperl@perl.apache.org by André Warnier <aw...@ice-sa.com> on 2009/06/28 17:41:44 UTC

quick pure perl question

Hi.
Out of curiosity, and just in case anyone knows off-hand:

perl 5.8.8

In a script, I essentially do this:

open(FIRST, '<:utf8', $name1) or die "Cannot open $name1: $!";
open(SECOND, '>:raw', $name2) or die "Cannot open $name2: $!";
while (defined(my $line = <FIRST>)) {
  print SECOND $line;
}

and I get warnings: "wide character in print to <SECOND>,.."

I mean, I know that my data is UTF-8, and I know that some characters
are going to be "wide", and that's how I want them.
I also know that I could specify the output I/O layer as 'utf8' (which
avoids the warning).
But why do I get warnings when I specified 'raw' as the I/O layer?
Doesn't 'raw' mean 'as is'?




Re: quick pure perl question

Posted by Bill Moseley <mo...@hank.org>.
On Tue, Jun 30, 2009 at 6:13 AM, André Warnier <aw...@ice-sa.com> wrote:

> Basically, by using the '>:raw' encoding for the output stream, I was not
> expecting perl to warn me that I was (knowingly) outputting "wide
> characters" there, so I was surprised at the warning.
>
> I /would/ have expected it if I was /not/ specifying an encoding, like
> using simply '>'.  But not when I am explicitly specifying '>:raw', which in
> my mind, and according to my interpretation of the on-line documentation, is
> equivalent to saying "output whatever you have as bytes in that string
> variable right now, as is, I know what I'm doing".


I think it's because it's not bytes.  Well, technically it's bytes of
course, but conceptually once you decode bytes you no longer have bytes.
You have the abstract idea of characters.  And the only way to output that
information into a file (which holds bytes) is by first converting it to
bytes, and that requires encoding.
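To make this concrete, here is a minimal sketch that encodes the decoded string back to bytes before it reaches a :raw handle. It assumes the core Encode module and uses an in-memory filehandle as a hypothetical stand-in for SECOND; U+263A is just an arbitrary character above U+00FF.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A decoded character string; U+263A is above U+00FF, i.e. a "wide" character.
my $line = "smile \x{263A}\n";

# Hypothetical stand-in for SECOND: an in-memory file opened with :raw.
my $buf = '';
open(my $out, '>:raw', \$buf) or die "Cannot open in-memory file: $!";

# Encode characters to UTF-8 bytes first, so the :raw handle only sees octets.
print {$out} encode('UTF-8', $line);
close($out) or die "Cannot close in-memory file: $!";

# "smile " (6 bytes) + U+263A (3 bytes) + "\n" (1 byte)
printf "%d bytes written\n", length($buf);   # prints "10 bytes written"
```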

It's just like a thought you have in your brain.  I'm not aware of any way
(yet) to output that in raw format -- must be encoded into typed, spoken, or
signed language first.  Even if most of what I write would be considered
pretty raw.

Isn't :raw mostly a way to use layers to say don't do CRLF conversion --
like the old use of binmode()?  Oh, maybe not according to the docs.
It's best to decode and encode all character data at program boundaries and
stay away from Windows.
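On the :raw question, the PerlIO documentation says :raw is no longer just the inverse of :crlf: pushing it pops every layer that does not declare itself binary-safe, which includes an :encoding layer, and clears the utf8 flag. A minimal sketch that shows this with PerlIO::get_layers, assuming File::Temp for a hypothetical scratch file; the base layers printed (for example unix and perlio) differ between platforms.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for $name2.
my ($fh, $filename) = tempfile(UNLINK => 1);
close($fh);

open(my $out, '>:encoding(UTF-8)', $filename) or die "Cannot open $filename: $!";
my @before = PerlIO::get_layers($out);

# Documented as identical to binmode($out): strip layers that would alter the bytes.
binmode($out, ':raw') or die "binmode failed: $!";
my @after = PerlIO::get_layers($out);

print "before: @before\n";   # includes an encoding(...) layer and the utf8 flag
print "after:  @after\n";    # the encoding layer and the utf8 flag are gone
close($out);
```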


-- 
Bill Moseley
moseley@hank.org

Re: quick pure perl question

Posted by Andy Armstrong <an...@hexten.net>.
On 30 Jun 2009, at 14:13, André Warnier wrote:
> I /would/ have expected it if I was /not/ specifying an encoding,  
> like using simply '>'.  But not when I am explicitly specifying  
> '>:raw', which in my mind, and according to my interpretation of the  
> on-line documentation, is equivalent to saying "output whatever you  
> have as bytes in that string variable right now, as is, I know what  
> I'm doing".


You have that bit right - but the string doesn't contain bytes[1] - it  
contains characters. Strings can either be an octet stream or a stream  
of wide characters. By reading utf8 into a string you've turned it  
into the latter. Perl is warning that you're pushing character data
into an octet hole.

[1] of course it's /made/ of bytes but that's not how Perl sees it.
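A minimal sketch of the octets-versus-characters distinction, assuming the core Encode module: the same text has a different length depending on whether the string holds UTF-8 octets or decoded characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Five octets: "caf" plus the two-byte UTF-8 encoding of U+00E9 (e with acute accent).
my $octets = "caf\xC3\xA9";

# Decoding yields a four-character string with the utf8 flag on.
my $chars = decode('UTF-8', $octets);

printf "octets: length %d, utf8 flag %s\n", length($octets), utf8::is_utf8($octets) ? 'on' : 'off';
printf "chars:  length %d, utf8 flag %s\n", length($chars),  utf8::is_utf8($chars)  ? 'on' : 'off';
# prints:
# octets: length 5, utf8 flag off
# chars:  length 4, utf8 flag on
```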

-- 
Andy Armstrong, Hexten


Re: quick pure perl question

Posted by André Warnier <aw...@ice-sa.com>.
Andy Armstrong wrote:
> On 28 Jun 2009, at 17:33, Bill Moseley wrote:
>> You need to encode the character data before writing back out either
>> by encoding explicitly or using a layer.
> 
> 
> Or possibly not decode it in the first place and treat it as an opaque 
> octet stream. All depending, of course, on what it is you're trying to 
> achieve.
> 

I was not trying to achieve anything, and I do understand the 
encoding/decoding aspect.
Basically, by using the '>:raw' encoding for the output stream, I was 
not expecting perl to warn me that I was (knowingly) outputting "wide 
characters" there, so I was surprised at the warning.

I /would/ have expected it if I was /not/ specifying an encoding, like 
using simply '>'.  But not when I am explicitly specifying '>:raw', 
which in my mind, and according to my interpretation of the on-line 
documentation, is equivalent to saying "output whatever you have as 
bytes in that string variable right now, as is, I know what I'm doing".
But I guess my interpretation of the documentation is incorrect then.


Re: quick pure perl question

Posted by Andy Armstrong <an...@hexten.net>.
On 28 Jun 2009, at 17:33, Bill Moseley wrote:
> You need to encode the character data before writing back out either
> by encoding explicitly or using a layer.


Or possibly not decode it in the first place and treat it as an opaque  
octet stream. All depending, of course, on what it is you're trying to  
achieve.
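A minimal sketch of this alternative, assuming File::Temp to create hypothetical stand-ins for $name1 and $name2: both handles use :raw, nothing is decoded, and the lines are copied byte for byte without any warning.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file holding UTF-8 bytes (U+263A is E2 98 BA), standing in for $name1.
my ($in_tmp, $name1) = tempfile(UNLINK => 1);
binmode($in_tmp);
print {$in_tmp} "smile \xE2\x98\xBA\n";
close($in_tmp) or die "Cannot close $name1: $!";

# Hypothetical output file standing in for $name2.
my ($out_tmp, $name2) = tempfile(UNLINK => 1);
close($out_tmp);

# Neither handle decodes or encodes: each line is an opaque octet string.
open(my $first,  '<:raw', $name1) or die "Cannot open $name1: $!";
open(my $second, '>:raw', $name2) or die "Cannot open $name2: $!";
while (defined(my $line = <$first>)) {
    print {$second} $line;    # no "Wide character" warning: $line holds bytes only
}
close($first);
close($second) or die "Cannot close $name2: $!";

# Read the copy back as bytes.
open(my $check, '<:raw', $name2) or die "Cannot open $name2: $!";
my $copy = do { local $/; <$check> };
close($check);
printf "%d bytes copied\n", length($copy);   # prints "10 bytes copied"
```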

-- 
Andy Armstrong, Hexten


Re: quick pure perl question

Posted by Bill Moseley <mo...@hank.org>.
On Sun, Jun 28, 2009 at 8:41 AM, André Warnier<aw...@ice-sa.com> wrote:
> But why do I get warnings when I specified 'raw' as the I/O layer ?
> Doesn't 'raw' mean like 'as is' ?

You are decoding into characters when reading in.  Perl sets the utf8
flag on $line to indicate that $line is character data.  Then you are
attempting to write characters (which is an abstraction) out as byte
data.  Perl warns you that you are doing this because the utf8 flag is
set.
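The behaviour described above can be reproduced with a minimal sketch, assuming the core Encode module (decoding by hand instead of through the :utf8 layer) and an in-memory filehandle as a hypothetical stand-in for SECOND. The warning is collected through $SIG{__WARN__} rather than printed to STDERR; it is the character above U+00FF that cannot be squeezed into the byte-oriented handle.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Decode UTF-8 bytes by hand, standing in for what the :utf8 input layer does.
my $bytes = "smile \xE2\x98\xBA\n";
my $line  = decode('UTF-8', $bytes);
print 'utf8 flag: ', (utf8::is_utf8($line) ? 'on' : 'off'), "\n";   # prints "utf8 flag: on"

# Hypothetical stand-in for SECOND: an in-memory file opened with :raw.
my $buf = '';
open(my $out, '>:raw', \$buf) or die "Cannot open in-memory file: $!";

# U+263A cannot be downgraded to a single byte, so Perl warns.
print {$out} $line;
close($out);

print "captured: $_" for @warnings;   # e.g. "Wide character in print at ..."
```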

You need to encode the character data before writing back out either
by encoding explicitly or using a layer.



-- 
Bill Moseley
moseley@hank.org

Re: quick pure perl question

Posted by Mike OK <mi...@acorg.com>.
Check out this man page: http://perldoc.perl.org/functions/open.html
For UTF-8 encoding, the example is

open(FH, "<:encoding(UTF-8)", "file") or die "Cannot open file: $!";
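Applied to both ends of the loop from the original question, the :encoding(UTF-8) layer decodes on input and encodes on output, so the wide characters never reach a byte-only handle. A minimal sketch, assuming File::Temp for hypothetical stand-ins for $name1 and $name2.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file holding UTF-8 bytes (U+263A is E2 98 BA), standing in for $name1.
my ($in_tmp, $name1) = tempfile(UNLINK => 1);
binmode($in_tmp);
print {$in_tmp} "smile \xE2\x98\xBA\n";
close($in_tmp) or die "Cannot close $name1: $!";

# Hypothetical output file standing in for $name2.
my ($out_tmp, $name2) = tempfile(UNLINK => 1);
close($out_tmp);

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Decode on input, encode on output: characters never meet a byte-only handle.
open(my $first,  '<:encoding(UTF-8)', $name1) or die "Cannot open $name1: $!";
open(my $second, '>:encoding(UTF-8)', $name2) or die "Cannot open $name2: $!";
while (defined(my $line = <$first>)) {
    print {$second} $line;
}
close($first);
close($second) or die "Cannot close $name2: $!";

# Read the output back as bytes to check what was written.
open(my $check, '<:raw', $name2) or die "Cannot open $name2: $!";
my $copy = do { local $/; <$check> };
close($check);
printf "%d warnings, %d bytes written\n", scalar(@warnings), length($copy);   # prints "0 warnings, 10 bytes written"
```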

Mike
