Posted to issues@openoffice.apache.org by bu...@apache.org on 2019/03/28 15:41:07 UTC
[Issue 125469] Japanese input not working under Japanese OS: text from IME is never inserted

https://bz.apache.org/ooo/show_bug.cgi?id=125469

--- Comment #2 from Alex Taylor <al...@gmail.com> ---
I've been working on my own IME (https://github.com/altsan/os2-wnnim) which has
allowed me to do detailed testing of how DBCS input actually works, and how AOO
is receiving the messages.

When inserting a character via WM_CHAR, the first USHORT of mp2 is the
character code. However, in the event of a DBCS character (for the current
codepage), _both_ bytes are passed in the same USHORT.  This is how my IME (and
others) send the character value:

        usChar = (USHORT) pszBuffer[ i ];
        if ( IsDBCSLeadByte( usChar, global.dbcs ))
            usChar |= pszBuffer[ ++i ] << 0x8;
        WinSendMsg( hwndSource, WM_CHAR,
                    MPFROMSH2CH( KC_CHAR, 1, 0 ),
                    MPFROM2SHORT( usChar, 0 ));
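
To make the resulting value concrete, here is a minimal sketch of the same
packing step in standard C.  The typedefs are local stand-ins for the OS/2
toolkit types, and is_lead_byte_cp932() and pack_char() are hypothetical
helpers (the lead-byte ranges are hardcoded for codepage 932 rather than
queried from the system, and the buffer is assumed to hold unsigned bytes):

```c
#include <stddef.h>

typedef unsigned short USHORT;   /* stand-in for the OS/2 toolkit type */
typedef unsigned char  UCHAR;    /* stand-in for the OS/2 toolkit type */

/* Hypothetical lead-byte test, hardcoded for codepage 932 (Shift-JIS),
 * whose lead bytes are 0x81-0x9F and 0xE0-0xFC. */
int is_lead_byte_cp932(UCHAR b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Pack the character starting at buf[*pos] into one USHORT using the same
 * expression as the snippet above, and advance *pos past that character. */
USHORT pack_char(const UCHAR *buf, size_t len, size_t *pos)
{
    USHORT us = buf[(*pos)++];
    if (is_lead_byte_cp932((UCHAR)us) && *pos < len)
        us |= (USHORT)(buf[(*pos)++] << 8);   /* next byte goes to bits 8-15 */
    return us;
}
```

Fed the three bytes 'A', 0x82, 0xA0, pack_char() returns 0x0041 on the first
call and 0xA082 on the second.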

So the first DBCS byte is in the low-order byte of the USHORT, and the second
is in the high-order byte.  Standard OS/2 PM behaviour is to simply separate out
the two bytes and either combine them into a DBCS character, or treat them as
two individual characters, depending on the current codepage.  So, for example,
if the byte pair 0x82 0xA0 is passed in the WM_CHAR message to (say) E.EXE, it
will render as "あ" under codepage 932, or as "éá" under codepage 850.

I had a look at main/vcl/os2/source/window/salframe.cxx and I think I see the
problem.  The function ImplConvertKey() casts the first USHORT of mp2 to UCHAR,
which keeps only the low-order byte and discards the other byte of a DBCS
character.

        UCHAR nCharCode = (UCHAR)SHORT1FROMMP( aMP2 );
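
The effect of that cast can be reproduced outside of AOO.  A minimal sketch in
standard C, where the typedefs are local stand-ins for the OS/2 toolkit types
and both helper names are hypothetical (they do not come from salframe.cxx):

```c
typedef unsigned short USHORT;   /* stand-in for the OS/2 toolkit type */
typedef unsigned char  UCHAR;    /* stand-in for the OS/2 toolkit type */

/* What narrowing the character code to UCHAR keeps: bits 0-7 only. */
UCHAR narrow_to_uchar(USHORT usChar)
{
    return (UCHAR)usChar;
}

/* The byte that the narrowing cast throws away: bits 8-15. */
UCHAR discarded_byte(USHORT usChar)
{
    return (UCHAR)(usChar >> 8);
}
```

For a character code of 0xA082, narrow_to_uchar() yields 0x82 and the 0xA0
half is gone, so half of the DBCS character never reaches the conversion step.
For a single-byte code such as 0x0041 nothing is lost, which would explain why
ordinary SBCS typing is unaffected.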

Now, normally I could work around this the way I do for some other applications
(like MED) which do the same thing.  The usual workaround is to simply send
both bytes as separate WM_CHAR messages (i.e. 0x82 then 0xA0).  However, this
won't work for AOO because it converts each byte to a Unicode character value,
instead of combining them into a single character first.  Also, a workaround
like that would only work for my IME program, not for the standard OS/2 IME
(which is the original subject of this ticket).

It seems to me that the solution in AOO is to adjust ImplConvertKey() so that
it detects a non-zero high-order byte and treats double-byte characters as
such.  (It might be as simple as keeping the result of SHORT1FROMMP( aMP2 ) as
a USHORT instead of casting it to UCHAR, but that depends on how sal_Char and
OUString are defined and how gsl_getSystemTextEncoding() works -- I wasn't able
to trace the code that far.)
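
As a rough illustration of the detection step, here is a sketch in standard C
that unpacks a character code back into the one or two bytes it represents, so
that they could be handed to a codepage-to-Unicode conversion as a single unit.
It is not AOO code: the typedefs are stand-ins for the OS/2 toolkit types,
both function names are hypothetical, the lead-byte ranges are hardcoded for
codepage 932, and it assumes the packing produced by the IME snippet earlier
in this comment (second byte shifted into bits 8-15):

```c
#include <stddef.h>

typedef unsigned short USHORT;   /* stand-in for the OS/2 toolkit type */
typedef unsigned char  UCHAR;    /* stand-in for the OS/2 toolkit type */

/* Hypothetical lead-byte test, hardcoded for codepage 932 (Shift-JIS),
 * whose lead bytes are 0x81-0x9F and 0xE0-0xFC. */
int is_lead_byte_cp932(UCHAR b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Unpack a WM_CHAR character code into its byte sequence.
 * Writes 1 or 2 bytes to out and returns how many were written. */
size_t char_code_to_bytes(USHORT usChar, UCHAR out[2])
{
    UCHAR lo = (UCHAR)(usChar & 0xFF);
    UCHAR hi = (UCHAR)(usChar >> 8);

    out[0] = lo;
    if (hi != 0 && is_lead_byte_cp932(lo)) {
        out[1] = hi;             /* double-byte character: keep both bytes */
        return 2;
    }
    return 1;                    /* single-byte character */
}
```

A real fix would presumably take the lead-byte ranges from the system for the
active codepage instead of hardcoding them, and would pass the resulting byte
sequence to whatever text conversion ImplConvertKey() already uses.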

That should allow both my new IME and the standard OS/2 one to work properly. 
(The position of the IME entry box would still be wonky as long as
WM_QUERYCONVERTPOS is not handled, but that's mainly a cosmetic problem.)

An alternative (or additional) approach, which would only work for
applications/hooks that are aware of it, would be to implement a new message
like WM_UNICHAR, and allow a UCS-2 code to be passed in the MPARAMs directly. 
That might be a nice feature to allow IMEs to input Unicode directly, which
would really enhance AOO for OS/2. :)
