Posted to c-users@xerces.apache.org by jinesh kj <ji...@gmail.com> on 2007/11/28 10:42:25 UTC

reg:[reading data with ZWJ and ZWNJ]

hi all,

I was trying to read from an XML file in which some data contains the Zero
Width Joiner (ZWJ) character. I used getTextContent in DOMNode. I am able to
read content that has no Zero Width Joiner, but there are issues when these
special characters are present. What do I have to change? Do I need any
special settings, or do I have to use another function instead?

cheers
Jinesh K J

-- 
My Feelings,Expressions-
http://logbookofanobserver.blogspot.com

SMC : My computer, My language http://smc.org.in
Swathanthra Malayalam Computing: my language for my computer

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
Thank you.

cheers

Jinesh K J

On Nov 28, 2007 9:49 PM, Jesse Pelton <js...@pkc.com> wrote:

> If by "print" you mean "write to the standard output," that will depend on
> your operating system and its configuration.  I'd expect garbage.  XMLCh is
> generally a two-byte (16-bit) quantity, and for ASCII characters, one byte
> will be ASCII and the other byte will be zero.  Other characters will not be
> recognizable if your operating system is expecting single-byte characters.
>
> -----Original Message-----
> From: jinesh kj [mailto:jinesh.k@gmail.com]
> Sent: Wednesday, November 28, 2007 10:26 AM
> To: c-users@xerces.apache.org
> Subject: Re: reg:[reading data with ZWJ and ZWNJ]
>
> Thank you,
>
> i will check this out. Also, what should i get if i try to print the XMLCh
> variable?
>
> regards
>
> Jinesh K J
>
>
> On Nov 28, 2007 8:45 PM, Jesse Pelton <js...@pkc.com> wrote:
>
> > The problem is probably in the transcoding. XMLString::transcode()
> > transcodes to whatever native code page your machine is set up with.
> > Unless that code page allows zwj and zwnj to be represented, your
> > transcoding results will not be what you expect. You should transcode to
> > an encoding that can represent any characters you can get (like Xerces'
> > internal UTF-16 encoding). See XMLTransService.
> >
> > ________________________________
> >
> > From: jinesh kj [mailto:jinesh.k@gmail.com]
> > Sent: Wednesday, November 28, 2007 9:56 AM
> > To: c-users@xerces.apache.org
> > Subject: Re: reg:[reading data with ZWJ and ZWNJ]
> >
> >
> > hi,
> >
> > I actually need the whole text with the ZWJ. I am attaching my code, only
> > the section that interacts with the XML file; I hope that is enough. My
> > code is a little big and I haven't commented it properly, so it may take
> > a little time to understand. If you need an explanation of any part,
> > please let me know.
> >
> > cheers
> >
> > Jinesh  K J
> >
> >
> > On Nov 28, 2007 5:43 PM, Alberto Massari <am...@datadirect.com> wrote:
> >
> >
> >        The file you attached is correct, and the same modified DOMPrint
> >        that I used before returns the ZWJ characters in the content of
> >        getTextContent.
> >        Could you show us the code you are using to read the file?
> >
> >
> >        Alberto
> >
> >        jinesh kj wrote:
> >        > hi,
> >        >
> >
> >        > I dumped using the mysql -X command, which gives me output as an
> >        > XML file. I don't know whether there is any problem with my XML
> >        > files. Is there any specific notation to represent the ZWJ and
> >        > ZWNJ in XML files?
> >        >
> >        > I am attaching an XML file I have.
> >        >
> >        > Thank you for your help, and if you have a better idea what to do
> >        > with the XML file when I get characters like these, or any links
> >        > to those details, please point me to them.
> >        >
> >        > regards
> >        >
> >        > Jinesh K J
> >        >
> >        > On Nov 28, 2007 4:46 PM, Alberto Massari
> >        > <amassari@datadirect.com <ma...@datadirect.com>> wrote:
> >        >
> >        >     If you can read the original file, but not when you edit it,
> >        >     I would bet the reason is in the way you edit your XML files
> >        >     (and dump from the database). What are you using? Could you
> >        >     attach a small sample file?
> >        >
> >        >     Alberto
> >        >
> >        >     jinesh kj wrote:
> >        >     > hi,
> >        >     >
> >        >     > I tried reading the file you sent. It didn't give any error,
> >        >     > which means it was read correctly. I don't know how to check
> >        >     > in the debugger, so I don't know whether it read 200d or
> >        >     > not. But if I edit the XML file to add some text data, it
> >        >     > does not read the text. Do I have to do anything for it?
> >        >     > Basically I am trying to read through an XML file which is
> >        >     > a dump of a MySQL database. It has many ZWJ characters. I
> >        >     > don't know whether it conforms to the specified encoding.
> >        >     > But since it was dumped from the database using the
> >        >     > built-in function, I think the chance of error is very low.
> >        >     >
> >        >     > I am using a similar function in my program, and it returns
> >        >     > nothing when there is a ZWJ in my data.
> >        >     >
> >        >     > I hope I am clear. I am able to read XML files without ZWJ
> >        >     > easily.
> >        >     >
> >        >     > regards
> >        >     >
> >        >     > Jinesh K J
> >        >     >
> >        >     > On Nov 28, 2007 4:02 PM, Alberto Massari
> >        >     > <amassari@datadirect.com <ma...@datadirect.com>> wrote:
> >        >     >
> >        >     >
> >        >     >> I am attaching a sample XML that contains a U+200D character
> >        >     >> between a --| and |-- pattern; I modified DOMPrint to issue a
> >        >     >>
> >        >     >>     const XMLCh* data=doc->getDocumentElement()->getTextContent();
> >        >     >>
> >        >     >> and in the debugger I see that data[4] is \x200D
> >        >     >> Have you checked your source XML really has that character?
> >        >     >> Also, is the representation of the ZWJ character in the XML
> >        >     >> file valid according to the specified encoding (e.g. in
> >        >     >> UTF-8, it's 0xE2 0x80 0x8D)?
> >        >     >>
> >        >     >> Alberto
> >        >     >>
> >        >     >> jinesh kj wrote:
> >        >     >>
> >        >     >>> hi,
> >        >     >>>
> >        >     >>> Actually, getTextContent is not returning any value when
> >        >     >>> there is a Zero Width Joiner.
> >        >     >>>
> >        >     >>> cheers
> >        >     >>>
> >        >     >>> Jinesh K J
> >        >     >>>
> >        >     >>> On Nov 28, 2007 3:28 PM, Alberto Massari
> >        >     >>> <amassari@datadirect.com <ma...@datadirect.com>> wrote:
> >        >     >>>
> >        >     >>>> Hi Jinesh,
> >        >     >>>> which kind of issues are you having? The text returned by
> >        >     >>>> getTextContent should contain a \x200D value inside. Or
> >        >     >>>> have you transcoded it into chars?
> >        >     >>>>
> >        >     >>>> Alberto

RE: reg:[reading data with ZWJ and ZWNJ]

Posted by Jesse Pelton <js...@PKC.com>.
If by "print" you mean "write to the standard output," that will depend on your operating system and its configuration.  I'd expect garbage.  XMLCh is generally a two-byte (16-bit) quantity, and for ASCII characters, one byte will be ASCII and the other byte will be zero.  Other characters will not be recognizable if your operating system is expecting single-byte characters.
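
A minimal sketch of the point above, not Xerces code: it assumes XMLCh is a 16-bit UTF-16 code unit and uses char16_t as a stand-in. The names CodeUnit, dumpCodeUnits, dumpBytesLE and selfCheck are made up for illustration. Printing the code units (or their bytes) as hex, instead of sending them to standard output as if they were chars, shows what is actually stored.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for Xerces' XMLCh; assumed here to be a 16-bit UTF-16 code unit.
using CodeUnit = char16_t;

// Render each code unit of a NUL-terminated string as "U+XXXX",
// separated by single spaces.
std::string dumpCodeUnits(const CodeUnit* text) {
    std::string out;
    for (; *text != 0; ++text) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(*text));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Render each code unit as two hex bytes, low byte first (the layout a
// little-endian machine would have in memory), separated by single spaces.
std::string dumpBytesLE(const CodeUnit* text) {
    std::string out;
    for (; *text != 0; ++text) {
        char buf[8];
        std::snprintf(buf, sizeof buf, "%02X %02X",
                      static_cast<unsigned>(*text & 0xFF),
                      static_cast<unsigned>((*text >> 8) & 0xFF));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical self-check: "a", ZERO WIDTH JOINER, "b".
void selfCheck() {
    const CodeUnit* sample = u"a\u200Db";
    assert(dumpCodeUnits(sample) == "U+0061 U+200D U+0062");
    assert(dumpBytesLE(sample) == "61 00 0D 20 62 00");
}
```

For the string "a", U+200D, "b", dumpBytesLE gives 61 00 0D 20 62 00: the ASCII characters show the "one byte ASCII, the other zero" pattern, while the joiner occupies two non-ASCII bytes that a single-byte console cannot display sensibly.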


Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
Thank you,

I will check this out. Also, what should I get if I try to print the XMLCh
variable?

regards

Jinesh K J



Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
thank you. I will check them.

regards

Jinesh K J

On Nov 29, 2007 8:46 PM, David Bertoni <db...@apache.org> wrote:

> jinesh kj wrote:
> > So i have to change the encoding format of XMLString::transcode() ? Whats
> > the best method, i tried reading through the XMLTransService, cant get a
> > bigger idea. If you can explain a little more it will be helpful.
> There are two other threads right now on the mailing list that are
> addressing transcoding.  You should read them.
>
> Dave
>




Re: reg:[reading data with ZWJ and ZWNJ]

Posted by David Bertoni <db...@apache.org>.
jinesh kj wrote:
> So i have to change the encoding format of XMLString::transcode() ? Whats
> the best method, i tried reading through the XMLTransService, cant get a
> bigger idea. If you can explain a little more it will be helpful.
There are two other threads right now on the mailing list that are 
addressing transcoding.  You should read them.

Dave
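
The earlier advice in this thread was to transcode to an encoding that can represent every character, rather than to the native code page. As a rough illustration of what that means, here is a minimal standalone sketch of a UTF-16 to UTF-8 conversion. It does not use Xerces at all (in a real program, XMLTransService, mentioned earlier in the thread, would be the place to look). The function names utf16ToUtf8 and selfCheck, and the use of std::u16string as a stand-in for XMLCh data, are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (a choice made for this sketch).
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high surrogate with a following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Anything still in the surrogate range was unpaired.
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;

        if (cp < 0x80) {                        // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {              // 3 bytes (ZWJ and ZWNJ land here)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical self-check: ZWJ (U+200D) must come out as E2 80 8D.
void selfCheck() {
    assert(utf16ToUtf8(u"\u200D") == "\xE2\x80\x8D");
}
```

The joiner survives because UTF-8 can encode every code point; a single-byte native code page has no slot for U+200D or U+200C, which is why transcoding to it loses or mangles them.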

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
So do I have to change the encoding used by XMLString::transcode()? What is
the best method? I tried reading through XMLTransService but couldn't get
the bigger picture. If you can explain a little more, it would be helpful.

Thank you

Jinesh  K J

On Nov 28, 2007 8:45 PM, Jesse Pelton <js...@pkc.com> wrote:

> The problem is probably in the transcoding. XMLString::transcode()
> transcodes to whatever native code page your machine is set up with. Unless
> that code page allows zwj and zwnj to be represented, your transcoding
> results will not be what you expect. You should transcode to an encoding
> that can represent any characters you can get (like Xerces' internal UTF-16
> encoding). See XMLTransService.
>
> ________________________________
>
> From: jinesh kj [mailto:jinesh.k@gmail.com]
> Sent: Wednesday, November 28, 2007 9:56 AM
> To: c-users@xerces.apache.org
> Subject: Re: reg:[reading data with ZWJ and ZWNJ]
>
>
> hi,
>
> I actually need the whole text with the zwj. My code i am attaching. Only
> the section which does interaction with xml file. Hope its enough. My code
> is little big, so it may take a little time for you to understand i havent
> commented it properly. If you need explanation on any part please let me
> know.
>
> cheers
>
> Jinesh  K J
>
>
> On Nov 28, 2007 5:43 PM, Alberto Massari <am...@datadirect.com> wrote:
>
>
>        The file you attached is correct, and the same modified DOMPrint
> that I
>        used before return the ZWJ characters in the content of
> getTextContent.
>        Could you show us the code you are using to read the file?
>
>
>        Alberto
>
>        jinesh kj wrote:
>        > hi,
>        >
>
>        > I dumped using mysql -X command which will give me output as xml
> file.
>        > I dont know whether there is any problem with my xml files. Is
> there
>        > any specific notation to represent the ZWJ and ZWNJ in xml files?
>        >
>        > I am attaching an xml file i have.
>        >
>        > Thank you for your help, and if you have a better idea what to do
> with
>        > the xml file when i get characters like these, or any links to
> those
>        > details, please point me.
>        >
>        > regards
>        >
>        > Jinesh K J
>        >
>        > On Nov 28, 2007 4:46 PM, Alberto Massari <amassari@datadirect.com
>
>        > <ma...@datadirect.com>> wrote:
>        >
>        >     If you can read the original file, but not when you edit it,
> I
>        >     would bet
>        >     the reason is in the way you edit your XML files (and dump
> from the
>        >     database). What are you using? Could you attach a small
> sample file?
>        >
>        >     Alberto
>        >




RE: reg:[reading data with ZWJ and ZWNJ]

Posted by Jesse Pelton <js...@PKC.com>.
The problem is probably in the transcoding. XMLString::transcode() transcodes to whatever native code page your machine is set up with. Unless that code page allows zwj and zwnj to be represented, your transcoding results will not be what you expect. You should transcode to an encoding that can represent any characters you can get (like Xerces' internal UTF-16 encoding). See XMLTransService.
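
To illustrate the point about choosing an encoding that can represent every
character, here is a minimal, self-contained sketch that encodes UTF-16 code
units as UTF-8. It does not use Xerces at all: std::u16string stands in for
an XMLCh buffer (an assumption, based on XMLCh being a 16-bit quantity), and
utf16ToUtf8 is a hypothetical helper name, not a Xerces API. With Xerces
itself, the transcoder would come from XMLTransService as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: std::u16string stands in for a Xerces XMLCh buffer.
// Encodes UTF-16 code units as UTF-8 bytes; unpaired surrogates are
// replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // stray low surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, utf16ToUtf8(u"a\u200D") yields the bytes 0x61 0xE2 0x80 0x8D, so
the ZWJ survives as a three-byte UTF-8 sequence instead of being lost.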

________________________________

From: jinesh kj [mailto:jinesh.k@gmail.com] 
Sent: Wednesday, November 28, 2007 9:56 AM
To: c-users@xerces.apache.org
Subject: Re: reg:[reading data with ZWJ and ZWNJ]


hi,

I actually need the whole text, including the ZWJ. I am attaching my code, but only the section that interacts with the XML file; I hope that is enough. The code is fairly long and I haven't commented it properly, so it may take a little time to understand. If you need an explanation of any part, please let me know.

cheers 

Jinesh  K J



Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
hi,

I actually need the whole text, including the ZWJ. I am attaching my code,
but only the section that interacts with the XML file; I hope that is
enough. The code is fairly long and I haven't commented it properly, so it
may take a little time to understand. If you need an explanation of any
part, please let me know.

cheers

Jinesh  K J

On Nov 28, 2007 5:43 PM, Alberto Massari <am...@datadirect.com> wrote:

> The file you attached is correct, and the same modified DOMPrint that I
> used before returns the ZWJ characters in the text returned by
> getTextContent. Could you show us the code you are using to read the file?
>
> Alberto
>

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by Alberto Massari <am...@datadirect.com>.
The file you attached is correct, and the same modified DOMPrint that I
used before returns the ZWJ characters in the text returned by
getTextContent. Could you show us the code you are using to read the file?

Alberto

jinesh kj wrote:
> hi,
>
> I dumped the database with the mysql -X command, which writes its output
> as an XML file. I don't know whether there is any problem with my XML
> files. Is there any specific notation to represent ZWJ and ZWNJ in XML
> files?
>
> I am attaching one of my XML files.
>
> Thank you for your help. If you have a better idea of how to handle the
> XML file when it contains characters like these, or any links with more
> details, please point me to them.
>
> regards
>
> Jinesh K J
>


Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
hi,

I dumped the database with the mysql -X command, which writes its output as
an XML file. I don't know whether there is any problem with my XML files. Is
there any specific notation to represent ZWJ and ZWNJ in XML files?

I am attaching one of my XML files.

Thank you for your help. If you have a better idea of how to handle the XML
file when it contains characters like these, or any links with more details,
please point me to them.

regards

Jinesh K J

On Nov 28, 2007 4:46 PM, Alberto Massari <am...@datadirect.com> wrote:

> If you can read the original file, but not when you edit it, I would bet
> the reason is in the way you edit your XML files (and dump from the
> database). What are you using? Could you attach a small sample file?
>
> Alberto
>

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by Alberto Massari <am...@datadirect.com>.
If you can read the original file, but not when you edit it, I would bet 
the reason is in the way you edit your XML files (and dump from the 
database). What are you using? Could you attach a small sample file?
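
One way to test that guess without a debugger is to look at the raw bytes of
the file before any XML parser is involved. The sketch below assumes the
file is encoded as UTF-8, where U+200D is the byte sequence 0xE2 0x80 0x8D.
readFileBytes and countUtf8Zwj are hypothetical helper names, and the path
is whatever file you want to inspect.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file as raw bytes; returns an empty string if the file
// cannot be opened.
std::string readFileBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Counts occurrences of the UTF-8 encoding of U+200D (ZERO WIDTH JOINER).
// Assumption: the input really is UTF-8; other encodings need another needle.
std::size_t countUtf8Zwj(const std::string& bytes) {
    static const std::string zwj = "\xE2\x80\x8D";
    std::size_t count = 0;
    for (std::size_t pos = bytes.find(zwj); pos != std::string::npos;
         pos = bytes.find(zwj, pos + zwj.size())) {
        ++count;
    }
    return count;
}
```

If the count drops to zero after the file has been opened and saved in an
editor, the editor (or its save encoding) is the likely culprit rather than
the parser.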

Alberto

jinesh kj wrote:
> hi,
>
> I tried reading the file you sent. It didn't give any error, which
> suggests it was read correctly. I don't know how to check in the debugger,
> so I don't know whether it read U+200D or not. But if I try to edit the
> XML file and add some text data along with it, the text is not read. Do I
> have to do anything for it? Basically, I am trying to read an XML file
> that is a dump of a MySQL database. It has many ZWJ characters. I don't
> know whether they are valid for the specified encoding, but since the file
> was dumped from the database using the built-in function, I think the
> chance of an error is low.
>
> I am using a similar call in my program, but it returns nothing when
> there is a ZWJ in my data.
>
> I hope I am clear. I can read XML files without ZWJ without any trouble.
>
> regards
>
> Jinesh K J
>


Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
hi,

I tried reading the file you sent. It didn't give any error, which suggests
it was read correctly. I don't know how to check in the debugger, so I don't
know whether it read U+200D or not. But if I try to edit the XML file and
add some text data along with it, the text is not read. Do I have to do
anything for it? Basically, I am trying to read an XML file that is a dump
of a MySQL database. It has many ZWJ characters. I don't know whether they
are valid for the specified encoding, but since the file was dumped from the
database using the built-in function, I think the chance of an error is low.

I am using a similar call in my program, but it returns nothing when there
is a ZWJ in my data.

I hope I am clear. I can read XML files without ZWJ without any trouble.

regards

Jinesh K J

On Nov 28, 2007 4:02 PM, Alberto Massari <am...@datadirect.com> wrote:

> I am attaching a sample XML that contains a U+200D character between a
> --| and |-- pattern; I modified DOMPrint to issue a
>
>            const XMLCh* data=doc->getDocumentElement()->getTextContent();
>
> and in the debugger I see that data[4] is \x200D
> Have you checked that your source XML really has that character? Also, is
> the representation of the ZWJ character in the XML file valid according
> to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
>
> Alberto
>

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by Alberto Massari <am...@datadirect.com>.
I am attaching a sample XML that contains a U+200D character between a 
--| and |-- pattern; I modified DOMPrint to issue a

            const XMLCh* data=doc->getDocumentElement()->getTextContent();

and in the debugger I see that data[4] is \x200D
Have you checked that your source XML really has that character? Also, is
the representation of the ZWJ character in the XML file valid according 
to the specified encoding (e.g. in UTF-8, it's 0xE2 0x80 0x8D)?
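
If a debugger is not at hand, the same check can be made by printing each
code unit of the returned string in hex. The sketch below assumes 16-bit
code units and uses std::u16string in place of the const XMLCh* that
getTextContent returns; dumpCodeUnits is a hypothetical helper, and the
sample text in the note after it is made up rather than taken from the
attached file.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption: std::u16string stands in for the XMLCh string from Xerces.
// Formats every UTF-16 code unit as "U+XXXX", separated by single spaces.
std::string dumpCodeUnits(const std::u16string& text) {
    std::string out;
    char buf[8];  // "U+XXXX" plus the terminating NUL fits in 7 bytes
    for (char16_t unit : text) {
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(unit));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

For example, dumpCodeUnits(u"--|\u200D|--") gives
"U+002D U+002D U+007C U+200D U+007C U+002D U+002D", so a ZWJ shows up as
U+200D even though it is invisible on screen.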

Alberto

jinesh kj wrote:
> hi,
>
> Actually, getTextContent is not returning any value when there is a Zero
> width joiner.
>
> cheers
>
> Jinesh K J
>


Re: reg:[reading data with ZWJ and ZWNJ]

Posted by jinesh kj <ji...@gmail.com>.
hi,

Actually, getTextContent is not returning any value when there is a Zero
width joiner.

cheers

Jinesh K J

On Nov 28, 2007 3:28 PM, Alberto Massari <am...@datadirect.com> wrote:

> Hi Jinesh,
> which kind of issues are you having? The text returned by getTextContent
> should contain a \x200D value inside. Or have you transcoded it into
> chars?
>
> Alberto
>

Re: reg:[reading data with ZWJ and ZWNJ]

Posted by Alberto Massari <am...@datadirect.com>.
Hi Jinesh,
what kind of issues are you having? The text returned by getTextContent
should contain a \x200D value. Or have you transcoded it into chars?
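
The reason transcoding into chars matters can be shown with a small sketch.
It narrows UTF-16 text to a single-byte Latin-1 string and substitutes '?'
for anything the target cannot hold. This is only an illustration of lossy
narrowing, not what Xerces does internally: narrowToLatin1 is a hypothetical
helper, std::u16string stands in for XMLCh data, and a real transcoder may
drop, substitute, or reject such characters depending on the code page.

```cpp
#include <cassert>
#include <string>

// Assumption: Latin-1 is used as an example of a single-byte code page.
// Code units above U+00FF (such as U+200D) cannot be represented and are
// replaced with '?'.
std::string narrowToLatin1(const std::u16string& text) {
    std::string out;
    for (char16_t unit : text) {
        out += (unit <= 0xFF) ? static_cast<char>(unit) : '?';
    }
    return out;
}
```

Here narrowToLatin1(u"a\u200D") returns "a?": the joiner is gone after
narrowing, while the original 16-bit string still contains it.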

Alberto

jinesh kj wrote:
> hi all,
>
> I was trying to read from an XML file in which some data contains a Zero
> Width Joiner. I used getTextContent on DOMNode. I was able to read the
> contents without the Zero Width Joiner, but there are some issues with
> these special characters. What do I have to change? Do I have to make any
> special settings? Or do I have to use another function instead?
>
> cheers
> Jinesh K J
>
>