Posted to dev@openoffice.apache.org by Mayur <ma...@gmail.com> on 2014/01/28 13:47:02 UTC

About openxml handling code

Hi,

I am very new to the OpenOffice code, and need some help understanding the
open-xml handling code. Could someone please answer the following
questions?

 i. There seem to be two distinct pieces of code that do open-xml parsing
in different ways. There's one part in "writerfilter" that has some
generated code (xslt based) that provides factories and classes for
creating different object types. And then, for sc and sd, all of the
parsing code is in the "oox" module and seems to be hand-written. Why is
that? Are there plans to move the parsing code to a common module? (perhaps
oox ...)

ii. Probably a related question: why are DrawingML shapes and pictures
not supported in sw, while they are supported in sc and sd? The parsing
code seems to be there. The wps:wsp tag differs very little from the p:sp
tag. Is this in the works?

thanks,
mayur

Re: About openxml handling code

Posted by Andre Fischer <aw...@gmail.com>.
On 03.02.2014 18:54, Mayur wrote:
> Found something interesting. Writer seems to reject any graphics in OOXML
> documents, even VML ones, although there does seem to be code to support
> them. If a couple of tiny glitches were fixed, Writer would possibly start
> showing VML shapes (at least). That would work for all MS Word 2007
> documents, as well as for some 2010 documents that have the VML data
> in their mc:AlternateContent tags. Here are the problems:
>    i.  The function getNamespace() in
> oox/source/shape/ShapeContextHandler.cxx always returned 0. The problem
> seems to be a rather strange-looking definition of the NMSP_MASK constant
> in oox/source/token/namespaces.hxx.tail. It says there:
>
>         const sal_Int32 TOKEN_MASK = static_cast< sal_Int32 >( (1 << 16) - 1 );
>         const sal_Int32 NMSP_MASK  = static_cast< sal_Int32 >( SAL_MAX_INT16 & ~TOKEN_MASK );
>
>       Why SAL_MAX_INT16? That would translate (on Windows) into
>
>         TOKEN_MASK = static_cast<long>(0xFFFF);                // 65535
>         NMSP_MASK  = static_cast<long>(0x7FFF & ~TOKEN_MASK);  // 0x00007FFF & 0xFFFF0000 = 0
>
>       whereas what we should really be extracting is the namespace value,
> which is in the higher two bytes. The following change fixes it:
>
>         const sal_Int32 NMSP_MASK = static_cast< sal_Int32 >( SAL_MAX_INT32 & ~TOKEN_MASK );
>
>       That sets NMSP_MASK to 0x7FFF0000 (0x7FFFFFFF & 0xFFFF0000), which
> selects the higher two bytes apart from the sign bit.

Good analysis.

>
>         To my mind, this sort of compactness isn't called for. Maybe, we
> could have simply used a compact struct to store namespace and tag.

I always wondered why we keep namespace and token separable once they
have been read into memory.  While the XML file is scanned it does make
sense to use the same token for names in different namespaces.  This
keeps the number of tokens, and thus the complexity of the scanner, small
(well, as small as possible).  But once we have the tag (namespace and
name) in memory we should not have to extract namespace or name from a
tag.  After all, if I have a name n in two namespaces a and b, then a:n
and b:n are two different things and we can use enum values a_n and b_n
with arbitrary values to represent them.  But maybe I am missing something?

-Andre

>
>   ii. In oox/source/vml/vmldrawingfragment.cxx, there's a switch in the
> function onCreateContext that says:
>          case VMLDRAWING_WORD:
>                    if ( isRootElement() )  {... }
>
>        Is this so that a shape context is created only when a VML file is
> received as a separate document fragment? Why not for the inline (v:rect
> or other) objects? I tried removing the check and instead simply checking
> whether nElement is a VML element; VML drawings were then suddenly
> visible in Writer.
>
> Is this a valid fix?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org


Re: About openxml handling code

Posted by Oliver-Rainer Wittmann <or...@googlemail.com>.
Hi,

we had issue 123723 [1], solved in revision 1560326, regarding certain
token IDs used in the oox and writerfilter modules.
I am not sure whether the fix is relevant to what you have observed.

[1] https://issues.apache.org/ooo/show_bug.cgi?id=123723


Best regards, Oliver.




Re: About openxml handling code

Posted by Mayur <ma...@gmail.com>.
Found something interesting. Writer seems to reject any graphics in OOXML
documents, even VML ones, although there does seem to be code to support
them. If a couple of tiny glitches were fixed, Writer would possibly start
showing VML shapes (at least). That would work for all MS Word 2007
documents, as well as for some 2010 documents that have the VML data
in their mc:AlternateContent tags. Here are the problems:
  i.  The function getNamespace() in
oox/source/shape/ShapeContextHandler.cxx always returned 0. The problem
seems to be a rather strange-looking definition of the NMSP_MASK constant
in oox/source/token/namespaces.hxx.tail. It says there:

        const sal_Int32 TOKEN_MASK = static_cast< sal_Int32 >( (1 << 16) - 1 );
        const sal_Int32 NMSP_MASK  = static_cast< sal_Int32 >( SAL_MAX_INT16 & ~TOKEN_MASK );

      Why SAL_MAX_INT16? That would translate (on Windows) into

        TOKEN_MASK = static_cast<long>(0xFFFF);                // 65535
        NMSP_MASK  = static_cast<long>(0x7FFF & ~TOKEN_MASK);  // 0x00007FFF & 0xFFFF0000 = 0

      whereas what we should really be extracting is the namespace value,
which is in the higher two bytes. The following change fixes it:

        const sal_Int32 NMSP_MASK = static_cast< sal_Int32 >( SAL_MAX_INT32 & ~TOKEN_MASK );

      That sets NMSP_MASK to 0x7FFF0000 (0x7FFFFFFF & 0xFFFF0000), which
selects the higher two bytes apart from the sign bit.

       To my mind, this sort of compactness isn't called for. Maybe, we
could have simply used a compact struct to store namespace and tag.

 ii. In oox/source/vml/vmldrawingfragment.cxx, there's a switch in the
function onCreateContext that says:
        case VMLDRAWING_WORD:
                  if ( isRootElement() )  {... }

      Is this so that a shape context is created only when a VML file is
received as a separate document fragment? Why not for the inline (v:rect
or other) objects? I tried removing the check and instead simply checking
whether nElement is a VML element; VML drawings were then suddenly
visible in Writer.

Is this a valid fix?




Re: About openxml handling code

Posted by Andre Fischer <aw...@gmail.com>.
On 28.01.2014 13:47, Mayur wrote:
> Hi,
>
> I am very new to the OpenOffice code, and need some help understanding the
> open-xml handling code. Could someone please answer the following
> questions?
>
>   i. There seem to be two distinct pieces of code that do open-xml parsing
> in different ways. There's one part in "writerfilter" that has some
> generated code (xslt based) that provides factories and classes for
> creating different object types. And then, for sc and sd, all of the
> parsing code is in the "oox" module and seems to be hand-written. Why is
> that? Are there plans to move the parsing code to a common module? (perhaps
> oox ...)

Re why: OOXML import was developed while OpenOffice was maintained
by Sun, later Oracle.  There were at least three development teams
involved (for Writer, Calc, Draw/Impress). Sometimes they did not
communicate with each other as well as they should have.  Having
different modules is one of the results.  But, as far as I know,
writerfilter has some calls into oox for shared functionality.

Re future plans: Some of us are thinking about improving the OOXML
support.  Consolidation of the code base into one module is a long-term
goal.

>
> ii. Probably a related question - why are drawing-ml shapes and pictures
> not supported in sw, while they are supported in sc and sd? The parsing
> code seems to be there. The tag wps:wsp has very little delta with the p:sp
> tag. Is this in the works?

Well, see my comments above.
Also, parsing OOXML is the easy part; importing the data into the
application model is the hard part.  Calc and Impress use the same model
for representing graphical objects, while Writer has its own.

If you are interested in OOXML import/export then maybe we can work 
together on improving it?

Regards,
Andre

>
> thanks,
> mayur
>

