Posted to dev@corinthia.apache.org by Peter Kelly <pm...@apache.org> on 2015/05/12 09:04:02 UTC

ODF_to_HTML_keys (Branch "odf-filter-attempt2" review)

For mapping between ODF and HTML keys, I would suggest starting off with a more traditional, direct approach: a series of if statements or, equivalently, a switch statement that checks which ODF element the traversal is currently at and then creates the appropriate HTML node. So traverseContent in ODFText.c could contain a switch statement (remember the break at the end of each case! And also put { } inside each case for scoping purposes).
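A minimal sketch of that switch, in C. To stay self-contained it uses made-up tag constants (EX_TEXT_P, EX_HTML_P and so on) in place of the real names from DFXMLNames.h, and a hypothetical ExampleNode type instead of Corinthia's node type. Rather than creating HTML nodes, it only returns the HTML tag that would be created. Note the break and the braces in every case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical ODF element tags, standing in for names from DFXMLNames.h. */
enum { EX_TEXT_P = 100, EX_TEXT_H, EX_TEXT_SPAN };

/* Hypothetical HTML element tags; EX_HTML_NONE means "no mapping yet". */
enum { EX_HTML_NONE = 0, EX_HTML_P = 200, EX_HTML_H1, EX_HTML_SPAN };

/* Hypothetical minimal node: only the tag matters for this sketch. */
typedef struct {
    int tag;
} ExampleNode;

/* The switch a traverseContent-like function could use to decide which
   HTML element to create for the current ODF element. */
static int htmlTagForODF(const ExampleNode *odfChild)
{
    assert(odfChild != NULL);
    int htmlTag = EX_HTML_NONE;
    switch (odfChild->tag) {
        case EX_TEXT_P: {
            htmlTag = EX_HTML_P;
            break;
        }
        case EX_TEXT_H: {
            /* A real filter would inspect the heading level here. */
            htmlTag = EX_HTML_H1;
            break;
        }
        case EX_TEXT_SPAN: {
            htmlTag = EX_HTML_SPAN;
            break;
        }
        default: {
            /* Report unhandled elements so missing cases stay visible. */
            fprintf(stderr, "unhandled ODF tag %d\n", odfChild->tag);
            break;
        }
    }
    return htmlTag;
}
```

Each newly supported ODF element becomes one more case, and the default branch keeps gaps visible while the filter is still incomplete.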

I noticed also that in traverseContent there’s this line:

    if (odfChild->tag == 2) { // we have some text here.

I advise against using “magic numbers” like this, because it’s not at all clear what the 2 means (well, actually your comment makes it clear). But whenever you’re about to write a specific number, ask yourself whether you can define a macro or constant whose name says what the number means.

In fact in DFDOM.h there are the following macros defined:

    #define DOM_DOCUMENT                 1
    #define DOM_TEXT                     2
    #define DOM_COMMENT                  3
    #define DOM_CDATA                    4
    #define DOM_PROCESSING_INSTRUCTION   5

So you could change the line above to:

    if (odfChild->tag == DOM_TEXT) {
 
and then that makes the code self-describing, removing the need for the comment. Also, if the specific integer value used for text nodes were ever changed, this code would still work correctly as long as the macro was updated. While the DOM_ numbers above are extremely unlikely to ever change, the other predefined constants (actually enums), such as HTML_H1 in DFXMLNames.h, are almost certain to change: the file is regenerated by a script that assigns these numbers, and adding new names can shift the existing values. So you should always use the symbolic names rather than writing out the numbers directly.
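As a small illustration, here is the check rewritten around DOM_TEXT. The DOM_ macros are copied from the list above; ExampleDOMNode, its int tag field and its next pointer are hypothetical stand-ins for the real node type, assumed only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Values as listed from DFDOM.h above. */
#define DOM_DOCUMENT                 1
#define DOM_TEXT                     2
#define DOM_COMMENT                  3
#define DOM_CDATA                    4
#define DOM_PROCESSING_INSTRUCTION   5

/* Hypothetical minimal node type; the real node struct has more fields. */
typedef struct ExampleDOMNode {
    int tag;
    struct ExampleDOMNode *next;
} ExampleDOMNode;

/* Self-describing check: no comment is needed to explain what "2" means. */
static int isTextNode(const ExampleDOMNode *node)
{
    assert(node != NULL);
    return node->tag == DOM_TEXT;
}

/* Count the text nodes in a list of siblings. */
static size_t countTextNodes(const ExampleDOMNode *first)
{
    size_t count = 0;
    for (const ExampleDOMNode *n = first; n != NULL; n = n->next) {
        if (isTextNode(n))
            count++;
    }
    return count;
}
```

If DOM_TEXT were ever renumbered, both functions would keep working after a recompile; a literal 2 would silently start matching the wrong nodes.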

Despite my suggestion to start with if statements or a switch statement in the traversal, I like where you’re going conceptually with the idea of representing the information needed for translation in a data structure rather than in code. In fact, whether or not this was conscious, you’ve taken the very first step in designing your own domain-specific programming language for expressing transformations on XML data. Code that uses the data structure you define constitutes an interpreter for this language, and both the data structure and the interpreter can be made more sophisticated over time to cater for more complex needs. This is something I hope we can explore a lot further, and I have a lot of ideas on it that I’ve been thinking about for quite a while now.
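One way the data-structure idea might look, sketched under assumptions: the rules table below plays the role of the ODF_to_HTML_key data, and interpretRules is the interpreter that walks it. The struct and field names, the tag constants and the headingForLevel hook are all hypothetical, not what the branch actually defines; the optional function pointer corresponds to the special-case hooks proposed in the quoted message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values, standing in for names from DFXMLNames.h. */
enum { EX_TEXT_P = 100, EX_TEXT_H, EX_TEXT_SPAN, EX_TEXT_LIST };
enum { EX_HTML_NONE = 0, EX_HTML_P = 200, EX_HTML_H1, EX_HTML_H2, EX_HTML_SPAN };

/* Hypothetical input node: a tag plus one attribute a special case needs. */
typedef struct {
    int tag;
    int outlineLevel;
} ExampleODFNode;

/* Optional hook for cases a plain tag-to-tag entry cannot express. */
typedef int (*SpecialCaseFn)(const ExampleODFNode *node);

/* One rule of the "language": map an ODF tag to an HTML tag,
   or defer to a hook. */
typedef struct {
    int odfTag;
    int htmlTag;
    SpecialCaseFn special;  /* NULL when htmlTag alone is enough */
} MappingRule;

/* Special case: level 1 becomes h1; all deeper levels collapse to h2 here. */
static int headingForLevel(const ExampleODFNode *node)
{
    return node->outlineLevel <= 1 ? EX_HTML_H1 : EX_HTML_H2;
}

/* The data: the translation lives here rather than in code. */
static const MappingRule rules[] = {
    { EX_TEXT_P,    EX_HTML_P,    NULL },
    { EX_TEXT_H,    EX_HTML_NONE, headingForLevel },
    { EX_TEXT_SPAN, EX_HTML_SPAN, NULL },
};

/* The interpreter: find the rule for this node and apply it.
   Returns EX_HTML_NONE when no rule matches, so callers can report gaps. */
static int interpretRules(const ExampleODFNode *node)
{
    assert(node != NULL);
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (rules[i].odfTag != node->tag)
            continue;
        if (rules[i].special != NULL)
            return rules[i].special(node);
        return rules[i].htmlTag;
    }
    return EX_HTML_NONE;
}
```

With this shape, supporting a new element means adding a row to the table, and only genuinely irregular elements need a hook function.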

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 10 May 2015, at 8:52 am, Gabriela Gibson <ga...@gmail.com> wrote:
> 
> Hi,
> 
> So far I've got my branch to produce a list of HTML nodes (and to
> report on stuff that is still missing).
> 
> This is probably a good point to have a look if the approach I'm using
> here is any good.
> 
> It of course has quite a few warts still, and I think I will need to
> add function pointers to the ODF_to_HTML_key struct to deal with some
> special cases. If that struct is a good idea, that is.
> 
> The branch can be found here:
> 
> https://github.com/apache/incubator-corinthia/commit/c81e68626489b9515e7e8f3a5ce5d38ac8f59af0
> 
> I added the test odt file I was using, plus the current output of the program.
> 
> thanks for looking,
> 
> G
> 
> -- 
> Visit my Coding Diary: http://gabriela-gibson.blogspot.com/