Posted to dev@corinthia.apache.org by Gabriela Gibson <ga...@gmail.com> on 2015/05/11 20:07:16 UTC

Using regex.h?

In my gbg_test.c file, I have produced the following monstrosity:

Tag text_h(DFNode *node)
{
    char *s = node->attrs->value;
    if ((int)s[11] > 55 || strlen(s) == 13)
        return HTML_H6;
    else
        return HTML_H1 + (int)s[11] - 49;
}

Because I will need to make more such things to match the attribute
values, I'm wondering if we could use regex.h instead, or if that is
too unix specific and not available on other platforms.

G
-- 
Visit my Coding Diary: http://gabriela-gibson.blogspot.com/

Re: Using regex.h?

Posted by Peter Kelly <pm...@apache.org>.
> On 12 May 2015, at 1:07 am, Gabriela Gibson <ga...@gmail.com> wrote:
> 
> In my gbg_test.c file, I have produced the following monstrosity:
> 
> Tag text_h(DFNode *node)
> {
>    char *s = node->attrs->value;

Referencing node->attrs->value is incorrect, as you don’t know whether the node will have any attributes, and if it does, whether the style name (which is what I assume you’re looking for here) will happen to be the first one. DFGetAttribute(node,TEXT_STYLE_NAME) is how you would get this reliably.

>    if ((int)s[11] > 55 || strlen(s) == 13)

I saw a quote once that went something along the lines of “C gives you enough rope to hang yourself, and then a bit extra just to make sure”. The ability to index into arrays arbitrarily, without any bounds checking, is one of the many strands of such rope ;)

This code makes the assumption that the style name will contain at least 12 characters. If it doesn’t, s[11] will be some random value, and the test will randomly do the wrong thing based on whatever happened to be in that part of memory as a result of previous stuff the program did. Or crash, if you’re unlucky enough to be handed a string that is right near the end of an allocated block of memory.
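
A minimal sketch of a length check that guards against the out-of-bounds read described above. The helper name char_at_or_nul is hypothetical and not part of the Corinthia codebase; it only illustrates checking the length before indexing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from Corinthia): returns s[index] when that
   position lies inside the string, or '\0' when s is NULL or too short. */
static char char_at_or_nul(const char *s, size_t index)
{
    if (s == NULL || strlen(s) <= index)
        return '\0';
    return s[index];
}
```

With a helper like this, char_at_or_nul(s, 11) on a short style name yields '\0' instead of reading whatever lies past the end of the string.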

What is 55? I had to look that up in an ASCII chart. OK, it’s the character code for ‘7’. In C, you can use character literals and integers interchangeably, so if you were going to do such a comparison (which is the wrong approach here, see below), you should have >= ‘7’.
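
To illustrate the point about character literals, here is a small sketch that converts a digit character to its numeric value without any magic numbers such as 49 or 55. The helper name digit_value is hypothetical and is shown only as an example.

```c
#include <ctype.h>

/* Hypothetical helper: returns the numeric value of a decimal digit
   character ('0'..'9'), or -1 if c is not a digit. Subtracting '0'
   replaces the raw character codes used in the original function. */
static int digit_value(char c)
{
    if (!isdigit((unsigned char)c))
        return -1;
    return c - '0';
}
```

The C standard guarantees that the digits '0' through '9' have consecutive character codes, so c - '0' works whatever the character set is.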

>        return HTML_H6;
>    else
>        return HTML_H1 + (int)s[11] - 49;

In ODF, we can’t rely on style names to determine the heading level, because it’s perfectly legal to call them something other than Heading_20_n, which is what OpenOffice seems to do by default. The text:h element has an outline-level attribute; this indicates the level of the heading. So you should get the value of the TEXT_OUTLINE_LEVEL attribute and use that to determine which HTML heading tag to use.
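
A rough sketch of how the outline-level string could be turned into a heading level. It assumes the attribute value has already been fetched (for example with DFGetAttribute and TEXT_OUTLINE_LEVEL) and may be NULL when the attribute is absent. The function name heading_level_from_outline is hypothetical. HTML defines only h1 through h6, so this sketch returns 0 for anything outside 1..6 and leaves it to the caller to decide what to emit in that case.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from Corinthia): parses an outline-level
   attribute value. Returns 1..6 for a level that maps onto an HTML
   heading, or 0 if the value is missing, malformed, or out of range. */
static int heading_level_from_outline(const char *value)
{
    char *end = NULL;
    long level;

    if (value == NULL || *value == '\0')
        return 0;

    errno = 0;
    level = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0')
        return 0;               /* overflow or trailing garbage */
    if (level < 1 || level > 6)
        return 0;               /* no matching HTML heading */
    return (int)level;
}
```

If the heading tags are numbered consecutively, as the original function already assumes, a caller could return HTML_H1 + level - 1 for a non-zero level.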

I’m not sure what the best way of dealing with outline levels beyond 7 is. I’d suggest for now just making that a normal paragraph.

> Because I will need to make more such things to match the attribute
> values, I'm wondering if we could use regex.h instead, or if that is
> too unix specific and not available on other platforms.

I don’t believe it’s available on Windows. At any rate, I would suggest avoiding regular expressions in the codebase unless there’s a really compelling need. If you come across other situations where you think a regex would be appropriate, let me know; from what I’ve seen of ODF I think we should be able to get away without them.

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)