Posted to dev@cocoon.apache.org by Stephan Michels <st...@vern.chem.tu-berlin.de> on 2002/02/16 17:39:29 UTC

[Ann] chaperon project launched at SF(was: textparser)

Hi,

I finally found a name for my project ;-) I have now checked in the source
on SourceForge.

http://sourceforge.net/projects/chaperon/

Chaperon is an LALR(1) parser that parses structured text documents and
generates XML documents as output. It includes a parser generator like yacc
and a regex scanner like lex. As input, Chaperon uses a grammar written in
XML.

I wrote two little examples. The first example creates an XML document from
a little mathematical expression and colorizes it:
http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
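
To make the "regex scanner like lex" stage concrete, here is a minimal Java sketch that turns a small mathematical expression into a flat XML token stream. It is not Chaperon code and does not use Chaperon's grammar format. The class name ExpressionToXml, the token classes and the element names are assumptions made up for this illustration, and the LALR(1) parsing stage that would build nested structure from these tokens is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpressionToXml {

    // One named group per token class (hypothetical classes, not Chaperon's).
    private static final Pattern TOKEN = Pattern.compile(
        "(?<number>\\d+(?:\\.\\d+)?)"
        + "|(?<ident>[A-Za-z_]\\w*)"
        + "|(?<op>[-+*/^=])"
        + "|(?<paren>[()])"
        + "|(?<ws>\\s+)");

    // Token classes that are written to the output; whitespace is skipped.
    private static final String[] KINDS = {"number", "ident", "op", "paren"};

    /** Scans the expression and returns a flat XML token stream. */
    public static String toXml(String expr) {
        StringBuilder xml = new StringBuilder("<expression>");
        Matcher m = TOKEN.matcher(expr);
        int pos = 0;
        while (pos < expr.length()) {
            m.region(pos, expr.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "unexpected character at offset " + pos);
            }
            for (String kind : KINDS) {
                String text = m.group(kind);
                if (text != null) {
                    xml.append('<').append(kind).append('>')
                       .append(escape(text))
                       .append("</").append(kind).append('>');
                }
            }
            pos = m.end();
        }
        return xml.append("</expression>").toString();
    }

    /** Escapes the characters that are special in XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toXml("3 + 4 * (x - 1)"));
    }
}
```

A stylesheet could then colorize such output by mapping each token element to a style, which is roughly the idea behind the screenshot.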

The second example transforms a mathematical expression, written in a form
similar to LaTeX, to MathML and produces a GIF using my serializer from
the jeuclid project:
http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg

I would be glad to hear your opinions or suggestions.

Stephan Michels.


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stephan Michels wrote:

>Hi,
>
>I finally found a name for my project ;-) I have now checked in the source
>on SourceForge.
>
>http://sourceforge.net/projects/chaperon/
>
>Chaperon is an LALR(1) parser that parses structured text documents and
>generates XML documents as output. It includes a parser generator like yacc
>and a regex scanner like lex. As input, Chaperon uses a grammar written in
>XML.
>
>I wrote two little examples. The first example creates an XML document from
>a little mathematical expression and colorizes it:
>http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
>
>The second example transforms a mathematical expression, written in a form
>similar to LaTeX, to MathML and produces a GIF using my serializer from
>the jeuclid project:
>http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg
>
>I would be glad to hear your opinions or suggestions.
>
>Stephan Michels.
>
To give you some input, do you know the JEDI parser, at 
http://www.darmstadt.gmd.de/oasys/projects/jedi/index.html ?

I used it for some demos, and it's very powerful for parsing text content 
to XML. Unfortunately, it's not open source and seems to be frozen (I 
sent a mail to the author but got no answer), but maybe you'll find some 
inspiration there.

Sylvain.






Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Stefano Mazzocchi <st...@apache.org>.
Stephan Michels wrote:

> > Stephan,
> >
> > I would love to ship chaperon with Cocoon and with a few of those useful
> > grammars of yours. Do you have a few significant samples that we could
> > ship to show the functionality?
> 
> I'm glad to hear this :)
> 
> Which samples would you prefer to see? Any ideas?

Impress me :)

No, seriously, I think LaTeX to PDF would be a great one :)

Another great one would be Java to HTML, sort of like LXR, which Jakarta
Alexandria was trying to clone (but stopped, AFAIK).

And talking about dreams, having a way to transform Mailbox or MIME
messages into their equivalent XML versions would be *great* for an
XML-based (XIndice stored) mail archive (with Lucene on top to provide the
textual search).

What do you think?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Sunday 24 February 2002 10:50, Stephan Michels wrote:
>. . .
> Which samples would you prefer to see? Any ideas?

I think a sample grammar for a wiki-like structured text language would 
be very interesting for Cocooners.

I'm thinking of a simple language a la PHPwiki or aptconvert, something 
along the lines of

H1 this is a heading level one
A normal paragraph here, as many
lines as needed, ends with a blank or "code" line.
* a bulleted list item
% a numbered list item
%% numbered list sub-item
Here we have some _bold_ and __some italic text__.
+****this line starts with a star sign escaped with a plus sign.

The text parser would then convert this to XML (ideally a DocBook subset?).
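
As a rough illustration of that conversion, the following Java sketch handles the line-level markers from the sample above with plain string checks. It is hand-written for this illustration and is not a Chaperon grammar. The class name WikiToXml and the element names (heading1, para, bullet, numbered, subnumbered) are assumptions and are not DocBook. Inline markup such as _bold_ is passed through untouched.

```java
import java.util.ArrayList;
import java.util.List;

public class WikiToXml {

    /** Converts the line-oriented markup into a simple XML document. */
    public static String convert(String text) {
        StringBuilder xml = new StringBuilder("<document>\n");
        List<String> para = new ArrayList<>(); // lines of the open paragraph
        for (String line : text.split("\r?\n")) {
            if (line.startsWith("H1 ")) {
                flush(para, xml);
                element(xml, "heading1", line.substring(3));
            } else if (line.startsWith("* ")) {
                flush(para, xml);
                element(xml, "bullet", line.substring(2));
            } else if (line.startsWith("%% ")) {
                flush(para, xml);
                element(xml, "subnumbered", line.substring(3));
            } else if (line.startsWith("% ")) {
                flush(para, xml);
                element(xml, "numbered", line.substring(2));
            } else if (line.trim().isEmpty()) {
                flush(para, xml); // a blank line ends the paragraph
            } else {
                // A leading '+' escapes a marker: drop it, keep the rest as text.
                para.add(line.startsWith("+") ? line.substring(1) : line);
            }
        }
        flush(para, xml);
        return xml.append("</document>\n").toString();
    }

    /** Writes the collected paragraph lines, if any, as one para element. */
    private static void flush(List<String> para, StringBuilder xml) {
        if (!para.isEmpty()) {
            element(xml, "para", String.join(" ", para));
            para.clear();
        }
    }

    private static void element(StringBuilder xml, String name, String body) {
        xml.append('<').append(name).append('>')
           .append(escape(body.trim()))
           .append("</").append(name).append(">\n");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.print(convert(
            "H1 this is a heading level one\n"
            + "A normal paragraph here, as many\n"
            + "lines as needed.\n"
            + "* a bulleted list item\n"
            + "% a numbered list item\n"
            + "%% numbered list sub-item\n"));
    }
}
```

A real grammar would also nest list items and sub-items inside list elements; this sketch keeps them flat to stay short.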

aptconvert uses tabs at the start of lines for markup. That makes the 
structured text look nicer, but I don't think it would work when using 
HTML forms for text entry.

The PHPwiki language 
(http://phpwiki.sourceforge.net/phpwiki/TextFormattingRules) looks 
fairly complete for simple text formatting, IMHO this or a similar 
language would be a very interesting sample.

-- 
 -- Bertrand Delacrétaz, www.codeconsult.ch
 -- web technologies consultant - OO, Java, XML, C++








Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Stephan Michels <st...@vern.chem.tu-berlin.de>.

On Sun, 24 Feb 2002, Stefano Mazzocchi wrote:

> Stephan Michels wrote:
> >
> > On Sun, 17 Feb 2002, Stefano Mazzocchi wrote:
> >
> > > > I wrote two little examples. The first example creates an XML document from
> > > > a little mathematical expression and colorizes it:
> > > > http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
> > > >
> > > > The second example transforms a mathematical expression, written in a form
> > > > similar to LaTeX, to MathML and produces a GIF using my serializer from
> > > > the jeuclid project:
> > > > http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg
> > > >
> > > > I would be glad to hear your opinions or suggestions.
> > >
> > > I'm downloading it right now.
> > >
> > > Hmmmm, just wondering: how hard would it be to write a Chaperon grammar
> > > for the email MBOX format?
> > >
> > > That way, we could have Chaperon transform all our email into XML, place
> > > it into an XIndice, publish it with Cocoon and index it with Lucene...
> > > and voila' here is the Forrest module for mail archiving :)
> >
> > Hmm, MBOX format? Is this the format from RFC 822?
> > http://www.faqs.org/rfcs/rfc822.html
> >
> > Then yes, I have a nearly complete grammar in the Chaperon CVS.
>
> Wow, impressive.
>
> Stephan,
>
> I would love to ship chaperon with Cocoon and with a few of those useful
> grammars of yours. Do you have a few significant samples that we could
> ship to show the functionality?

I'm glad to hear this :)

Which samples would you prefer to see? Any ideas?




Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Stefano Mazzocchi <st...@apache.org>.
Stephan Michels wrote:
> 
> On Sun, 17 Feb 2002, Stefano Mazzocchi wrote:
> 
> > > I wrote two little examples. The first example creates an XML document from
> > > a little mathematical expression and colorizes it:
> > > http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
> > >
> > > The second example transforms a mathematical expression, written in a form
> > > similar to LaTeX, to MathML and produces a GIF using my serializer from
> > > the jeuclid project:
> > > http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg
> > >
> > > I would be glad to hear your opinions or suggestions.
> >
> > I'm downloading it right now.
> >
> > Hmmmm, just wondering: how hard would it be to write a Chaperon grammar
> > for the email MBOX format?
> >
> > That way, we could have Chaperon transform all our email into XML, place
> > it into an XIndice, publish it with Cocoon and index it with Lucene...
> > and voila' here is the Forrest module for mail archiving :)
> 
> Hmm, MBOX format? Is this the format from RFC 822?
> http://www.faqs.org/rfcs/rfc822.html
> 
> Then yes, I have a nearly complete grammar in the Chaperon CVS.

Wow, impressive.

Stephan, 

I would love to ship chaperon with Cocoon and with a few of those useful
grammars of yours. Do you have a few significant samples that we could
ship to show the functionality?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Stephan Michels <st...@vern.chem.tu-berlin.de>.

On Sun, 17 Feb 2002, Stefano Mazzocchi wrote:

> > I wrote two little examples. The first example creates an XML document from
> > a little mathematical expression and colorizes it:
> > http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
> >
> > The second example transforms a mathematical expression, written in a form
> > similar to LaTeX, to MathML and produces a GIF using my serializer from
> > the jeuclid project:
> > http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg
> >
> > I would be glad to hear your opinions or suggestions.
>
> I'm downloading it right now.
>
> Hmmmm, just wondering: how hard would it be to write a Chaperon grammar
> for the email MBOX format?
>
> That way, we could have Chaperon transform all our email into XML, place
> it into an XIndice, publish it with Cocoon and index it with Lucene...
> and voila' here is the Forrest module for mail archiving :)

Hmm, MBOX format? Is this the format from RFC 822?
http://www.faqs.org/rfcs/rfc822.html

Then yes, I have a nearly complete grammar in the Chaperon CVS.




Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Stefano Mazzocchi <st...@apache.org>.
Stephan Michels wrote:
> 
> Hi,
> 
> I finally found a name for my project ;-) I have now checked in the source
> on SourceForge.
> 
> http://sourceforge.net/projects/chaperon/

Cool!

> Chaperon is an LALR(1) parser that parses structured text documents and
> generates XML documents as output. It includes a parser generator like yacc
> and a regex scanner like lex. As input, Chaperon uses a grammar written in
> XML.

Very cool!
 
> I wrote two little examples. The first example creates an XML document from
> a little mathematical expression and colorizes it:
> http://chaperon.sourceforge.net/chaperon-screenshot-1.jpg
> 
> The second example transforms a mathematical expression, written in a form
> similar to LaTeX, to MathML and produces a GIF using my serializer from
> the jeuclid project:
> http://chaperon.sourceforge.net/chaperon-screenshot-2.jpg
> 
> I would be glad to hear your opinions or suggestions.

I'm downloading it right now.

Hmmmm, just wondering: how hard would it be to write a Chaperon grammar
for the email MBOX format?

That way, we could have Chaperon transform all our email into XML, place
it into an XIndice, publish it with Cocoon and index it with Lucene...
and voila' here is the Forrest module for mail archiving :)
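
To give a feel for the first step of that pipeline, here is a small Java sketch that splits mbox text into messages and writes the headers and body as XML. It is a hand-written illustration, not a Chaperon grammar. The class name MboxToXml and the element names (mailbox, message, header, body) are assumptions. It relies on the common mbox convention that each message starts with a line beginning with "From " and that folded header lines start with whitespace. MIME parts, encodings and body lines that themselves begin with "From " are not handled.

```java
public class MboxToXml {

    /** Converts mbox text into a simple XML document. */
    public static String convert(String mbox) {
        StringBuilder xml = new StringBuilder("<mailbox>\n");
        boolean inMessage = false; // a message element is currently open
        boolean inHeaders = false; // still before the blank line ending headers
        String header = null;      // pending header line, possibly folded
        StringBuilder body = new StringBuilder();
        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                // Envelope line: close the previous message and open a new one.
                if (inMessage) {
                    endMessage(xml, header, body);
                }
                xml.append("<message>\n");
                inMessage = true;
                inHeaders = true;
                header = null;
                body.setLength(0);
            } else if (!inMessage) {
                continue; // ignore anything before the first envelope line
            } else if (inHeaders) {
                if (line.isEmpty()) {
                    emitHeader(xml, header);
                    header = null;
                    inHeaders = false;
                } else if (header != null
                        && (line.startsWith(" ") || line.startsWith("\t"))) {
                    header += " " + line.trim(); // unfold a continuation line
                } else {
                    emitHeader(xml, header);
                    header = line;
                }
            } else {
                body.append(line).append('\n');
            }
        }
        if (inMessage) {
            endMessage(xml, header, body);
        }
        return xml.append("</mailbox>\n").toString();
    }

    private static void endMessage(StringBuilder xml, String header,
                                   StringBuilder body) {
        emitHeader(xml, header); // still pending if the message had no body
        xml.append("<body>").append(escape(body.toString()))
           .append("</body>\n</message>\n");
    }

    private static void emitHeader(StringBuilder xml, String header) {
        if (header == null) {
            return;
        }
        int colon = header.indexOf(':');
        if (colon < 0) {
            return; // not a "Name: value" line, skipped in this sketch
        }
        xml.append("<header name=\"")
           .append(escape(header.substring(0, colon).trim()))
           .append("\">")
           .append(escape(header.substring(colon + 1).trim()))
           .append("</header>\n");
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(convert(
            "From alice@example.org Sat Feb 16 17:39:29 2002\n"
            + "Subject: Hello\n"
            + "\n"
            + "Hi there\n"));
    }
}
```

The resulting XML could then be stored and indexed by whatever tools sit further down the pipeline.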

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





Re: [Ann] chaperon project launched at SF(was: textparser)

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
On Saturday 16 February 2002 17:39, Stephan Michels wrote:
>. . .
> Chaperon is an LALR(1) parser that parses structured text documents
> and generates XML documents as output.
>. . .

If anyone wants to play with Chaperon, here's a small command-line 
driver that I wrote for this parser. It allows Chaperon to run in a 
minimal environment.

Tested a few minutes ago by compiling it with the current CVS of 
chaperon (http://sourceforge.net/projects/chaperon).

Stephan, feel free to include this code in your project if you like it!

-Bertrand

-------- CODE STARTS HERE ------------------------
/*
 *  Simple command-line driver for the chaperon parser.
 *  See http://www.sourceforge.net/projects/chaperon
 *  Copyright (C) Bertrand Delacretaz, www.codeconsult.ch. 
 *  All rights reserved.
 *  -------------------------------------------------------------------
 *  This software is published under the terms of the Apache Software 
 *  License version 1.1, a copy of which has been included  with this 
 *  distribution in the LICENSE file.
 */

package net.sourceforge.chaperon.cmdline;

import java.io.OutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.OutputStreamWriter;

import org.apache.xml.serialize.Method;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;

import org.apache.xerces.parsers.SAXParser;

import org.xml.sax.SAXParseException;
import org.xml.sax.InputSource;

import net.sourceforge.chaperon.grammar.Grammar;
import net.sourceforge.chaperon.grammar.SyntaxErrorException;
import net.sourceforge.chaperon.grammar.generator.SAXGrammarGenerator;
import net.sourceforge.chaperon.parser.generator.ParserTableGenerator;
import net.sourceforge.chaperon.parser.ParserTable;
import net.sourceforge.chaperon.parser.Parser;
import net.sourceforge.chaperon.parser.CompressedDocument;

/**
 * Simple command-line driver for the chaperon parser
 *
 * @author Bertrand Delacretaz bdelacretaz@codeconsult.ch
 * @version $Revision$
 */
public class CmdLineParser {
    
    /** Parse grammarFile and use its grammar to parse inputFile.
     *  Write the result to os.
     *  Does not store the compiled grammar, recompiles it every time.
     */
    CmdLineParser(File grammarFile, File inputFile, OutputStream os)
    throws Exception {
        final ParserTable pt = parseGrammar(grammarFile);
        final CompressedDocument cd = parseInput(pt,inputFile,os);
        dumpDocument(cd,os);
    }
    
    /** parse supplied grammarFile */
    private ParserTable parseGrammar(File grammarFile)
    throws Exception {
        final SAXParser parser = new SAXParser();
        final SAXGrammarGenerator gg = new SAXGrammarGenerator();
        parser.setContentHandler(gg);
        
        info("parsing grammar file " + grammarFile.getName() + "...");
        parser.parse(grammarFile.getAbsolutePath());
        
        info("building parser table...");
        final Grammar g = gg.getGrammar();
        if(g == null) {
            throw new Exception("no Grammar was generated while parsing grammar file");
        }
        final ParserTableGenerator ptg = new ParserTableGenerator(g);
        return ptg.getParserTable();
    }
    
    /** parse supplied inputFile using supplied ParserTable (compiled grammar) */
    private CompressedDocument parseInput(ParserTable pt,File inputFile,OutputStream os)
    throws Exception {
        info("parsing input file " + inputFile.getName() + "...");
        final Parser p = new Parser();
        final InputSource is = new InputSource(new FileInputStream(inputFile));
        return p.parse(pt,is);
    }

    /** dump supplied CompressedDocument to supplied OutputStream */
    private void dumpDocument(CompressedDocument cd,OutputStream os)
    throws Exception {
        info("dumping parsed XML document...");
        final String encoding = "iso-8859-1";
        final OutputFormat format = new OutputFormat(Method.XML,encoding,true);
        format.setIndenting(true);
        format.setIndent(1);
        
        // write with the same encoding that the XML declaration announces
        final OutputStreamWriter osw = new OutputStreamWriter(os,encoding);
        final XMLSerializer xmls = new XMLSerializer(osw,format);
        
        cd.toSAX(xmls.asContentHandler(), null);
        // flush the writer (which also flushes os) so no buffered output is lost
        osw.flush();
    }

    /** trivial logging mechanism */
    protected static void warn(String msg) {
        System.err.println("Chaperon CmdLineParser WARNING : " + msg);
    }
    
    /** trivial info mechanism */
    protected static void info(String msg) {
        System.err.println("Chaperon CmdLineParser: " + msg);
    }
    
    /** Entry point to parser from the command-line.
     *  Compiles the given grammar file and runs the parser on given input file.
     *  Output goes to stdout unless an output filename is specified
     */
    public static void main(String args[])
    throws Exception {
        if(args.length < 2) {
            warn("usage: CmdLineParser <grammarFile> <inputFile> [outputFile]");
            System.exit(1);
        }
        
        final String grammarFile = args[0];
        final String inputFile = args[1];
        final String outputFile = args.length > 2 ? args[2] : null;
        
        info("using grammar file: " + grammarFile);
        info("using input file: " + inputFile);
        info("using output file: " + (outputFile == null ? "stdout" : outputFile));
        
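        OutputStream os = System.out;
        if(outputFile != null) {
            os = new FileOutputStream(outputFile);
        }
        
        try {
            new CmdLineParser(new File(grammarFile),new File(inputFile),os);
        } finally {
            // close the output file if we opened one; leave stdout open
            if(outputFile != null) {
                os.close();
            }
        }
        info("all done.");
    }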
}
--- CODE ENDS HERE - no kidding, you read it entirely? ----------------






