Posted to dev@cocoon.apache.org by SAXESS - Hussayn Dabbous <da...@saxess.com> on 2003/01/13 14:22:21 UTC

cocoon and non XML content ... (was Jackson... five?)

Hi,

I struggled with the following problem and wonder whether it is relevant
and has already been solved within Cocoon:

Assume you have some content that is plain text, e.g. log reports.
Now you want to use this text with Cocoon. Naturally, you have to
convert the text to XML. This could of course be done by writing a new
generator, which would be specific to the data it has to convert.

Now assume you have many different sources that have to be
transformed into XML.

Wouldn't it be nice to have a generator at hand that could be
controlled via configuration? That way I could use one generator,
configure the conversion rules as needed, get the XML data
out of it, and then proceed within Cocoon pipelines ...
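
To make the idea concrete, here is a minimal sketch of what "conversion rules
as configuration" might look like, written in Java because Cocoon components
are Java. Everything in it is an assumption for illustration only: the names
LineRuleConverter and Rule, the <document> and <line> element names, and the
log format are hypothetical, and nothing here is an existing Cocoon API. A
real generator would emit SAX events rather than build a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: turns plain-text lines into XML using configurable regex rules. */
class LineRuleConverter {

    /** One rule: a line matching 'pattern' becomes 'element', one child per capture group. */
    static final class Rule {
        final Pattern pattern;
        final String element;
        final String[] fields;

        Rule(String regex, String element, String... fields) {
            this.pattern = Pattern.compile(regex);
            this.element = element;
            this.fields = fields;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    LineRuleConverter add(Rule rule) {
        rules.add(rule);
        return this;
    }

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Converts the text line by line; empty lines are skipped. */
    String convert(String text) {
        StringBuilder out = new StringBuilder("<document>");
        for (String line : text.split("\\R")) {
            if (!line.isEmpty()) {
                out.append(convertLine(line));
            }
        }
        return out.append("</document>").toString();
    }

    /** The first matching rule wins; lines that match no rule are wrapped in <line>. */
    private String convertLine(String line) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(line);
            if (!m.matches()) {
                continue;
            }
            StringBuilder sb = new StringBuilder("<" + rule.element + ">");
            for (int i = 0; i < rule.fields.length && i < m.groupCount(); i++) {
                String value = m.group(i + 1);
                sb.append('<').append(rule.fields[i]).append('>')
                  .append(escape(value == null ? "" : value))
                  .append("</").append(rule.fields[i]).append('>');
            }
            return sb.append("</").append(rule.element).append('>').toString();
        }
        return "<line>" + escape(line) + "</line>";
    }

    public static void main(String[] args) {
        LineRuleConverter converter = new LineRuleConverter()
            .add(new Rule("(\\d{4}-\\d{2}-\\d{2}) (\\w+) (.*)", "entry", "date", "level", "message"));
        System.out.println(converter.convert("2003-01-13 ERROR disk <full>\nfree text"));
    }
}
```

The rules are plain data, so in principle they could be read from a
configuration file instead of being built in code, which is the point of the
question above.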


One possible use case (it sounds like a JTidy task, but it isn't):

I have several servers that produce very dirty HTML, intermixed with
JavaScript. My generator should gather data from these sites and
not only convert HTML to XHTML, but also make some necessary modifications
within the JavaScript, which is certainly not a suitable task for XSLT
processing, nor for JTidy. I could think of regexp processing here...

Rather than creating dedicated generators for every site, I want one
generator that can be configured to convert data depending on the
URL, or whatever... I think this is just another step towards
real content syndication ...
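
The URL-dependent part of this could be sketched roughly as follows. This is
only an illustration under stated assumptions: UrlRewriteConfig, Site and
Step are hypothetical names, the example URL and the JavaScript replacement
are made up, and the step that turns the rewritten HTML into well-formed
XHTML (for example with JTidy) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: picks regex rewrite steps by source URL and applies them in order. */
class UrlRewriteConfig {

    /** A single search-and-replace step. */
    static final class Step {
        final Pattern search;
        final String replacement;

        Step(String regex, String replacement) {
            this.search = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    /** All steps configured for URLs matching one pattern. */
    private static final class Site {
        final Pattern url;
        final List<Step> steps;

        Site(Pattern url, List<Step> steps) {
            this.url = url;
            this.steps = steps;
        }
    }

    private final List<Site> sites = new ArrayList<>();

    /** Registers the steps for every URL that fully matches 'urlRegex'. */
    UrlRewriteConfig forUrl(String urlRegex, Step... steps) {
        sites.add(new Site(Pattern.compile(urlRegex), Arrays.asList(steps)));
        return this;
    }

    /** Applies the steps of the first site whose pattern matches the URL. */
    String rewrite(String url, String raw) {
        for (Site site : sites) {
            if (!site.url.matcher(url).matches()) {
                continue;
            }
            String text = raw;
            for (Step step : site.steps) {
                text = step.search.matcher(text).replaceAll(step.replacement);
            }
            return text;
        }
        return raw; // no configuration for this URL: pass the text through unchanged
    }

    public static void main(String[] args) {
        UrlRewriteConfig config = new UrlRewriteConfig()
            .forUrl("http://shop\\.example\\.com/.*",
                    new Step("document\\.write\\(", "render("),
                    new Step("<br>", "<br/>"));
        System.out.println(config.rewrite("http://shop.example.com/list",
                "<script>document.write('x')</script><br>"));
    }
}
```

Keeping the per-site steps in a list means a new site only needs a new
configuration entry, not a new generator class.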

What do you think?
Any thoughts are welcome ...

regards, hussayn

-- 
Dr. Hussayn Dabbous
SAXESS Software Design GmbH
Neuenhöfer Allee 125
50935 Köln
Telefon: +49-221-56011-0
Fax:     +49-221-56011-20
E-Mail:  dabbous@saxess.com


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: cocoon and non XML content ... (was Jackson... five?)

Posted by SAXESS - Hussayn Dabbous <da...@saxess.com>.
Stephan,

Thank you very much for this info.
Do you know whether the new TextParser can be backported
to cocoon-2.0.4?

regards, Hussayn

Stephan Michels wrote:
> 
> On Mon, 13 Jan 2003, SAXESS - Hussayn Dabbous wrote:
> 
> 
>>Oh, sorry for my question...
>>
>>Is the TextParser generator possibly what I am looking for?
>>Could this parser also handle "unstructured text" as follows:
>>
>>"Take a piece of data out of the input, replace it with
>>something else, and finally turn the whole thing into valid
>>XML output..."
>>
>>

<snip/>

> The next version of the Chaperon components will include a text
> generator, which will likely be designed as an XMLizer.
> That version will also include a lexical scanner, which uses patterns
> similar to regular expressions to tokenize the text. This helps if your
> text is not structured. The LexicalTransformer can be used, for example,
> for syntax highlighting.
> 
> This version will be finished in the next few days, so stay tuned.
> 
> Stephan Michels.
> 
> 





Re: cocoon and non XML content ... (was Jackson... five?)

Posted by Stephan Michels <st...@apache.org>.

On Mon, 13 Jan 2003, SAXESS - Hussayn Dabbous wrote:

> Oh, sorry for my question...
>
> Is the TextParser generator possibly what I am looking for?
> Could this parser also handle "unstructured text" as follows:
>
> "Take a piece of data out of the input, replace it with
> something else, and finally turn the whole thing into valid
> XML output..."
>
>
> regards,
> hussayn

The next version of the Chaperon components will include a text
generator, which will likely be designed as an XMLizer.
That version will also include a lexical scanner, which uses patterns
similar to regular expressions to tokenize the text. This helps if your
text is not structured. The LexicalTransformer can be used, for example,
for syntax highlighting.
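
As a rough illustration of tokenizing text with regex-like patterns and
emitting the tokens as XML, here is a small Java sketch. It is not the
Chaperon API: TinyLexer, its methods, the token names, and the <tokens> and
<error> elements are all hypothetical, and the longest-match rule is an
assumption about how such a scanner might resolve overlapping patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based tokenizing; this is not the Chaperon API. */
class TinyLexer {

    // Token name -> pattern, kept in declaration order.
    private final Map<String, Pattern> tokens = new LinkedHashMap<>();

    TinyLexer token(String name, String regex) {
        tokens.put(name, Pattern.compile(regex));
        return this;
    }

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Splits the text into tokens and wraps each one in an element named after its token. */
    String toXml(String text) {
        StringBuilder out = new StringBuilder("<tokens>");
        int pos = 0;
        while (pos < text.length()) {
            String name = null;
            int end = pos;
            for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
                Matcher m = e.getValue().matcher(text).region(pos, text.length());
                // Longest match wins; on a tie the earlier declared token is kept.
                if (m.lookingAt() && m.end() > end) {
                    name = e.getKey();
                    end = m.end();
                }
            }
            if (name == null) {
                // No pattern matched here: emit a single character as an error token.
                name = "error";
                end = pos + 1;
            }
            out.append('<').append(name).append('>')
               .append(escape(text.substring(pos, end)))
               .append("</").append(name).append('>');
            pos = end;
        }
        return out.append("</tokens>").toString();
    }

    public static void main(String[] args) {
        TinyLexer lexer = new TinyLexer()
            .token("keyword", "if|else")
            .token("identifier", "[a-z]+")
            .token("number", "\\d+")
            .token("space", "\\s+");
        System.out.println(lexer.toXml("if x1"));
    }
}
```

A stylesheet further down the pipeline could then map token elements such as
<keyword> to markup for syntax highlighting.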

This version will be finished in the next few days, so stay tuned.

Stephan Michels.




Re: cocoon and non XML content ... (was Jackson... five?)

Posted by SAXESS - Hussayn Dabbous <da...@saxess.com>.
Oh, sorry for my question...

Is the TextParser generator possibly what I am looking for?
Could this parser also handle "unstructured text" as follows:

"Take a piece of data out of the input, replace it with
something else, and finally turn the whole thing into valid
XML output..."


regards,
hussayn



