Posted to users@camel.apache.org by Jan Bernhardt <jb...@talend.com> on 2016/11/25 07:15:43 UTC

Parsing unstructured Text in Camel

Hi Camel Users,


Is there any component that helps me parse plain text (not JSON, XML, or CSV)?


My use case is that I receive an email with multiple keywords in the subject as well as in the body.

I could not find any component that would help me extract certain values from my multiline plain text.


I need something like FreeMarker, but the other way around: taking the full text and extracting certain values from it (for example, with a regex).


Any help would be much appreciated.


Many thanks

Jan

Re: Parsing unstructured Text in Camel

Posted by Jan Bernhardt <jb...@talend.com>.
I have created a Jira issue. I am not sure whether I will find the time to implement this feature myself.

https://issues.apache.org/jira/browse/CAMEL-10540

Best regards
Jan

> -----Original Message-----
> From: Claus Ibsen [mailto:claus.ibsen@gmail.com]
> Sent: Monday, 28 November 2016 09:29
> To: users@camel.apache.org
> Subject: Re: Parsing unstructured Text in Camel
>
> Hi
>
> No there is no grok component in Camel. It would be really nice to have, so
> you are welcome to log a JIRA ticket and help work on such a component. We
> love contributions http://camel.apache.org/contributing
>
> I guess we can try to see if we can use the grok parser for elasticsearch
> https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common

Re: Parsing unstructured Text in Camel

Posted by Andrea Cosentino <an...@yahoo.com.INVALID>.
+1 for a Grok component :-)
 --
Andrea Cosentino 
----------------------------------
Apache Camel PMC Member
Apache Karaf Committer
Apache Servicemix Committer
Email: ancosen1985@yahoo.com
Twitter: @oscerd2
Github: oscerd



On Monday, November 28, 2016 9:29 AM, Claus Ibsen <cl...@gmail.com> wrote:
Hi

No there is no grok component in Camel. It would be really nice to
have, so you are welcome to log a JIRA ticket and help work on such a
component. We love contributions
http://camel.apache.org/contributing

I guess we can try to see if we can use the grok parser for elasticsearch
https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common







Re: Parsing unstructured Text in Camel

Posted by Claus Ibsen <cl...@gmail.com>.
Hi

No, there is no grok component in Camel. It would be really nice to
have, so you are welcome to log a JIRA ticket and help work on such a
component. We love contributions:
http://camel.apache.org/contributing

Maybe we can see whether the grok parser from Elasticsearch could be used:
https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common





On Mon, Nov 28, 2016 at 9:03 AM, Jan Bernhardt <jb...@talend.com> wrote:
> Hi JB,
>
> I know self-coding is always possible, I was just wondering if there is an easier way. For example logstash provides a grok parser for this:
> https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
>
> I was wondering if camel provides something similar or if it would be a good idea to add a camel-grok component.
>
> Best regards
> Jan



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

Re: Parsing unstructured Text in Camel

Posted by Jan Bernhardt <jb...@talend.com>.
Hi JB,

I know that coding it myself is always possible; I was just wondering whether there is an easier way. For example, Logstash provides a grok filter for this:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

I was wondering whether Camel provides something similar, or whether it would be a good idea to add a camel-grok component.
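
To make the idea concrete, here is a minimal sketch of what grok-style extraction boils down to, assuming Java 9+ and nothing beyond the JDK: references such as %{INT:ticket} are expanded into named capturing groups of an ordinary regex. The class name GrokSketch, the four-entry pattern library, and the field names are made up for illustration; this is neither the Logstash implementation nor an existing Camel API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical grok-style helper; not an existing Camel or Logstash class. */
final class GrokSketch {

    // Illustrative pattern library (an assumption); real grok ships a far larger one.
    private static final Map<String, String> LIBRARY = Map.of(
            "WORD", "\\w+",
            "INT", "[+-]?\\d+",
            "EMAIL", "[\\w.+-]+@[\\w.-]+",
            "DATA", ".*?");

    // Matches a reference such as %{INT:ticket}; field names must be valid
    // Java group names (a letter followed by letters or digits).
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+):([A-Za-z][A-Za-z0-9]*)\\}");

    private final Pattern compiled;
    private final List<String> fields = new ArrayList<>();

    GrokSketch(String expression) {
        Matcher m = REFERENCE.matcher(expression);
        StringBuilder regex = new StringBuilder();
        while (m.find()) {
            String definition = LIBRARY.get(m.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("Unknown pattern: " + m.group(1));
            }
            fields.add(m.group(2));
            // Expand %{NAME:field} into a named capturing group (?<field>...).
            m.appendReplacement(regex, Matcher.quoteReplacement(
                    "(?<" + m.group(2) + ">" + definition + ")"));
        }
        // Everything outside the references is kept as plain regex syntax.
        m.appendTail(regex);
        compiled = Pattern.compile(regex.toString());
    }

    /** Returns the extracted fields, or an empty map if the text does not match. */
    Map<String, String> extract(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = compiled.matcher(text);
        if (m.find()) {
            for (String field : fields) {
                result.put(field, m.group(field));
            }
        }
        return result;
    }
}
```

With an expression like "Ticket: %{INT:ticket}\s+Reporter: %{EMAIL:reporter}", extract() would return the ticket number and the reporter address from a multiline mail body. A real component would of course need a proper pattern library and nested pattern references.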

Best regards
Jan

> -----Original Message-----
> From: Jean-Baptiste Onofré [mailto:jb@nanthrax.net]
> Sent: Friday, 25 November 2016 08:23
> To: users@camel.apache.org
> Cc: users@camel.apache.org
> Subject: Re: Parsing unstructured Text in Camel
>
> Hi Jan
>
> You can always use a custom processor for that.
>
> Regards
> JB

Re: Parsing unstructured Text in Camel

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Jan

You can always use a custom processor for that.

Regards
JB

On Nov 25, 2016, 08:16, at 08:16, Jan Bernhardt <jb...@talend.com> wrote:
>Hi Camel Users,
>
>
>is there any component which helps me to parse plain text? Not JSON,
>XML or CSV.
>
>
>My use case is that I receive an E-Mail with multiple keywords in the
>Subject as well as in the body.
>
>I could not find any component that would help me to parse certain
>values from my multiline plaintext.
>
>
>I need something like freemarker but the other way around. Getting the
>fulltext and parsing certain values from this text (for example with
>regex).
>
>
>Any help would be much appreciated.
>
>
>Many thanks
>
>Jan

Re: Parsing unstructured Text in Camel

Posted by Anton <ku...@gmail.com>.
UIMA can work with any unstructured data.

On Nov 26, 2016 1:27 PM, "souciance" <so...@gmail.com>
wrote:

> Is UIMA useful as a tool for processing basic CSV files as well as
> complicated EDIFACT data? Or is it meant to be applied on other types of
> unstructured data?

Re: Parsing unstructured Text in Camel

Posted by souciance <so...@gmail.com>.
Is UIMA useful as a tool for processing basic CSV files as well as
complicated EDIFACT data? Or is it meant to be applied to other types of
unstructured data?

On Sat, Nov 26, 2016 at 1:03 PM, Anton-2 [via Camel] <
ml-node+s465427n5790669h2@n5.nabble.com> wrote:

> UIMA was donated to the Apache Foundation by IBM. UIMA is the framework
> that powers IBM Watson.
> As to how it extracts knowledge from unstructured data, it uses A Common
> Analysis Structure(CAS), which is a way of defining Analysis Engines.
> Typically these are text based NLP process but can also be audio and
> video.
>
> UIMA is a big topic. It has a strong and helpful community behind it.




--
View this message in context: http://camel.465427.n5.nabble.com/Parsing-unstructured-Text-in-Camel-tp5790513p5790670.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Parsing unstructured Text in Camel

Posted by Anton <ku...@gmail.com>.
On Sat, Nov 26, 2016 at 12:29 PM, souciance <
souciance.eqdam.rashti@gmail.com> wrote:

> How does UIMA work with data that does not necessarily conform to a
> particular model at all times?
>

UIMA was donated to the Apache Software Foundation by IBM. UIMA is the framework
that powers IBM Watson.
As to how it extracts knowledge from unstructured data: it uses a Common
Analysis Structure (CAS), the shared data structure that Analysis Engines read and annotate.
Typically these engines are text-based NLP processes, but they can also handle audio and video.

UIMA is a big topic. It has a strong and helpful community behind it.

Re: Parsing unstructured Text in Camel

Posted by souciance <so...@gmail.com>.
UIMA seems quite similar conceptually to how the IIB mapping framework works,
where you provide a message model and work with that. How does UIMA work
with data that does not necessarily conform to a particular model at all times?

On Sat, Nov 26, 2016 at 7:22 AM, Anton-2 [via Camel] <
ml-node+s465427n5790652h75@n5.nabble.com> wrote:

> On Sat, Nov 26, 2016 at 12:23 AM, souciance <
> [hidden email] <http:///user/SendEmail.jtp?type=node&node=5790652&i=0>>
> wrote:
>
> >
> > Actually there is no tool that can handle any unstructured data
>
>
> That is not correct.
>
> https://uima.apache.org/doc-uima-why.html
>




--
View this message in context: http://camel.465427.n5.nabble.com/Parsing-unstructured-Text-in-Camel-tp5790513p5790668.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Parsing unstructured Text in Camel

Posted by Anton <ku...@gmail.com>.
On Sat, Nov 26, 2016 at 12:23 AM, souciance <
souciance.eqdam.rashti@gmail.com> wrote:

>
> Actually there is no tool that can handle any unstructured data


That is not correct.

https://uima.apache.org/doc-uima-why.html

Re: Parsing unstructured Text in Camel

Posted by souciance <so...@gmail.com>.
Hi,

Actually, there is no tool that can handle arbitrary unstructured data, unless you
want to put everything in some sort of NoSQL database and run queries
against it. Even integration software with advanced mapping
capabilities, like IBM's IIB 10, requires you to describe the structure of
the data if it is not XML or JSON. Of course, this applies only if you are
interested in the content; if you just want to transfer the data, there is
no need. The big problem is that the open source world lacks this kind
of advanced mapping capability. In IIB 10 you can describe pretty much any
textual format, and as long as you can describe it, IIB will parse it.

Best
Souciance

On Fri, Nov 25, 2016 at 9:26 PM, Anton-2 [via Camel] <
ml-node+s465427n5790649h82@n5.nabble.com> wrote:

> It might be over-kill, but you could use Apache UIMA -
> https://uima.apache.org/d/uima-as-current/apidocs/org/apache/uima/camel/UimaAsEndpoint.html




--
View this message in context: http://camel.465427.n5.nabble.com/Parsing-unstructured-Text-in-Camel-tp5790513p5790650.html
Sent from the Camel - Users mailing list archive at Nabble.com.

Re: Parsing unstructured Text in Camel

Posted by Anton <ku...@gmail.com>.
It might be overkill, but you could use Apache UIMA -
https://uima.apache.org/d/uima-as-current/apidocs/org/apache/uima/camel/UimaAsEndpoint.html

On Fri, Nov 25, 2016 at 11:33 AM, Jan Matèrne (jhm) <ap...@materne.de>
wrote:

> I dont think that there is such a component.
>
> Unless you have validated the input you can't rely on a structure.
> So I would write a simple bean which parses the text. E.g. using the regexp
> you mentioned.
>
>
> Jan

Re: Parsing unstructured Text in Camel

Posted by "Jan Matèrne (jhm)" <ap...@materne.de>.
I don't think there is such a component.

Unless you have validated the input, you can't rely on its structure.
So I would write a simple bean that parses the text, e.g. using the regexp
you mentioned.
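
A minimal sketch of such a bean, using only the JDK. It assumes the keywords appear as "Key: value" lines, which is a guess about the mail layout; the class name KeywordParserBean and the method name parse are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical bean; the "Key: value" layout is an assumption about the mail text. */
final class KeywordParserBean {

    // One "Key: value" pair per line. MULTILINE makes ^ and $ match at line
    // boundaries; [ \t]* is used instead of \s* so a match never spans lines.
    private static final Pattern KEY_VALUE = Pattern.compile(
            "^[ \\t]*([A-Za-z][\\w-]*)[ \\t]*:[ \\t]*(.+?)[ \\t]*$",
            Pattern.MULTILINE);

    /** Collects every "Key: value" line of the text into an ordered map. */
    public Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        if (text == null) {
            return values; // nothing to parse
        }
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            // group(1) is the keyword, group(2) the trimmed value
            values.put(m.group(1), m.group(2));
        }
        return values;
    }
}
```

In a route, a class like this (made public) could probably be called through Camel's bean support, for example with .bean(KeywordParserBean.class, "parse"), so that the message body is replaced by the map. Note that any line containing a colon is treated as a pair, so real input would need a stricter pattern or a list of expected keys.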


Jan

> is there any component which helps me to parse plain text? Not JSON,
> XML or CSV.
> 
> My use case is that I receive an E-Mail with multiple keywords in the
> Subject as well as in the body.
> 
> I could not find any component that would help me to parse certain
> values from my multiline plaintext.
> 
> I need something like freemarker but the other way around. Getting the
> fulltext and parsing certain values from this text (for example with
> regex).