
Posted to dev@ctakes.apache.org by samir chabou <sa...@yahoo.com> on 2013/09/30 16:17:23 UTC

sentence number in WordToken

Hi Pei,

I thought this might be of some use...

Because I need to know whether two or more word tokens belong to the same sentence, and WordToken does not define a sentence number feature, I added one to the type system. These are the steps:
 
1) I added a sentenceNumber feature to the type BaseToken in the TypeSystem.xml file. I chose the superclass so that the feature is inherited by all of its subclasses (WordToken, SymbolToken, NumToken, ...).
 
2) In ctakes-core, in TokenizerAnnotatorPTB.java (method annotateRange), I set the new feature (BaseToken.sentenceNumber = sentence.getSentenceNumber()) as shown below:

    bta.setSentenceNumber(sentence.getSentenceNumber());
    bta.addToIndexes();
 
3) Run JCasGen from the Type System tab of the aggregate.
 
4) Add the feature in the Source tab of the aggregate.
 
Alternatively, I probably could have used:

    List<Sentence> list = JCasUtil.selectCovering(aJcas, Sentence.class, entity1.getBegin(), entity1.getEnd());

The issue with this is that if I have many entities to check at the same time, or if entity1 is found in many places, I have to add some if conditions to get the sentence number.
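A minimal plain-Java model of the per-entity approach shows where the extra if conditions come from. It does not use uimaFIT: the Span class and the coveringSentences, sentenceIndex and sameSentence methods are hypothetical stand-ins for annotations and for a covering lookup such as JCasUtil.selectCovering. Every lookup scans all sentences, and the caller has to decide what to do when zero or several sentences come back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UIMA annotation: only character offsets.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class NaiveCovering {
    // Linear scan modelling one covering lookup: every sentence whose
    // offsets enclose [begin, end) is returned.
    static List<Span> coveringSentences(List<Span> sentences, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : sentences) {
            if (s.begin <= begin && end <= s.end) {
                result.add(s);
            }
        }
        return result;
    }

    // The extra "if conditions": an entity may lie outside every sentence,
    // or inside several if sentences overlap. Return -1 unless exactly one covers it.
    static int sentenceIndex(List<Span> sentences, Span entity) {
        List<Span> covering = coveringSentences(sentences, entity.begin, entity.end);
        if (covering.size() != 1) {
            return -1;
        }
        return sentences.indexOf(covering.get(0));
    }

    // Each call repeats the full scan for both entities.
    static boolean sameSentence(List<Span> sentences, Span a, Span b) {
        int ia = sentenceIndex(sentences, a);
        return ia >= 0 && ia == sentenceIndex(sentences, b);
    }

    public static void main(String[] args) {
        List<Span> sentences = Arrays.asList(new Span(0, 20), new Span(21, 45));
        Span t1 = new Span(0, 4);
        Span t2 = new Span(10, 15);
        Span t3 = new Span(25, 30);
        System.out.println(sameSentence(sentences, t1, t2)); // true
        System.out.println(sameSentence(sentences, t1, t3)); // false
    }
}
```

With many entities, the cost is one full scan per entity per query, which is what the later replies in this thread suggest avoiding.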


Thanks
Samir

RE: sentence number in WordToken

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Agreed. I think we can use the ASF process here; the dev mailing list seems to work nicely for it.
That is, discuss changes here and call a [VOTE] only if there is contention.
But from what I have seen, the community has been able to reach consensus and play nicely so far.

--Pei

> -----Original Message-----
> From: Wu, Stephen T., Ph.D. [mailto:Wu.Stephen@mayo.edu]
> Sent: Wednesday, October 02, 2013 11:32 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
> 
> Hmm, we should probably have a process to vote up or down type system
> changes like this, since they affect everyone.
> In this case I'd agree with the others: don't add it.
> 
> stephen

Re: sentence number in WordToken

Posted by "Wu, Stephen T., Ph.D." <Wu...@mayo.edu>.

Hmm, we should probably have a process to vote up or down type system
changes like this, since they affect everyone.
In this case I'd agree with the others: don't add it.

stephen



On 9/30/13 11:21 AM, "samir chabou" <sa...@yahoo.com> wrote:

>Thanks for the feedback, it's a good point.
>I also did it with selectCovering, but as Richard mentioned, I'll change
>to indexCovering since it's faster.
>Samir

Re: sentence number in WordToken

Posted by samir chabou <sa...@yahoo.com>.

Thanks for the feedback, it's a good point.
I also did it with selectCovering, but as Richard mentioned, I'll change to indexCovering since it's faster.
Samir




________________________________
 From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; samir chabou <sa...@yahoo.com> 
Sent: Monday, September 30, 2013 12:10:45 PM
Subject: RE: sentence number in  WordToken
 

Samir,
I think Richard has a good point here. What is the use case that requires adding a sentenceNumber feature to BaseToken in the type system?
If it's only needed temporarily, it may be a good idea to compute it programmatically in a local variable rather than modifying the type system and storing it in the CAS.

Maybe something like:
boolean a = JCasUtil.isCovered(jcas, baseToken1, Sentence.class);
boolean b = JCasUtil.isCovered(jcas, baseToken2, Sentence.class);
--Pei



RE: sentence number in WordToken

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.

Samir,
I think Richard has a good point here. What is the use case that requires adding a sentenceNumber feature to BaseToken in the type system?
If it's only needed temporarily, it may be a good idea to compute it programmatically in a local variable rather than modifying the type system and storing it in the CAS.

Maybe something like:
boolean a = JCasUtil.isCovered(jcas, baseToken1, Sentence.class);
boolean b = JCasUtil.isCovered(jcas, baseToken2, Sentence.class);
--Pei
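If a check like the one above only reports whether some Sentence covers each token (an assumption here), it does not by itself say whether the two tokens share a sentence. One way to answer that directly, without storing anything in the CAS, is to ask whether a single sentence covers the whole stretch from the earlier token's begin to the later token's end. The sketch below is plain Java; TextSpan and SameSentenceCheck are hypothetical stand-ins for annotations and a uimaFIT lookup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an annotation with UIMA-style offsets.
final class TextSpan {
    final int begin;
    final int end;

    TextSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class SameSentenceCheck {
    // Two tokens share a sentence iff one sentence covers the region from
    // the earliest begin to the latest end of the pair. The answer is
    // computed on demand; nothing is added to the type system.
    static boolean inSameSentence(List<TextSpan> sentences, TextSpan a, TextSpan b) {
        int begin = Math.min(a.begin, b.begin);
        int end = Math.max(a.end, b.end);
        for (TextSpan s : sentences) {
            if (s.begin <= begin && end <= s.end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<TextSpan> sentences = Arrays.asList(new TextSpan(0, 20), new TextSpan(21, 45));
        System.out.println(inSameSentence(sentences, new TextSpan(12, 16), new TextSpan(2, 7))); // true
        System.out.println(inSameSentence(sentences, new TextSpan(2, 7), new TextSpan(30, 34))); // false
    }
}
```

Token order does not matter, since the combined span is built from the minimum begin and maximum end.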


> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:rec@apache.org]
> Sent: Monday, September 30, 2013 11:59 AM
> To: dev@ctakes.apache.org; samir chabou
> Subject: Re: sentence number in WordToken
> 
> Hi,
> 
> if you do many selectCovering calls, you may be faster using indexCovering
> once and then using the lookup index it produces.
> 
> IMHO type systems should not contain information that can easily be
> calculated at runtime (e.g. sentence number, token number, etc.).
> 
> Mind, I have no say here ;) Just my personal opinion.
> 
> -- Richard

Re: sentence number in WordToken

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hi,

if you do many selectCovering calls, you may be faster using
indexCovering once and then using the lookup index it produces.
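The idea is to pay the lookup cost once per document instead of once per query. The plain-Java sketch below only models that idea; it is not uimaFIT's implementation of indexCovering, and Annot, CoveringIndex, build and sameSentence are hypothetical names. It assumes tokens and sentences are each sorted by begin offset and do not overlap, so a single forward sweep can assign every token its covering sentence.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotation with UIMA-style offsets.
final class Annot {
    final int begin;
    final int end;

    Annot(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

final class CoveringIndex {
    // Builds token -> covering sentence in one pass over both lists.
    static Map<Annot, Annot> build(List<Annot> tokens, List<Annot> sentences) {
        Map<Annot, Annot> index = new IdentityHashMap<>();
        int s = 0;
        for (Annot t : tokens) {
            // Skip sentences that end before this token ends; since tokens are
            // sorted and non-overlapping, later tokens cannot fit in them either.
            while (s < sentences.size() && sentences.get(s).end < t.end) {
                s++;
            }
            // Record the sentence only if it really starts at or before the token.
            if (s < sentences.size() && sentences.get(s).begin <= t.begin) {
                index.put(t, sentences.get(s));
            }
        }
        return index;
    }

    // Every query after build() is a constant-time map lookup.
    static boolean sameSentence(Map<Annot, Annot> index, Annot a, Annot b) {
        Annot sa = index.get(a);
        return sa != null && sa == index.get(b);
    }

    public static void main(String[] args) {
        List<Annot> sentences = Arrays.asList(new Annot(0, 20), new Annot(21, 45));
        Annot t1 = new Annot(0, 4), t2 = new Annot(10, 15), t3 = new Annot(25, 30);
        Map<Annot, Annot> index = build(Arrays.asList(t1, t2, t3), sentences);
        System.out.println(sameSentence(index, t1, t2)); // true
        System.out.println(sameSentence(index, t2, t3)); // false
    }
}
```

IdentityHashMap is used because the hypothetical Annot class does not define equals or hashCode, so identity is the intended key semantics.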

IMHO type systems should not contain information that can easily
be calculated at runtime (e.g. sentence number, token number, etc.).
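As a sketch of computing such values at runtime instead of storing them: given sentence begin offsets in document order, a token's sentence number is the position of the last sentence that begins at or before the token, found by binary search, provided that sentence also ends at or after the token. A token number is simply the token's position among the sorted token begins. SentenceNumbers and its methods are hypothetical; sentences are assumed sorted and non-overlapping, and numbering here is 0-based.

```java
import java.util.Arrays;

final class SentenceNumbers {
    private final int[] sentenceBegins;
    private final int[] sentenceEnds;

    // Sentence offsets in document order (assumed sorted and non-overlapping).
    SentenceNumbers(int[] sentenceBegins, int[] sentenceEnds) {
        this.sentenceBegins = sentenceBegins.clone();
        this.sentenceEnds = sentenceEnds.clone();
    }

    // 0-based number of the sentence covering [begin, end), or -1 if none does.
    int sentenceNumber(int begin, int end) {
        int pos = Arrays.binarySearch(sentenceBegins, begin);
        // On a miss binarySearch returns -(insertionPoint) - 1; the candidate
        // is the sentence just before the insertion point.
        int candidate = pos >= 0 ? pos : -pos - 2;
        if (candidate < 0 || end > sentenceEnds[candidate]) {
            return -1;
        }
        return candidate;
    }

    // A token number is the token's index among the sorted token begins.
    static int tokenNumber(int[] sortedTokenBegins, int begin) {
        int pos = Arrays.binarySearch(sortedTokenBegins, begin);
        return pos >= 0 ? pos : -1;
    }

    public static void main(String[] args) {
        SentenceNumbers numbers = new SentenceNumbers(new int[] {0, 21, 46}, new int[] {20, 45, 60});
        System.out.println(numbers.sentenceNumber(25, 30)); // 1
        System.out.println(numbers.sentenceNumber(18, 23)); // -1 (crosses a sentence boundary)
        System.out.println(tokenNumber(new int[] {0, 5, 10, 25}, 10)); // 2
    }
}
```

Each query costs O(log n) over the sentence array, and the arrays can be rebuilt per document, so nothing needs to live in the type system.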

Mind, I have no say here ;) Just my personal opinion.

-- Richard
