
Posted to user@ctakes.apache.org by "Kevin B. Cohen" <ke...@gmail.com> on 2014/03/14 17:40:02 UTC

Parser for output of cTAKES coreference module to i2b2 scoring code format?

Hi,

I'm sitting here writing a little parser to convert the output of the
cTAKES coreference resolution module into the format of the scoring code
from the i2b2 coreference resolution task, and it occurred to me that
multiple people have probably done that already.  Would anyone be willing
to share code for that?

Kev

-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417
http://compbio.ucdenver.edu/Hunter_lab/Cohen




Re: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Posted by "Kevin B. Cohen" <ke...@gmail.com>.

Thanks a lot, Tim!

Kev





-- 
Kevin Bretonnel Cohen, PhD
Biomedical Text Mining Group Lead, Computational Bioscience Program,
U. Colorado School of Medicine
303-916-2417
http://compbio.ucdenver.edu/Hunter_lab/Cohen

RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

And... there are some helper methods:


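    // Maps each token's begin and end character offsets to the token's
    // 1-based position in the document.
    // Caveat: both offsets share one map, so if a token begins exactly where
    // the previous one ends (no whitespace between them), the later put()
    // overwrites the earlier entry for that offset.
    private HashMap<Integer, Integer> mapTokensToIndices(JCas jc) {
        HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();

        int i = 1;
        FSIterator iter = jc.getAnnotationIndex(BaseToken.type).iterator();
        while(iter.hasNext()){
            BaseToken tok = (BaseToken) iter.next();
            map.put(tok.getBegin(), i);
            map.put(tok.getEnd(), i);
            i++;
        }
        return map;
    }


    // Formats a markable's span as two "1:<token index>" fields (first and
    // last token). The "1:" prefix is hard-coded, and a field prints as
    // "null" if the offset does not coincide with a token boundary.
    private String getSpanString(Markable mark,
            HashMap<Integer, Integer> tok2ind) {
        Integer tok1 = tok2ind.get(mark.getBegin());
        Integer tok2 = tok2ind.get(mark.getEnd());
        return "1:" + tok1 + " 1:" + tok2;
    }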


Hopefully that is it!
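
One caveat with the helpers above: begin and end offsets go into the same map, so a token that starts exactly where the previous one ends (a period directly after a word, say) overwrites the previous token's end entry. Below is a minimal, self-contained sketch of one way around that, using two maps. It is plain Java rather than the UIMA types, and the names OffsetSpan and TokenIndexSketch are made up for illustration; they are not part of cTAKES.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a BaseToken or Markable: character offsets only.
final class OffsetSpan {
    final int begin;
    final int end;

    OffsetSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical helper, not a cTAKES class.
final class TokenIndexSketch {
    // Separate maps, so an end offset and a begin offset that happen to be
    // equal cannot overwrite each other.
    private final Map<Integer, Integer> beginToIndex = new HashMap<>();
    private final Map<Integer, Integer> endToIndex = new HashMap<>();

    TokenIndexSketch(List<OffsetSpan> tokensInOrder) {
        int i = 1; // 1-based, as in the original helper
        for (OffsetSpan tok : tokensInOrder) {
            beginToIndex.put(tok.begin, i);
            endToIndex.put(tok.end, i);
            i++;
        }
    }

    // Same output shape as getSpanString, but fails loudly instead of
    // printing "null" when a mention does not line up with token boundaries.
    String spanString(OffsetSpan mention) {
        Integer first = beginToIndex.get(mention.begin);
        Integer last = endToIndex.get(mention.end);
        if (first == null || last == null) {
            throw new IllegalArgumentException(
                    "Mention " + mention.begin + "-" + mention.end
                            + " does not align with token boundaries");
        }
        return "1:" + first + " 1:" + last;
    }

    public static void main(String[] args) {
        // "chest pain." -> "chest" [0,5), "pain" [6,10), "." [10,11)
        List<OffsetSpan> tokens = Arrays.asList(
                new OffsetSpan(0, 5), new OffsetSpan(6, 10), new OffsetSpan(10, 11));
        TokenIndexSketch index = new TokenIndexSketch(tokens);
        // Prints 1:2 1:2; with a single shared map, offset 10 would resolve
        // to token 3 and the result would be 1:2 1:3.
        System.out.println(index.spanString(new OffsetSpan(6, 10)));
    }
}
```

Whether to throw or to skip a mention that does not align with tokens is a design choice; throwing is used here only to make the mismatch visible.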


RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Kev -- I think I found it. It looks like it was never checked in, as it was part of a separate eval module that used GPL'd code and couldn't be released with cTAKES. It was also commented out for some reason. I'll just paste the relevant section here; hopefully you can use it, or it will at least save you some time. If you write a UIMA consumer or other tool that you think would help others and you are willing to share it, we would gladly incorporate it into cTAKES.

//                      if(i2b2){
//                              // write system chain file:
//                              try {
//                                      PrintWriter writer = new PrintWriter(i2b2Path + "/sysChain/" + docName + ".chain");
//                                      HashMap<Integer,Integer> tok2ind = mapTokensToIndices(jc);
//                                      FSIterator iter = jc.getJFSIndexRepository().getAllIndexedFS(CoreferenceChain.type);
//                                      while(iter.hasNext()){
//                                              CoreferenceChain chain = (CoreferenceChain) iter.next();
//                                              FSList members = chain.getMembers();
//                                              while(members instanceof NonEmptyFSList){
//                                                      NonEmptyFSList node = (NonEmptyFSList) members;
//                                                      Markable mark = (Markable) node.getHead();
//                                                      writer.print("c=\"");
//                                                      writer.print(mark.getCoveredText());
//                                                      writer.print("\" ");
//                                                      writer.print(getSpanString(mark, tok2ind));
//                                                      writer.print("||");
//                                                      members = node.getTail();
//                                              }
//                                              // write the type information
//                                              writer.println("t=\"coref problem\"");
//                                      }
//                                      writer.close();
//                              } catch (FileNotFoundException e) {
//                                      // TODO Auto-generated catch block
//                                      e.printStackTrace();
//                              }
//
//                      }


Tim
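
For anyone who wants to try the output logic without the UIMA plumbing, here is a rough, self-contained sketch of the line layout that the commented-out block writes: each mention as c="text" followed by its span string and "||", then the type field at the end of the chain. The Mention and ChainLineSketch classes are hypothetical stand-ins, not cTAKES types, and the layout is read off the print calls above rather than checked against the i2b2 scorer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Markable whose token indices are already known.
final class Mention {
    final String text;
    final int firstToken;
    final int lastToken;

    Mention(String text, int firstToken, int lastToken) {
        this.text = text;
        this.firstToken = firstToken;
        this.lastToken = lastToken;
    }
}

// Hypothetical helper, not a cTAKES class.
final class ChainLineSketch {
    // Builds one line per chain, in the same order as the print calls in the
    // pasted code: c="<text>" 1:<first> 1:<last>|| ... t="coref problem"
    static String formatChain(List<Mention> chain) {
        StringBuilder line = new StringBuilder();
        for (Mention m : chain) {
            line.append("c=\"").append(m.text).append("\" ");
            line.append("1:").append(m.firstToken)
                .append(" 1:").append(m.lastToken);
            line.append("||");
        }
        // The type string is hard-coded, as in the pasted code.
        line.append("t=\"coref problem\"");
        return line.toString();
    }

    public static void main(String[] args) {
        List<Mention> chain = Arrays.asList(
                new Mention("chest pain", 3, 4),
                new Mention("it", 9, 9));
        // Prints: c="chest pain" 1:3 1:4||c="it" 1:9 1:9||t="coref problem"
        System.out.println(formatChain(chain));
    }
}
```

Building the line as a string first makes it easy to unit-test the format separately from file handling; in a real consumer the result would go to a PrintWriter, ideally opened in a try-with-resources block.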



RE: Parser for output of cTAKES coreference module to i2b2 scoring code format?

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Kevin, I think I did write something like that at some point. I just spent the last 10 minutes looking for it and can't find it. I will poke around a bit more and let you know if I find anything.
Tim
