Posted to dev@ctakes.apache.org by John Green <jo...@gmail.com> on 2014/10/16 02:58:06 UTC

YTEX semantic similarity concept graph questions

Hope this finds everyone well.

It is not immediately clear to me why

        select distinct cui1, cui2
        from umls.MRREL
        where sab in ('SNOMEDCT')
        and rel in ('PAR')
        order by cui1, cui2

would select only the PAR relationship (REL). I'm not sure what the
selection criteria are. This is honestly probably directed mostly at Vijay, but
anyone else with experience in this domain would be a welcome voice. In the
paper on YTEX, for instance, PAR and RB are chosen for UMLS. Why? Does this
have to do with the "flattening" or "orphaning" that UMLS does to the
vocabularies it includes? Why not PAR, RB, and RN? Why not more? Was this a
computational (speed/memory) consideration, or a functional one that my
lack of familiarity with the domain is keeping me from seeing?

I'm posting this fairly specific question to the dev list because it directly
relates to building YTEX concept graphs, which is a feature of our
distro here.

Best!
JG

Re: YTEX semantic similarity concept graph questions

Posted by John Green <jo...@gmail.com>.

That does, thank you, as always.

One other question: your docs say it should take around an hour and a half
with 8 GB of RAM for the UMLS, but my runs finish much faster
(3-5 minutes). The *.gz output seems to be of the same order of magnitude in
size as the included compressed concept graphs, and queries seem to run OK,
but it makes me a little nervous that it is processing that fast. Should I be
worried?

Thanks,
JG

On Thu, Oct 16, 2014 at 6:29 AM, vijay garla <vn...@gmail.com> wrote:

> I don't know what the difference between PAR/CHD (parent/child) and RB/RN
> (broader/narrower) is supposed to be. Some UMLS source vocabularies use
> PAR/CHD only or predominantly (e.g. SNOMED-CT), others use RB/RN (e.g.
> RXNORM). You can use and experiment with whatever relationships you want
> (I think there might be part-of/contains relationships too).
>
> The concept graph is a directed acyclic graph, and the query should return
> parent-child edges (or maybe the other way around, not sure). If your
> query uses e.g. rel in ('PAR', 'CHD'), it will return edges going in both
> directions. This shouldn't cause any problems, as we discard edges that
> induce cycles, but it will create a bunch of overhead for no gain.
>
> If you look at other concept graph configs, e.g.
>
> https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml
> ,
> you will see that we use both PAR & RB relationships.
>
> HTH,
>
> VJ

Re: YTEX semantic similarity concept graph questions

Posted by vijay garla <vn...@gmail.com>.

I don't know what the difference between PAR/CHD (parent/child) and RB/RN
(broader/narrower) is supposed to be. Some UMLS source vocabularies use
PAR/CHD only or predominantly (e.g. SNOMED-CT), others use RB/RN (e.g.
RXNORM). You can use and experiment with whatever relationships you want
(I think there might be part-of/contains relationships too).

The concept graph is a directed acyclic graph, and the query should return
parent-child edges (or maybe the other way around, not sure). If your
query uses e.g. rel in ('PAR', 'CHD'), it will return edges going in both
directions. This shouldn't cause any problems, as we discard edges that
induce cycles, but it will create a bunch of overhead for no gain.
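
To make the "discard edges that induce cycles" idea concrete, here is a minimal Python sketch. It is not the YTEX implementation; the function names and the toy concept IDs (C1, C2, C3) are hypothetical, and it only illustrates why feeding in both PAR and CHD rows costs extra work without changing the resulting graph.

```python
def would_create_cycle(children, parent, child):
    """Return True if adding parent -> child would close a cycle,
    i.e. if `parent` is already reachable from `child`."""
    if parent == child:
        return True
    stack = [child]
    seen = set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, ()))
    return False


def build_dag(edges):
    """Add edges in order, discarding any edge that would induce a cycle.

    Returns (children, discarded), where `children` maps a node to the set
    of nodes it points to and `discarded` lists the rejected edges.
    """
    children = {}
    discarded = []
    for parent, child in edges:
        if child in children.get(parent, ()):
            continue  # duplicate edge, nothing to do
        if would_create_cycle(children, parent, child):
            discarded.append((parent, child))
        else:
            children.setdefault(parent, set()).add(child)
    return children, discarded


# Toy input: two forward edges plus their reverse-direction counterparts,
# similar to what selecting both PAR and CHD rows would produce.
toy_edges = [("C1", "C2"), ("C2", "C3"), ("C2", "C1"), ("C3", "C1")]
toy_children, toy_discarded = build_dag(toy_edges)
```

In this toy run every reverse edge is examined with a reachability search and then thrown away, which is the overhead for no gain mentioned above. Note that which edges survive depends on the order in which they arrive.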

If you look at other concept graph configs, e.g.
https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml
you will see that we use both PAR & RB relationships.
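
As a rough illustration of combining PAR and RB across two source vocabularies, the sketch below runs a query of the same shape against a toy in-memory SQLite table. The table contents and CUIs are made up, the helper names are hypothetical, and the query text is an assumption modeled on the one quoted in this thread, not a copy of what the linked config contains. The UMLS Reference Manual describes REL as the relationship of the second concept to the first, so it is worth checking the edge direction on real data before relying on it.

```python
import sqlite3


def make_toy_mrrel():
    """Create an in-memory stand-in for MRREL with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table MRREL (cui1 text, cui2 text, sab text, rel text)")
    rows = [
        # (cui1, cui2, sab, rel) -- placeholder CUIs, not real concepts
        ("C0000002", "C0000001", "SNOMEDCT", "PAR"),
        ("C0000001", "C0000002", "SNOMEDCT", "CHD"),
        ("C0000004", "C0000003", "RXNORM", "RB"),
        ("C0000003", "C0000004", "RXNORM", "RN"),
        ("C0000005", "C0000001", "SNOMEDCT", "RO"),
    ]
    conn.executemany("insert into MRREL values (?, ?, ?, ?)", rows)
    return conn


def fetch_edges(conn, sabs, rels):
    """Select distinct (cui1, cui2) pairs for the given sources and REL values."""
    sab_marks = ", ".join("?" for _ in sabs)
    rel_marks = ", ".join("?" for _ in rels)
    sql = (
        "select distinct cui1, cui2 from MRREL "
        f"where sab in ({sab_marks}) and rel in ({rel_marks}) "
        "order by cui1, cui2"
    )
    return conn.execute(sql, [*sabs, *rels]).fetchall()


conn = make_toy_mrrel()

# One direction only: PAR from SNOMEDCT plus RB from RXNORM.
one_way = fetch_edges(conn, ["SNOMEDCT", "RXNORM"], ["PAR", "RB"])

# Both directions: PAR and CHD return each pair twice, once per direction.
both_ways = fetch_edges(conn, ["SNOMEDCT"], ["PAR", "CHD"])
```

On the toy data, `one_way` holds one edge per related pair, while `both_ways` holds the same SNOMEDCT pair in both directions, which is the redundant input described earlier.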

HTH,

VJ
