Posted to user@ctakes.apache.org by "Hari, Sekhar" <se...@cgi.com> on 2019/05/20 15:01:52 UTC

Drugs' Primary Compound ID

Hi -

My question is a little different. I'm open to solving this puzzle through cTAKES, through UMLS lookups, or through lookups in other published databases. At this point, I really don't know whether it can be solved with machine-learning algorithms.

Problem:
I've been asked to find out whether the following is possible:
"Given a pharma regulatory document (say, a searchable PDF) related to one or more drugs, predict the corresponding 'Primary Compound ID'."

A primary compound ID may follow the format <<pharma company name>>-<<numeric digits>>-<<two- or three-letter abbreviation>>.

To make the scenario easier, I'll consider the following case:
Primary Compound ID: CNTO148.
This deviates from the format above. If we split this ID, CNTO represents the pharma company (Centocor Biotech, Inc.). I don't know what the number 148 represents.

However, CNTO148 is the pre-marketing name given during the clinical trial phases. Its actual trademark is "SIMPONI", and its International Nonproprietary Name (INN) is "Golimumab". The condition mentioned for this drug is 'Rheumatoid Arthritis'.
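The ID layout described above can at least be made concrete with a small parser. This is a minimal Python sketch, assuming an uppercase company prefix, a run of digits, and an optional two- or three-letter suffix, and also accepting the compact (CNTO148) and space-separated (CNTO 148) forms. The pattern, its length limits, and the names `CompoundId`, `parse_compound_id`, and `find_candidate_ids` are hypothetical, not taken from any registry or standard. It only splits candidate IDs into parts; it cannot say what a number or suffix means.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed (hypothetical) pattern: a 2-5 letter uppercase company prefix, an
# optional "-" or space, 2-6 digits, then an optional "-" plus a 2-3 letter
# suffix. The length limits are guesses, not a published standard.
_ID_PATTERN = re.compile(
    r"\b(?P<prefix>[A-Z]{2,5})[- ]?(?P<number>\d{2,6})"
    r"(?:-(?P<suffix>[A-Z]{2,3}))?\b"
)


class CompoundId(NamedTuple):
    prefix: str              # e.g. "CNTO" (company code)
    number: str              # kept as a string to preserve leading zeros
    suffix: Optional[str]    # two- or three-letter abbreviation, if present


def _to_id(match: "re.Match[str]") -> CompoundId:
    # An optional group that did not participate in the match yields None.
    return CompoundId(match["prefix"], match["number"], match["suffix"])


def parse_compound_id(text: str) -> Optional[CompoundId]:
    """Split one candidate ID into its parts, or return None if it doesn't fit."""
    match = _ID_PATTERN.fullmatch(text.strip())
    return _to_id(match) if match else None


def find_candidate_ids(document_text: str) -> List[CompoundId]:
    """Return every substring of a document that looks like a compound ID."""
    return [_to_id(m) for m in _ID_PATTERN.finditer(document_text)]


print(parse_compound_id("CNTO148"))
# -> CompoundId(prefix='CNTO', number='148', suffix=None)
print(find_candidate_ids("Golimumab (CNTO 148) for rheumatoid arthritis"))
# -> [CompoundId(prefix='CNTO', number='148', suffix=None)]
```

A pattern this loose will also match unrelated tokens (any short uppercase word followed by digits), so matches are best treated as candidates to be checked against an external source rather than as confirmed IDs.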

Question:
Using cTAKES, if I can identify the product as "SIMPONI" and the indication as 'Rheumatoid Arthritis', is there some mechanism to identify or derive its 'Primary Compound ID' (in this case CNTO148, sometimes also called the 'Controlling Product')?

My analysis:
If I query the ClinicalTrials.gov data by drug name, I can find the 'Primary Compound ID' that was used during the clinical study. But this ID is not available for all drug products in the ClinicalTrials.gov database. I'm looking for a consistent way to derive the 'Primary Compound ID', if these IDs are registered anywhere.
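The ClinicalTrials.gov lookup described above might be scripted roughly as follows. This is a hedged sketch, assuming the current ClinicalTrials.gov REST API v2 (the `/api/v2/studies` endpoint with the `query.intr` intervention search parameter) and its JSON layout, in which each intervention may list `otherNames`. The helper names are hypothetical, and only the first page of results is read (pagination is not handled).

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Set

# Assumption: the ClinicalTrials.gov REST API v2 "studies" endpoint.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(drug_name: str, page_size: int = 20) -> str:
    """Search studies whose interventions match a drug name (first page only)."""
    params = {"query.intr": drug_name, "pageSize": page_size}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def extract_intervention_names(payload: dict) -> Dict[str, Set[str]]:
    """Map each study's NCT ID to its intervention names plus any 'otherNames'.

    A development code such as CNTO148 appears here only if the sponsor
    registered it as an intervention name or alias.
    """
    results: Dict[str, Set[str]] = {}
    for study in payload.get("studies", []):
        protocol = study.get("protocolSection", {})
        nct_id = protocol.get("identificationModule", {}).get("nctId", "unknown")
        names: Set[str] = set()
        module = protocol.get("armsInterventionsModule", {})
        for intervention in module.get("interventions", []):
            if "name" in intervention:
                names.add(intervention["name"])
            names.update(intervention.get("otherNames", []))
        results[nct_id] = names
    return results


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `extract_intervention_names(fetch_json(build_search_url("SIMPONI")))` would list, per NCT ID, the intervention names registered for matching studies. Because the aliases depend on what each sponsor entered, this has the same coverage gap described above.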

Other questions:
What do the two- or three-letter abbreviations in a 'Primary Compound ID' (in the format defined above) mean?
Some example abbreviations (there are many more):

* AAB
* AC
* AN
* AAA
* AAC
* AMK
* ZBR
* AER
* AEN

Is there a vocabulary where these are listed that I could study?

Thanks
Sekhar Hari | AI Program Lead | Health Sciences R&D | Asia Pacific Solutions Delivery Center
+91 814 7027 779 (C)

RE: Drugs' Primary Compound ID

Posted by "Hari, Sekhar" <se...@cgi.com>.
Yes, RxNorm can provide much, or maybe all, of what you highlighted. But I don't know whether RxNorm also stores the pre-marketing name or ID (like CNTO148) in a form that can be queried through an API. I hope you can provide some insight on this.
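One way to test the RxNorm question empirically is to query the RxNav REST API, NLM's public interface to RxNorm. This is a minimal sketch, assuming the `rxcui.json?name=` exact-name lookup and the `related.json?tty=IN` endpoint for mapping a brand to its ingredient; the helper names are hypothetical, and the response shapes in the comments are assumptions. An empty result for a string such as CNTO148 only means RxNorm has no exact match for that string, not that the code is unrecorded elsewhere.

```python
import json
import urllib.parse
import urllib.request
from typing import List

RXNAV = "https://rxnav.nlm.nih.gov/REST"


def rxcui_url(name: str) -> str:
    """Exact-name lookup returning RxNorm concept IDs (RxCUIs) for a name."""
    return f"{RXNAV}/rxcui.json?" + urllib.parse.urlencode({"name": name})


def ingredient_url(rxcui: str) -> str:
    """Related concepts of term type IN (ingredient) for a given RxCUI."""
    return f"{RXNAV}/rxcui/{urllib.parse.quote(rxcui)}/related.json?tty=IN"


def parse_rxcuis(payload: dict) -> List[str]:
    # Assumed response shape: an unmatched name yields an "idGroup"
    # with no "rxnormId" key, so this returns an empty list.
    return payload.get("idGroup", {}).get("rxnormId", [])


def parse_ingredients(payload: dict) -> List[str]:
    # Assumed shape: relatedGroup -> conceptGroup[] -> conceptProperties[].
    names: List[str] = []
    for group in payload.get("relatedGroup", {}).get("conceptGroup", []):
        for concept in group.get("conceptProperties", []):
            names.append(concept["name"])
    return names


def get_json(url: str) -> dict:
    """Fetch and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def ingredients_for(name: str) -> List[str]:
    """Resolve a brand or other name to ingredient names; [] if no exact match."""
    rxcuis = parse_rxcuis(get_json(rxcui_url(name)))
    if not rxcuis:
        return []
    return parse_ingredients(get_json(ingredient_url(rxcuis[0])))
```

Running `ingredients_for("SIMPONI")` and `ingredients_for("CNTO148")` side by side shows directly whether each string resolves in RxNorm, which answers the API part of the question for any specific code you care about.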

Thanks
Sekhar Hari | AI Program Lead | Health Sciences R&D | Asia Pacific Solutions Delivery Center
+91 814 7027 779 (C)

-----Original Message-----
From: Peter Abramowitsch <pa...@gmail.com> 
Sent: Tuesday, May 21, 2019 12:41 AM
To: dev@ctakes.apache.org
Subject: Re: Drugs' Primary Compound ID

I used to work for a division of Hearst that also owns the company First Databank (FDB). FDB has an electronic compendium of information about every drug, where you can find its generic and proprietary forms, its primary ingredient(s), its therapeutic class, forms, dosages, side effects, disease indications, etc. Much of this you can now get from RxNorm, I think.
The subscription fee for FDB is pretty high, but the information is very well curated.

Peter

Re: Drugs' Primary Compound ID

Posted by Phil Shinn <ph...@gmail.com>.
This may be of use:
https://www.nlm.nih.gov/research/umls/rxnorm/overview.html

Re: Drugs' Primary Compound ID

Posted by Peter Abramowitsch <pa...@gmail.com>.
I used to work for a division of Hearst that also owns the company First
Databank (FDB). FDB has an electronic compendium of information about every
drug, where you can find its generic and proprietary forms, its primary
ingredient(s), its therapeutic class, forms, dosages, side effects, disease
indications, etc. Much of this you can now get from RxNorm, I think.
The subscription fee for FDB is pretty high, but the information is very
well curated.

Peter
