Posted to dev@oodt.apache.org by "Nguyen, Thien" <Th...@va.gov> on 2011/10/28 18:27:25 UTC

Getting started

Hello,

I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.

OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.

Thanks,
Thien

Re: Getting started

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Great, Dave, I agree!

Cheers,
Chris

On Oct 28, 2011, at 11:23 AM, David Kale wrote:

> Thien,
> 
> Actually, a great place to start would be the attached paper, "An
> Informatics Architecture for the Virtual Pediatric Intensive Care
> Unit" from CBMS 2011, describing an effort to do data management at
> Children's Hospital LA in a way that can scale to multiple
> institutions, using OODT principles and components.  There are a
> number of VPICU team members on this list (from both CHLA and JPL) who
> would be happy to answer your questions, on the list or offline.
> 
> My perspective is the following: data integration and management in
> large medical institutions and organizations is HARD.  That is a
> simple fact, and any company or organization who claims to have an
> out-of-the-box solution is either deluded or lying.  There is some
> amount of DIY (do it yourself) that is unavoidable, and so -- in my
> opinion -- it's best to go with something that gives you a lot of
> interoperable but loosely coupled tools, components, and modules that
> save you bits of work here and there but are also easily modified or
> extended.  From that standpoint, OODT and the larger Apache ecosystem
> are quite suitable.
> 
> As for specifics, I'd recommend diving into the FileManager and
> WorkFlowManager components, though perhaps others would offer
> different advice.
> 
> Anyway, sounds like you're clearly heading in the same direction we
> are, so we'd be happy to chat more!
> 
> Dave
> 
> 
> 
> On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:
>> Hello,
>> 
>> I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.
>> 
>> OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.
>> 
>> Thanks,
>> Thien
>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Getting started

Posted by David Kale <da...@cs.stanford.edu>.
And the list removed my attachment.  Will send directly.


On Fri, Oct 28, 2011 at 11:23 AM, David Kale <da...@cs.stanford.edu> wrote:
> Thien,
>
> Actually, a great place to start would be the attached paper, "An
> Informatics Architecture for the Virtual Pediatric Intensive Care
> Unit" from CBMS 2011, describing an effort to do data management at
> Children's Hospital LA in a way that can scale to multiple
> institutions, using OODT principles and components.  There are a
> number of VPICU team members on this list (from both CHLA and JPL) who
> would be happy to answer your questions, on the list or offline.
>
> My perspective is the following: data integration and management in
> large medical institutions and organizations is HARD.  That is a
> simple fact, and any company or organization who claims to have an
> out-of-the-box solution is either deluded or lying.  There is some
> amount of DIY (do it yourself) that is unavoidable, and so -- in my
> opinion -- it's best to go with something that gives you a lot of
> interoperable but loosely coupled tools, components, and modules that
> save you bits of work here and there but are also easily modified or
> extended.  From that standpoint, OODT and the larger Apache ecosystem
> are quite suitable.
>
> As for specifics, I'd recommend diving into the FileManager and
> WorkFlowManager components, though perhaps others would offer
> different advice.
>
> Anyway, sounds like you're clearly heading in the same direction we
> are, so we'd be happy to chat more!
>
> Dave
>
>
>
> On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:
>> Hello,
>>
>> I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.
>>
>> OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.
>>
>> Thanks,
>> Thien
>>
>

Re: Getting started

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Thanks Thien, really appreciate it!

Cheers,
Chris

On Oct 28, 2011, at 12:03 PM, Nguyen, Thien wrote:

> Thanks for the information. I'll check this out and figure out our next [baby] steps.
> 
> -----Original Message-----
> From: David Kale [mailto:davekale@cs.stanford.edu] 
> Sent: Friday, October 28, 2011 2:27 PM
> To: Nguyen, Thien
> Cc: Sheryl John; Chris Mattmann; Paul Vee
> Subject: Fwd: Getting started
> 
> ---------- Forwarded message ----------
> From: David Kale <da...@cs.stanford.edu>
> Date: Fri, Oct 28, 2011 at 11:23 AM
> Subject: Re: Getting started
> To: dev@oodt.apache.org
> 
> 
> Thien,
> 
> Actually, a great place to start would be the attached paper, "An Informatics Architecture for the Virtual Pediatric Intensive Care Unit" from CBMS 2011, describing an effort to do data management at Children's Hospital LA in a way that can scale to multiple institutions, using OODT principles and components.  There are a number of VPICU team members on this list (from both CHLA and JPL) who would be happy to answer your questions, on the list or offline.
> 
> My perspective is the following: data integration and management in large medical institutions and organizations is HARD.  That is a simple fact, and any company or organization who claims to have an out-of-the-box solution is either deluded or lying.  There is some amount of DIY (do it yourself) that is unavoidable, and so -- in my opinion -- it's best to go with something that gives you a lot of interoperable but loosely coupled tools, components, and modules that save you bits of work here and there but are also easily modified or extended.  From that standpoint, OODT and the larger Apache ecosystem are quite suitable.
> 
> As for specifics, I'd recommend diving into the FileManager and WorkFlowManager components, though perhaps others would offer different advice.
> 
> Anyway, sounds like you're clearly heading in the same direction we are, so we'd be happy to chat more!
> 
> Dave
> 
> 
> 
> On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:
>> Hello,
>> 
>> I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.
>> 
>> OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.
>> 
>> Thanks,
>> Thien
>> 


Re: Getting started

Posted by David Kale <da...@cs.stanford.edu>.
Thien,

Actually, a great place to start would be the attached paper, "An
Informatics Architecture for the Virtual Pediatric Intensive Care
Unit" from CBMS 2011, describing an effort to do data management at
Children's Hospital LA in a way that can scale to multiple
institutions, using OODT principles and components.  There are a
number of VPICU team members on this list (from both CHLA and JPL) who
would be happy to answer your questions, on the list or offline.

My perspective is the following: data integration and management in
large medical institutions and organizations is HARD.  That is a
simple fact, and any company or organization who claims to have an
out-of-the-box solution is either deluded or lying.  There is some
amount of DIY (do it yourself) that is unavoidable, and so -- in my
opinion -- it's best to go with something that gives you a lot of
interoperable but loosely coupled tools, components, and modules that
save you bits of work here and there but are also easily modified or
extended.  From that standpoint, OODT and the larger Apache ecosystem
are quite suitable.

As for specifics, I'd recommend diving into the FileManager and
WorkFlowManager components, though perhaps others would offer
different advice.

Anyway, sounds like you're clearly heading in the same direction we
are, so we'd be happy to chat more!

Dave



On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:
> Hello,
>
> I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.
>
> OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.
>
> Thanks,
> Thien
>

Re: Getting started

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
+1, agreed, Cam.

Cheers,
Chris

On Oct 28, 2011, at 11:31 AM, Cameron Goodale wrote:

> Thien,
> 
> Thanks for the feedback on the site.  Below is my take on OODT, and I am
> sure others can chime in with maybe some publications you can read.  I hope
> you find this helpful.
> 
> OODT is not a turnkey solution.  (Just wanted to get that out of the way,
> since many people see OODT as a product and not a framework)
> 
> OODT is a framework for data management (this means ANY data, from Space to
> Climate to Medical to Images can be archived and processed with OODT)
> 
> The power of OODT is the flexibility of the framework, but this also makes
> it challenging to grasp and understand everything it can do.
> 
> The power of OODT is in the community of Devs and Users.  Chances are
> someone else in the community has encountered the same challenge or a
> similar one.
> 
> Keep the Questions and Comments coming.
> 
> 
> -Cameron
> 
> 
> 
> 
> On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:
> 
>> Hello,
>> 
>> I'm a medical informatics software developer at the Boston Department of
>> Veterans Affairs. We are reviewing tools to prototype for use in our systems
>> that would help manage, share, and track research information. We have
>> genomic data, health records, phenotypic data, and results of NLP/machine
>> learning techniques.
>> 
>> OODT sounds like a nice package that we would like to try out, but,
>> honestly, the website says very little. I'm not even sure where to begin to
>> even try interacting with the downloaded source. Any direction on how to get
>> started or lower-level resources to read up on would be greatly appreciated.
>> 
>> Thanks,
>> Thien
>> 


Re: Getting started

Posted by Cameron Goodale <go...@apache.org>.
Thien,

Thanks for the feedback on the site.  Below is my take on OODT, and I am
sure others can chime in with maybe some publications you can read.  I hope
you find this helpful.

OODT is not a turnkey solution.  (Just wanted to get that out of the way,
since many people see OODT as a product and not a framework)

OODT is a framework for data management (this means ANY data, from Space to
Climate to Medical to Images can be archived and processed with OODT)

The power of OODT is the flexibility of the framework, but this also makes
it challenging to grasp and understand everything it can do.

The power of OODT is in the community of Devs and Users.  Chances are
someone else in the community has encountered the same challenge or a
similar one.

Keep the Questions and Comments coming.


-Cameron




On Fri, Oct 28, 2011 at 9:27 AM, Nguyen, Thien <Th...@va.gov> wrote:

> Hello,
>
> I'm a medical informatics software developer at the Boston Department of
> Veterans Affairs. We are reviewing tools to prototype for use in our systems
> that would help manage, share, and track research information. We have
> genomic data, health records, phenotypic data, and results of NLP/machine
> learning techniques.
>
> OODT sounds like a nice package that we would like to try out, but,
> honestly, the website says very little. I'm not even sure where to begin to
> even try interacting with the downloaded source. Any direction on how to get
> started or lower-level resources to read up on would be greatly appreciated.
>
> Thanks,
> Thien
>

Re: Getting started

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Thien,

Thanks for your email message. 

Here are 2 papers that might help in expanding on OODT:

The "Information Integration" side of the components:

http://sunset.usc.edu/~mattmann/pubs/ICSE06.pdf

The "Data Processing and Cataloging/Archiving" side of the components:

http://sunset.usc.edu/~mattmann/pubs/SMCIT09.pdf

We have projects within the OODT community (e.g., working with the NCI, 
and the EDRN projects) that deal with proteomics data, so there's a good 
shot OODT might be the right fit. 

To get started with the OODT source code, you might want to check this out:

https://cwiki.apache.org/confluence/display/OODT/Home
https://cwiki.apache.org/confluence/display/OODT/Getting+started+with+Apache+OODT
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27361963

Those are links to our wiki, which is actively maintained. Your thoughts and
ideas about how to improve the documentation and work with the source code
are welcome. I would echo Cameron and Dave's comments:
please feel free to ask questions here and we'll help!

Cheers,
Chris



On Oct 28, 2011, at 9:27 AM, Nguyen, Thien wrote:

> Hello,
> 
> I'm a medical informatics software developer at the Boston Department of Veterans Affairs. We are reviewing tools to prototype for use in our systems that would help manage, share, and track research information. We have genomic data, health records, phenotypic data, and results of NLP/machine learning techniques.
> 
> OODT sounds like a nice package that we would like to try out, but, honestly, the website says very little. I'm not even sure where to begin to even try interacting with the downloaded source. Any direction on how to get started or lower-level resources to read up on would be greatly appreciated.
> 
> Thanks,
> Thien

