Posted to dev@ctakes.apache.org by Matthew Vita <ma...@gmail.com> on 2017/10/08 00:15:54 UTC

Quick question on continuously ingesting clinical documents

Hi Sean, Tim, cTAKES Community,

How do I make the cTAKES Collection Reader continuously "listen" for more
clinical documents to process?

Some notes on how I am trying this:

   - I am using Tim's awesome cTAKES Docker solution:
   https://github.com/tmills/ctakes-docker
   - Target UIMA operation:
   *org.apache.uima.examples.as.RunRemoteAsyncAE*
   - Descriptor file that is passed in:
   https://github.com/tmills/ctakes-docker/blob/master/desc/localDeploymentDescriptorNoDeid.xml
   - The arguments being passed (note that ActiveMQ is used as the broker):
   *tcp://<local ip address>:61616 mainQueue -d desc/localDeploymentDescriptor.xml -c desc/FilesInDirectoryCollectionReader.xml -o xmis/*


Thanks,

Matthew Vita
www.matthewvita.com

Re: Quick question on continuously ingesting clinical documents [EXTERNAL]

Posted by Matthew Vita <ma...@gmail.com>.
Thank you all for the suggestions. I also received some advice from Tim in
a GitHub comment.

As always, I will thoroughly document the solution for the community.
I plan on tackling this right after I get the MySQL dictionary support
working and documented.

Thanks,

Matthew Vita
www.matthewvita.com

On Mon, Oct 9, 2017 at 1:18 PM, Alexandru Zbarcea <al...@apache.org> wrote:

> Hi Matthew,
>
> Another approach would be to use Apache Camel with the File[1] component.
>
> Alex
>
> [1] - http://camel.apache.org/file2.html

Re: Quick question on continuously ingesting clinical documents [EXTERNAL]

Posted by Alexandru Zbarcea <al...@apache.org>.
Hi Matthew,

Another approach would be to use Apache Camel with the File[1] component.

Alex

[1] - http://camel.apache.org/file2.html

On Mon, Oct 9, 2017 at 12:09 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Matthew,
>
> I haven't done anything  with Tim's Docker, so I don't know how it is
> handling lifetimes.  As I understand it uima-as runs the pipelines as
> services, and the reader is a client.  If one reader completes and later on
> you want to run more documents, you should be able to run a new reader and
> point it to the port and batch service name of your pipelines.  Otherwise
> you can create a reader that never shuts down by overriding the hasNext()
> and getNext() methods.  Since you are using the
> FilesInDirectoryCollectionReader (consider using the newer
> FileTreeReader) you could use something like a WatchService.
> https://docs.oracle.com/javase/tutorial/essential/io/notification.html
>
> Sean

RE: Quick question on continuously ingesting clinical documents [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Matthew,

I haven't done anything with Tim's Docker, so I don't know how it is handling lifetimes. As I understand it, uima-as runs the pipelines as services, and the reader is a client. If one reader completes and you later want to run more documents, you should be able to run a new reader and point it to the port and batch service name of your pipelines. Otherwise, you can create a reader that never shuts down by overriding the hasNext() and getNext() methods. Since you are using the FilesInDirectoryCollectionReader (consider using the newer FileTreeReader), you could use something like a WatchService:
https://docs.oracle.com/javase/tutorial/essential/io/notification.html

Sean
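
A minimal sketch of the "reader that never shuts down" idea above, using only the JDK's WatchService. This is not cTAKES or UIMA code: the class name WatchedDirectorySource and its methods are hypothetical, and a real collection reader would have to delegate its own hasNext() and getNext() to something like this and copy the returned text into the CAS. It also assumes that files arrive in the watched directory already complete (for example, moved in after being written elsewhere), because ENTRY_CREATE can fire before a writer has finished.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.Set;

/** Hypothetical helper: an open-ended source of documents from one directory. */
final class WatchedDirectorySource implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;
    private final Queue<Path> pending = new ArrayDeque<>();
    private final Set<Path> seen = new HashSet<>(); // grows without bound in this sketch
    private boolean valid = true;

    WatchedDirectorySource(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        // Register first, then list, so files arriving during the listing are not missed.
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        try (DirectoryStream<Path> existing = Files.newDirectoryStream(dir)) {
            for (Path file : existing) {
                enqueue(file);
            }
        }
    }

    private void enqueue(Path file) {
        // seen.add() returns false for a path that was already queued once.
        if (Files.isRegularFile(file) && seen.add(file)) {
            pending.add(file);
        }
    }

    /** Blocks until a file is available; returns false only if the directory can no longer be watched. */
    boolean hasNext() throws InterruptedException {
        while (pending.isEmpty()) {
            if (!valid) {
                return false;
            }
            WatchKey key = watcher.take(); // waits for the next batch of events
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                    enqueue(dir.resolve((Path) event.context()));
                }
            }
            valid = key.reset(); // false once the directory is inaccessible
        }
        return true;
    }

    /** Returns the text of the next file, waiting for one if necessary. */
    String getNext() throws IOException, InterruptedException {
        if (!hasNext()) {
            throw new NoSuchElementException("directory is no longer being watched");
        }
        Path file = pending.remove();
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

Because hasNext() blocks instead of reporting that the collection is finished, the client stays alive between batches of documents. The trade-off is that shutdown has to be triggered from outside, for example by interrupting the thread or closing the source.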
