Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2017/10/09 16:09:57 UTC
RE: Quick question on continuously ingesting clinical documents [EXTERNAL]
Hi Matthew,
I haven't done anything with Tim's Docker setup, so I don't know how it handles component lifetimes. As I understand it, UIMA-AS runs the pipelines as services, and the reader is a client. If one reader completes and you later want to process more documents, you should be able to start a new reader and point it at the broker port and service name of your pipelines.
Otherwise, you can create a reader that never shuts down by overriding the hasNext() and getNext() methods. Since you are using the FilesInDirectoryCollectionReader (consider using the newer FileTreeReader), you could use something like a WatchService: https://docs.oracle.com/javase/tutorial/essential/io/notification.html
Sean
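The "reader that never shuts down" idea can be sketched with java.nio.file.WatchService. This is a minimal, UIMA-free sketch under stated assumptions: WatchingDocumentSource is a hypothetical class, not part of cTAKES or UIMA, and its hasNext()/getNext() only mirror the shape of a collection reader's methods (a real reader would populate a CAS instead of returning a String). It also assumes documents are moved into the watched directory whole, because ENTRY_CREATE can fire before a file written in place has been fully written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Hypothetical, UIMA-free stand-in for a collection reader whose hasNext()
// never reports "no more documents"; it waits for new files instead.
class WatchingDocumentSource implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;
    private final Deque<Path> pending = new ArrayDeque<>();
    private final Set<Path> seen = new HashSet<>();

    WatchingDocumentSource(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        // Register before listing so files created in between are not missed;
        // the 'seen' set drops a file reported by both the listing and an event.
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        try (DirectoryStream<Path> existing = Files.newDirectoryStream(dir)) {
            for (Path p : existing) enqueue(p);
        }
    }

    // Blocks indefinitely until a document is available ("never shuts down").
    boolean hasNext() throws InterruptedException {
        while (pending.isEmpty()) {
            drain(watcher.take());
        }
        return true;
    }

    // Same idea, but gives up after a timeout (useful for tests or clean shutdown).
    boolean hasNext(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (pending.isEmpty()) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;
            WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
            if (key == null) return false;
            drain(key);
        }
        return true;
    }

    // Returns the text of the next queued document; call hasNext() first.
    String getNext() throws IOException {
        Path next = pending.poll();
        if (next == null) throw new IllegalStateException("no document queued; call hasNext() first");
        return new String(Files.readAllBytes(next), StandardCharsets.UTF_8);
    }

    private void drain(WatchKey key) {
        for (WatchEvent<?> event : key.pollEvents()) {
            // OVERFLOW means events were dropped; a fuller version would rescan dir here.
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) continue;
            enqueue(dir.resolve((Path) event.context()));
        }
        key.reset(); // re-arm the key so further events are delivered
    }

    private void enqueue(Path p) {
        if (Files.isRegularFile(p) && seen.add(p)) pending.add(p);
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

The blocking hasNext() is what would keep the client alive indefinitely; the timed variant gives a way to stop cleanly. The seen set grows without bound, so a long-running version would need to prune it, for example by moving processed files out of the watched directory.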
-----Original Message-----
From: Matthew Vita [mailto:matthewvita48@gmail.com]
Sent: Saturday, October 07, 2017 8:16 PM
To: dev@ctakes.apache.org
Subject: Quick question on continuously ingesting clinical documents [EXTERNAL]
Hi Sean, Tim, cTAKES Community,
How do I make the cTAKES Collection Reader continuously "listen" for more clinical documents to process?
Some notes on how I am trying this:
- I am using Tim's awesome cTAKES Docker solution:
https://github.com/tmills/ctakes-docker
- Target UIMA operation:
*org.apache.uima.examples.as.RunRemoteAsyncAE*
- Descriptor file that is passed in:
https://github.com/tmills/ctakes-docker/blob/master/desc/localDeploymentDescriptorNoDeid.xml
- The arguments being passed (note that ActiveMQ is used as the broker):
*tcp://<local ip address>:61616 mainQueue -d desc/localDeploymentDescriptor.xml -c desc/FilesInDirectoryCollectionReader.xml -o xmis/*
Thanks,
Matthew Vita
www.matthewvita.com
Re: Quick question on continuously ingesting clinical documents [EXTERNAL]
Posted by Matthew Vita <ma...@gmail.com>.
Thank you all for the suggestions. I also received some advice from Tim in a GitHub comment.
As always, I will thoroughly document the solution for the community. I plan to tackle this right after I get the MySQL dictionary support working and documented.
Thanks,
Matthew Vita
www.matthewvita.com
On Mon, Oct 9, 2017 at 1:18 PM, Alexandru Zbarcea <al...@apache.org> wrote:
> Hi Matthew,
>
> Another approach would be to use Apache Camel with the File[1] component.
>
> Alex
>
> [1] - http://camel.apache.org/file2.html
Re: Quick question on continuously ingesting clinical documents [EXTERNAL]
Posted by Alexandru Zbarcea <al...@apache.org>.
Hi Matthew,
Another approach would be to use Apache Camel with the File[1] component.
Alex
[1] - http://camel.apache.org/file2.html
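Camel's File component consumes files by polling a directory on a fixed delay and, by default, moving each consumed file into a .camel subfolder so it is not read twice. The sketch below does not use Camel; it is a plain-JDK approximation of that poll-and-move pattern, assuming a hypothetical DirectoryPoller class and a .done subfolder standing in for Camel's default behavior. The process callback is where each document would be handed to the cTAKES pipeline.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-JDK poller approximating a Camel-style file consumer.
class DirectoryPoller {
    private final Path inbox;
    private final Path doneDir;

    DirectoryPoller(Path inbox) throws IOException {
        this.inbox = inbox;
        // Consumed files are moved here so the next poll does not pick them up
        // again (Camel does something similar with its ".camel" subfolder).
        this.doneDir = Files.createDirectories(inbox.resolve(".done"));
    }

    // One poll: process every regular file currently in the inbox (in name
    // order), then move it aside. Returns how many files were processed.
    int pollOnce(Consumer<Path> process) throws IOException {
        List<Path> batch;
        try (Stream<Path> files = Files.list(inbox)) {
            batch = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : batch) {
            process.accept(file);
            Files.move(file, doneDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return batch.size();
    }

    // Polls forever with a fixed delay; only returns if the thread is interrupted.
    void run(Consumer<Path> process, long delayMillis) throws IOException, InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            pollOnce(process);
            Thread.sleep(delayMillis);
        }
    }
}
```

Polling adds up to one delay interval of latency but is simple, and it may behave more predictably than change notifications on file systems where those are unreliable, such as some network mounts.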