Posted to user@uima.apache.org by "reshu.agarwal" <re...@orkash.com> on 2015/04/28 11:20:52 UTC
DUCC- process_dd
Hi,
I am trying to scale my processing pipeline to run in a DUCC
environment, with the UIMA-AS deployment descriptor passed via process_dd.
When I scale using the configuration below, the threads started are not
what I expected:
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">
  <name>Uima v3 Deployment Descriptor</name>
  <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed Flow
    Controller</description>
  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="5"/>
    <service>
      <inputQueue endpoint="UIMA_Queue_test"
          brokerURL="tcp://localhost:61617?jms.useCompression=true" prefetch="0"/>
      <topDescriptor>
        <import location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml"/>
      </topDescriptor>
      <analysisEngine async="true" key="FlowControllerAgg"
          internalReplyQueueScaleout="10" inputQueueScaleout="10">
        <scaleout numberOfInstances="5"/>
        <delegates>
          <analysisEngine key="ChunkerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="NEDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="StemmerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
          <analysisEngine key="ConsumerDescriptor">
            <scaleout numberOfInstances="5"/>
          </analysisEngine>
        </delegates>
      </analysisEngine>
    </service>
  </deployment>
</analysisEngineDeploymentDescription>
There should be 5 threads of FlowControllerAgg, and each of those threads
should have 5 more threads for each of ChunkerDescriptor, NEDescriptor,
StemmerDescriptor, and ConsumerDescriptor.
But I don't think that is actually what happens under DUCC.
Thanks in advance.
Reshu.
Re: DUCC- process_dd
Posted by Eddie Epstein <ea...@gmail.com>.
Reshu,
UIMA-AS configurations are normally used in DUCC as Services, either for
interactive applications or to support Jobs. They can be used directly in
Jobs, but typically are not.
There is also a difference in the inputs between Job processes and
Services. A Service will normally receive a CAS containing the artifact to
be analyzed, while a Job process receives a CAS with a reference to the
artifact, or even to a collection of artifacts; this matters for Job
scale-out, to avoid making the Job's Collection Reader a bottleneck.
I suggest starting with one of the sample applications and adapting it to
your needs. We can help if you give some details about the format of the
input and output data.
Eddie
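For orientation, a DUCC Job driven by a UIMA-AS deployment descriptor is submitted with a job-specification file. The sketch below is illustrative only: all file names, paths, and values are placeholders, and the option spellings should be checked against your DUCC release's documentation.

```properties
# Hypothetical DUCC job specification (Java properties format).
# driver_descriptor_CR points at the Collection Reader that feeds
# work items; process_dd points at the UIMA-AS deployment descriptor.
description          = Scaled pipeline via UIMA-AS deployment descriptor
driver_descriptor_CR = desc/MyCollectionReader.xml
process_dd           = desc/MyDeploymentDescriptor.xml
process_memory_size  = 8
scheduling_class     = normal
```

The specification file is then passed to DUCC's job-submit command.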
On Fri, May 1, 2015 at 12:31 AM, reshu.agarwal <re...@orkash.com>
wrote:
> [quoted messages snipped]
Re: DUCC- process_dd
Posted by "reshu.agarwal" <re...@orkash.com>.
Eddie,
I was using this same scenario, experimenting by trial and error to
compare it with UIMA-AS, hoping to get a better-scaled pipeline, since I
think UIMA-AS can also do this. But with UIMA-AS I am unable to match the
processing time of DUCC's default configuration that you mentioned.
Can you help me with this? I just want to scale using the best combination
of UIMA-AS and DUCC configuration, which can be done using process_dd.
But how?
Thanks in advance.
Reshu.
On 05/01/2015 03:28 AM, Eddie Epstein wrote:
> [quoted messages snipped]
Re: DUCC- process_dd
Posted by Eddie Epstein <ea...@gmail.com>.
The simplest way of vertically scaling a Job process is to specify the
analysis pipeline using core UIMA descriptors and then using
--process_thread_count to specify how many copies of the pipeline to
deploy, each in a different thread. No use of UIMA-AS at all. Please check
out the "Raw Text Processing" sample application that comes with DUCC.
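The approach above can be sketched as a job specification; the descriptor names here are placeholders, not taken from the thread:

```properties
# Hypothetical DUCC job specification using only core UIMA descriptors.
driver_descriptor_CR  = desc/MyCollectionReader.xml
process_descriptor_AE = desc/MyAggregateAE.xml
# Deploy 5 copies of the whole pipeline, each in its own thread:
process_thread_count  = 5
```

No deployment descriptor and no broker are involved; DUCC instantiates the aggregate 5 times inside each Job process.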
On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <re...@orkash.com>
wrote:
> [quoted messages snipped]
Re: DUCC- process_dd
Posted by "reshu.agarwal" <re...@orkash.com>.
Oh! I misunderstood this. I thought it would scale both my Aggregate and
the individual AEs.
I want to scale the aggregate as well as the individual AEs. Is there any
way to do this in UIMA-AS/DUCC?
On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:
> [quoted messages snipped]
Re: DUCC- process_dd
Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
In an async aggregate you scale the individual AEs, not the aggregate as a
whole. The configuration below should do that. Are there any warnings from
dd2spring at startup with your configuration?
<analysisEngine async="true">
  <delegates>
    <analysisEngine key="ChunkerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="NEDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="StemmerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
    <analysisEngine key="ConsumerDescriptor">
      <scaleout numberOfInstances="5"/>
    </analysisEngine>
  </delegates>
</analysisEngine>
Jerry
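As a complement to the point above: if the goal is N copies of the entire pipeline rather than per-delegate scale-out, one option in UIMA-AS is to deploy the top-level aggregate synchronously and replicate it as a whole. A minimal sketch follows; the key is taken from the thread, but treat the behavior as an assumption to verify against the UIMA-AS deployment-descriptor documentation.

```xml
<!-- Synchronous (non-async) top-level aggregate: the scaleout element
     replicates the whole pipeline, delegates included, 5 times. -->
<analysisEngine key="FlowControllerAgg">
  <scaleout numberOfInstances="5"/>
</analysisEngine>
```

With async="true" omitted, the delegates run in-line inside each replicated instance, so per-delegate scaleout elements no longer apply.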
On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <re...@orkash.com>
wrote:
> [quoted message snipped]