Posted to user@uima.apache.org by "reshu.agarwal" <re...@orkash.com> on 2015/05/01 06:31:24 UTC

Re: DUCC- process_dd

Eddie,

I tried that same scenario and, by trial and error, compared it with
UIMA AS to get a more scalable pipeline, since I think UIMA AS can also
do this. But with UIMA AS I cannot match the processing time of DUCC's
default configuration that you mentioned.

Can you help me with this? I just want to scale using the best
configuration of UIMA AS and DUCC, which can be done using process_dd.
But how?

Thanks in advance.

Reshu.

On 05/01/2015 03:28 AM, Eddie Epstein wrote:
> The simplest way of vertically scaling a Job process is to specify the
> analysis pipeline using core UIMA descriptors and then using
> --process_thread_count to specify how many copies of the pipeline to
> deploy, each in a different thread. No use of UIMA-AS at all. Please check
> out the "Raw Text Processing" sample application that comes with DUCC.
>
> On Wed, Apr 29, 2015 at 12:30 AM, reshu.agarwal <re...@orkash.com>
> wrote:
>
>> Oh, I misunderstood this. I thought this would scale both my aggregate
>> and its AEs.
>>
>> I want to scale the aggregate as well as the individual AEs. Is there any
>> way of doing this in UIMA AS/DUCC?
>>
>>
>>
>> On 04/28/2015 07:14 PM, Jaroslaw Cwiklik wrote:
>>
>>> In an async aggregate you scale the individual AEs, not the aggregate as
>>> a whole. The configuration below should do that. Are there any warnings
>>> from dd2spring at startup with your configuration?
>>>
>>> <analysisEngine async="true">
>>>   <delegates>
>>>     <analysisEngine key="ChunkerDescriptor">
>>>       <scaleout numberOfInstances="5" />
>>>     </analysisEngine>
>>>     <analysisEngine key="NEDescriptor">
>>>       <scaleout numberOfInstances="5" />
>>>     </analysisEngine>
>>>     <analysisEngine key="StemmerDescriptor">
>>>       <scaleout numberOfInstances="5" />
>>>     </analysisEngine>
>>>     <analysisEngine key="ConsumerDescriptor">
>>>       <scaleout numberOfInstances="5" />
>>>     </analysisEngine>
>>>   </delegates>
>>> </analysisEngine>
>>>
>>> Jerry
>>>
>>> On Tue, Apr 28, 2015 at 5:20 AM, reshu.agarwal <re...@orkash.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to scale my processing pipeline to run in a DUCC environment
>>>> with UIMA AS, using process_dd. When I try to scale using the
>>>> configuration below, the threads started are not what I expected:
>>>>
>>>>
>>>> <analysisEngineDeploymentDescription
>>>>     xmlns="http://uima.apache.org/resourceSpecifier">
>>>>
>>>>   <name>Uima v3 Deployment Descripter</name>
>>>>   <description>Deploys Uima v3 Aggregate AE using the Advanced Fixed Flow
>>>>     Controller</description>
>>>>
>>>>   <deployment protocol="jms" provider="activemq">
>>>>     <casPool numberOfCASes="5" />
>>>>     <service>
>>>>       <inputQueue endpoint="UIMA_Queue_test"
>>>>           brokerURL="tcp://localhost:61617?jms.useCompression=true"
>>>>           prefetch="0" />
>>>>       <topDescriptor>
>>>>         <import
>>>>             location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
>>>>       </topDescriptor>
>>>>       <analysisEngine async="true" key="FlowControllerAgg"
>>>>           internalReplyQueueScaleout="10" inputQueueScaleout="10">
>>>>         <scaleout numberOfInstances="5" />
>>>>         <delegates>
>>>>           <analysisEngine key="ChunkerDescriptor">
>>>>             <scaleout numberOfInstances="5" />
>>>>           </analysisEngine>
>>>>           <analysisEngine key="NEDescriptor">
>>>>             <scaleout numberOfInstances="5" />
>>>>           </analysisEngine>
>>>>           <analysisEngine key="StemmerDescriptor">
>>>>             <scaleout numberOfInstances="5" />
>>>>           </analysisEngine>
>>>>           <analysisEngine key="ConsumerDescriptor">
>>>>             <scaleout numberOfInstances="5" />
>>>>           </analysisEngine>
>>>>         </delegates>
>>>>       </analysisEngine>
>>>>     </service>
>>>>   </deployment>
>>>>
>>>> </analysisEngineDeploymentDescription>
>>>>
>>>>
>>>> There should be 5 threads of FlowControllerAgg, each of which has
>>>> 5 more threads for each of ChunkerDescriptor, NEDescriptor,
>>>> StemmerDescriptor and ConsumerDescriptor.
>>>>
>>>> But I don't think this is actually happening in DUCC.
>>>>
>>>> Thanks in advance.
>>>>
>>>> Reshu.
>>>>
>>>>
>>>>
>>>>

Re: DUCC- process_dd

Posted by Eddie Epstein <ea...@gmail.com>.

Reshu,

UIMA-AS configurations are normally used in DUCC as Services for
interactive applications or to support Jobs. They can be used in Jobs, but
typically are not.

There is also a difference in the inputs between Job processes and
Services. Services will normally receive a CAS with the artifact to be
analyzed. A Job process will receive a CAS with a reference to the artifact
or even to a collection of artifacts; this is important for Job scale-out, to
avoid making the Job's Collection Reader a bottleneck.

I suggest starting with one of the sample applications and adapting it to
your needs. We can help if you give some details about the format of the
input and output data.

Eddie
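
As a sketch of the distinction raised earlier in this thread (an async
aggregate scales its delegates, not itself), the deployment descriptor below
shows how the aggregate could instead be replicated as a whole by deploying it
with async="false". It reuses the queue, broker URL and import location from
the original post; the name and description text are made up for illustration,
and the behavior should be checked against the UIMA-AS documentation and the
dd2spring output at startup.

```xml
<analysisEngineDeploymentDescription
    xmlns="http://uima.apache.org/resourceSpecifier">

  <!-- Name and description are illustrative, not from the original post -->
  <name>Synchronous aggregate scaled as a whole</name>
  <description>Sketch: five copies of the whole aggregate, one per thread</description>

  <deployment protocol="jms" provider="activemq">
    <!-- Assumption: one CAS per pipeline instance, matching the original casPool size -->
    <casPool numberOfCASes="5" />
    <service>
      <inputQueue endpoint="UIMA_Queue_test"
          brokerURL="tcp://localhost:61617?jms.useCompression=true"
          prefetch="0" />
      <topDescriptor>
        <import
            location="../Uima_v3_test/desc/orkash/ae/aggregate/FlowController_Uima.xml" />
      </topDescriptor>
      <!-- async="false": the aggregate is treated as one synchronous unit,
           so scaleout applies to the pipeline as a whole and no per-delegate
           scaleout is given -->
      <analysisEngine async="false">
        <scaleout numberOfInstances="5" />
      </analysisEngine>
    </service>
  </deployment>

</analysisEngineDeploymentDescription>
```

With this form each instance would run the entire pipeline in its own thread,
which is roughly what --process_thread_count does for a Job built from core
UIMA descriptors, the approach recommended above for Jobs.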
