Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2019/01/18 19:50:15 UTC

DuccAbstractProcessContainer impl repeating lots of unneeded work?

Hi,

While tracing a ducc run, I noticed that for each work item, the method:
    /**
     * This method is called to fetch a WorkItem ID from a given CAS which
     * is required to support investment reset.
     *
     */
    public String getKey(String xmi) throws Exception {
        if ( analysisEngineMetadata == null ) {
            // WorkItem ID (key) is only supported for pieces 'n parts
            return null;
        }
        Properties props = new Properties();
        props.setProperty(UIMAFramework.CAS_INITIAL_HEAP_SIZE, "1000");

        TypeSystemDescription tsd = analysisEngineMetadata.getTypeSystem();
        TypePriorities tp = analysisEngineMetadata.getTypePriorities();
        FsIndexDescription[] fsid = analysisEngineMetadata.getFsIndexes();
        CAS cas;
        synchronized( CasCreationUtils.class) {
            cas = CasCreationUtils.createCas(tsd, tp, fsid, props);
        }
        // deserialize the CAS
        getUimaSerializer().deserializeCasFromXmi((String)xmi, cas);

repeatedly parses a type system descriptor, builds a type system from it, and
so on, which could be costly for large type systems.  Is there any reason not
to do this just once and reuse the "cas"?

At the very least, if a new cas needs to be created each time but you know
the type system will be the same, you can parse and create the type system
just once and pass that to createCas.
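
A minimal sketch of the two options above (reuse one cas, or build a fresh cas
on a type system that is created only once). To keep it self-contained it does
not use UIMA: BuiltTypeSystem, WorkCas, KeyExtractor, and the trim-based key
extraction are hypothetical stand-ins, and load() merely stands in for XMI
deserialization. It shows the caching pattern only and is not the DUCC code.

```java
/** Hypothetical stand-in for a built type system: costly to create, safe to share. */
final class BuiltTypeSystem {
    final String descriptor;
    BuiltTypeSystem(String descriptor) { this.descriptor = descriptor; }
}

/** Hypothetical stand-in for a CAS: per-work-item state over a shared type system. */
final class WorkCas {
    final BuiltTypeSystem typeSystem;
    private String content = "";
    WorkCas(BuiltTypeSystem typeSystem) { this.typeSystem = typeSystem; }
    void reset() { content = ""; }              // discard the previous work item
    void load(String xmi) { content = xmi; }    // stands in for XMI deserialization
    String content() { return content; }
}

final class KeyExtractor {
    private final String descriptor;
    private BuiltTypeSystem cachedTypeSystem;   // built lazily, at most once
    private WorkCas reusableCas;                // used only by option 1
    private int typeSystemBuilds;               // counts the expensive builds

    KeyExtractor(String descriptor) { this.descriptor = descriptor; }

    private synchronized BuiltTypeSystem typeSystem() {
        if (cachedTypeSystem == null) {
            cachedTypeSystem = new BuiltTypeSystem(descriptor);
            typeSystemBuilds++;
        }
        return cachedTypeSystem;
    }

    /** Option 1: keep a single cas and reset it between work items. */
    synchronized String getKeyReusingCas(String xmi) {
        if (reusableCas == null) {
            reusableCas = new WorkCas(typeSystem());
        }
        reusableCas.reset();
        reusableCas.load(xmi);
        return extractKey(reusableCas);
    }

    /** Option 2: a new cas per work item, but on the cached type system. */
    String getKeyWithFreshCas(String xmi) {
        WorkCas cas = new WorkCas(typeSystem());
        cas.load(xmi);
        return extractKey(cas);
    }

    // Placeholder key lookup; the real lookup is not shown in this thread.
    private static String extractKey(WorkCas cas) { return cas.content().trim(); }

    synchronized int typeSystemBuilds() { return typeSystemBuilds; }
}

class ReuseDemo {
    public static void main(String[] args) {
        KeyExtractor extractor = new KeyExtractor("types.xml");
        extractor.getKeyReusingCas("item-1");
        extractor.getKeyReusingCas("item-2");
        extractor.getKeyWithFreshCas("item-3");
        System.out.println("type system builds: " + extractor.typeSystemBuilds());
    }
}
```

Option 1 serializes callers on the shared cas, so it fits a single-threaded
path; option 2 only serializes the one-time type system build.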

-Marshall


Re: DuccAbstractProcessContainer impl repeating lots of unneeded work?

Posted by Jaroslaw Cwiklik <cw...@apache.org>.
This dead code will be removed soon.
Jerry

On Tue, Jan 22, 2019 at 8:28 PM Eddie Epstein <ea...@gmail.com> wrote:

> As best I can tell after hunting this down, the offending code was
> removed some time ago. Nothing calls getKey() anymore.
> Eddie

Re: DuccAbstractProcessContainer impl repeating lots of unneeded work?

Posted by Eddie Epstein <ea...@gmail.com>.
As best I can tell after hunting this down, the offending code was
removed some time ago. Nothing calls getKey() anymore.
Eddie
