Posted to user@uima.apache.org by Mario Gazzo <ma...@gmail.com> on 2015/06/14 14:37:01 UTC

UIMAfit analysis descriptions appear to trim String configuration parameters

While using the new gapText parameter of the UIMA Ruta HTMLConverter, I noticed that the string value is trimmed somewhere in the pipeline aggregation process: e.g. “   .   ” ends up as “.”, both in the running pipeline and when the pipeline is written to XML. I don’t think this has anything to do with the HTMLConverter in particular. We use UIMAfit to construct the aggregate analysis engine description, but I don’t know where exactly the trimming occurs. Somewhat surprisingly, I was also able to run a small example pipeline in which the value was not trimmed. Does anyone have an idea what could be causing this, and whether it is controllable somehow?

We use a branch of UIMAfit that fixes a resource binding issue; we are currently on commit e9b32e30895443b9f93fef65453593dd1533c7d0, with UIMA 2.7.

Cheers
Mario


Re: UIMAfit analysis descriptions appear to trim String configuration parameters

Posted by Mario Gazzo <ma...@gmail.com>.
Done :)

https://issues.apache.org/jira/browse/UIMA-4464

I just pasted excerpts from this thread into the description.

Cheers
Mario


> On 15 Jun 2015, at 08:50, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> As far as I know, CPE does not work with in-memory descriptors (or I never dug deep enough). So if you use CPE (e.g. through the uimaFIT CpeBuilder or otherwise), there is probably some XML serialization of the descriptors involved. 
> 
> Anyway, I think that pinpoints the problem pretty precisely and it should be easy to set up a test case for it. Would you mind opening a Jira with your findings?
> 
> Cheers,
> 
> -- Richard


Re: UIMAfit analysis descriptions appear to trim String configuration parameters

Posted by Richard Eckart de Castilho <re...@apache.org>.
As far as I know, CPE does not work with in-memory descriptors (or I never dug deep enough). So if you use CPE (e.g. through the uimaFIT CpeBuilder or otherwise), there is probably some XML serialization of the descriptors involved. 

Anyway, I think that pinpoints the problem pretty precisely and it should be easy to set up a test case for it. Would you mind opening a Jira with your findings?
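Richard's point that the CPE probably goes through an XML serialization of the descriptors can be illustrated with the JDK's own XML APIs. The sketch below does not use UIMA's serializer or parser; the class name, the method name and the `trimOnRead` flag are hypothetical. It writes a value into a `<string>` element, serializes the document to text, parses it back and reads the element content. Standard JAXP keeps the leading and trailing spaces across this round trip, so in the sketch the value only collapses to “.” when the reading side explicitly trims it, which `trimOnRead` merely simulates.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class XmlRoundTrip {

    /**
     * Writes value into a {@code <string>} element, serializes the document to text,
     * parses the text back and returns the element content.
     * trimOnRead simulates a hypothetical reader that calls String.trim().
     */
    static String roundTrip(String value, boolean trimOnRead) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // In-memory document: <value><string>VALUE</string></value>
            Document out = builder.newDocument();
            Element root = out.createElement("value");
            Element str = out.createElement("string");
            str.setTextContent(value);
            root.appendChild(str);
            out.appendChild(root);

            // Serialize to XML text, as happens when a descriptor is written out.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter xml = new StringWriter();
            transformer.transform(new DOMSource(out), new StreamResult(xml));

            // Parse the text back and read the element content unchanged.
            Document in = builder.parse(new InputSource(new StringReader(xml.toString())));
            String text = in.getElementsByTagName("string").item(0).getTextContent();

            return trimOnRead ? text.trim() : text;
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + roundTrip("   .   ", false) + "]"); // spaces survive
        System.out.println("[" + roundTrip("   .   ", true) + "]");  // trimmed to "."
    }
}
```

If a descriptor written by a real pipeline already shows “.” in the XML file, the trimming would have to happen on the writing side instead; the sketch only shows that XML itself does not force it.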

Cheers,

-- Richard


Re: UIMAfit analysis descriptions appear to trim String configuration parameters

Posted by Jens Grivolla <j+...@grivolla.net>.
On Mon, Jun 15, 2015 at 8:43 AM, Mario Gazzo <ma...@gmail.com> wrote:

> I am referring to this GitHub repo:
>
> https://github.com/apache/uima-uimafit
>
> I thought it was published by you as a mirror of the SVN repo, or the other way around.
>

Yes, this is the official (one-way) mirror of the SVN repository. If you
want to be able to reference SVN commits, you can look at the commit details
on GitHub:
https://github.com/apache/uima-uimafit/commit/e9b32e30895443b9f93fef65453593dd1533c7d0

There you see:
git-svn-id: https://svn.apache.org/repos/asf/uima/uimafit/trunk@1681410 13f79535-47bb-0310-9956-ffa450edef68

Unfortunately, the link doesn't actually work with the repository browser
at svn.apache.org, but at least the commit id should be correct. The
correspondence between commits in SVN and git is a bit complicated because
there is only one big SVN repository for all of UIMA, whereas there are
separate git repositories for the subprojects. Therefore the commit you
reference is the latest one in the uimaFIT git repository, but there are
newer commits in the UIMA SVN.
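The lookup Jens describes can be automated: the git-svn-id line in a mirrored commit message carries the SVN URL, the SVN revision and the repository UUID. A minimal sketch (the class name is hypothetical, and the line format is assumed from the single example quoted above) that extracts the three parts:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GitSvnId {

    // Assumed shape: "git-svn-id: <url>@<revision> <repository-uuid>"
    private static final Pattern ID =
            Pattern.compile("git-svn-id:\\s+(\\S+)@(\\d+)\\s+([0-9a-fA-F-]+)");

    final String url;
    final long revision;
    final String repositoryUuid;

    private GitSvnId(String url, long revision, String repositoryUuid) {
        this.url = url;
        this.revision = revision;
        this.repositoryUuid = repositoryUuid;
    }

    /** Finds the git-svn-id line in a commit message and splits it into its parts. */
    static GitSvnId parse(String commitMessage) {
        Matcher m = ID.matcher(commitMessage);
        if (!m.find()) {
            throw new IllegalArgumentException("no git-svn-id line found");
        }
        return new GitSvnId(m.group(1), Long.parseLong(m.group(2)), m.group(3));
    }

    public static void main(String[] args) {
        GitSvnId id = parse("git-svn-id: https://svn.apache.org/repos/asf/uima/uimafit/trunk@1681410 "
                + "13f79535-47bb-0310-9956-ffa450edef68");
        System.out.println("r" + id.revision + " of " + id.url);
    }
}
```

As Jens notes, the revision number belongs to the shared SVN repository rather than to the uimaFIT git mirror, so it is the value to quote when referring to the SVN side.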

HTH,
Jens

Re: UIMAfit analysis descriptions appear to trim String configuration parameters

Posted by Mario Gazzo <ma...@gmail.com>.
I am referring to this GitHub repo:

https://github.com/apache/uima-uimafit

I thought it was published by you as a mirror of the SVN repo, or the other way around.

The trimming itself is not a technical problem for me right now, but I felt it might become important in some other case. I only noticed it because I added extra spaces to improve the readability of my output. Initially I thought the HTMLConverter was responsible, but when I inspected it I could see that the trimming had already happened somewhere before configuration parameter initialisation.

I then inspected the descriptor right after creation, as you suggested, and the value was not trimmed at that point. Later, during runtime initialisation (this time without doing any XML serialization), the value was trimmed inside ConfigurationManagerImplBase::getConfigParameterValue right after the lookup operation (I used a debugger to inspect the value). That is inside a UIMA core component, but the trim itself occurs somewhere between descriptor creation and AE initialisation. It seems this is not a UIMAfit issue after all.
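Trimmed whitespace is easy to miss in ordinary log output, which is why a debugger was needed here. A minimal sketch of a logging helper (hypothetical, not part of UIMA or uimaFIT) that brackets the value and reports its length; calling it once right after descriptor creation and once during the annotator's initialisation would make the two values directly comparable:

```java
public class ParamDebug {

    /** Renders a parameter value with visible boundaries and its length. */
    static String describe(String name, String value) {
        if (value == null) {
            return name + " = null";
        }
        // Brackets expose leading/trailing spaces that a plain log line would hide.
        return name + " = [" + value + "] (length " + value.length() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("gapText", "   .   ")); // as configured
        System.out.println(describe("gapText", "."));       // as observed after trimming
    }
}
```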

I wrote a small example app in which the HTMLAnnotator and HTMLConverter descriptors were also aggregated before execution, but there the trimming did not show up at runtime, only in the serialised XML. Then it occurred to me that my example used SimplePipeline, whereas our main application uses the CPE. I then switched the main application to SimplePipeline as well, and the trimming was gone there too. It seems that the trimming only happens inside the CPE and when serialising the pipeline to XML.


Cheers,
Mario

> On 14 Jun 2015, at 17:02, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> Just some quick feedback: uimaFIT (mind the capitalization) is maintained in SVN, not in git.
> 
> https://svn.apache.org/repos/asf/uima/uimafit
> 
> I have no idea what repository you are referring to ;)
> 
> Regarding the trimming, nothing comes to mind right away. The parameter values are passed around quite a bit.
> 
> Do you see the trimming when you create just a standalone descriptor for the HTMLConverter with uimaFIT (outside an aggregate) and serialize that to XML? If yes, do you also see the trimming before serialization, when you directly access the parameter values of the AnalysisEngineDescription that uimaFIT has created for you?
> 
> Cheers,
> 
> -- Richard


Re: UIMAfit analysis descriptions appear to trim String configuration parameters

Posted by Richard Eckart de Castilho <re...@apache.org>.
Just some quick feedback: uimaFIT (mind the capitalization) is maintained in SVN, not in git.

https://svn.apache.org/repos/asf/uima/uimafit

I have no idea what repository you are referring to ;)

Regarding the trimming, nothing comes to mind right away. The parameter values are passed around quite a bit.

Do you see the trimming when you create just a standalone descriptor for the HTMLConverter with uimaFIT (outside an aggregate) and serialize that to XML? If yes, do you also see the trimming before serialization, when you directly access the parameter values of the AnalysisEngineDescription that uimaFIT has created for you?
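For the serialized side of this check, one option is to look at the XML that the descriptor's `toXML(...)` method writes and pull out the raw parameter value without trimming it. A minimal JDK-only sketch; the class and method names are hypothetical, and the element names (`nameValuePair`, `name`, `string`) are assumed from the UIMA descriptor format. In an aggregate descriptor, the first matching parameter in document order is returned.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DescriptorParamReader {

    /**
     * Returns the untrimmed text of the {@code <string>} value of the first
     * nameValuePair whose name matches, or null if there is none.
     */
    static String rawStringParam(String descriptorXml, String paramName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)));
            NodeList pairs = doc.getElementsByTagName("nameValuePair");
            for (int i = 0; i < pairs.getLength(); i++) {
                Element pair = (Element) pairs.item(i);
                Element name = (Element) pair.getElementsByTagName("name").item(0);
                if (name != null && paramName.equals(name.getTextContent())) {
                    Element value = (Element) pair.getElementsByTagName("string").item(0);
                    // No trim() here: surrounding spaces are exactly what we want to see.
                    return value == null ? null : value.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse descriptor XML", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written, heavily reduced fragment; a real descriptor has many more elements.
        String xml = "<analysisEngineDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">"
                + "<analysisEngineMetaData><configurationParameterSettings>"
                + "<nameValuePair><name>gapText</name>"
                + "<value><string>   .   </string></value></nameValuePair>"
                + "</configurationParameterSettings></analysisEngineMetaData>"
                + "</analysisEngineDescription>";
        System.out.println("[" + rawStringParam(xml, "gapText") + "]");
    }
}
```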

Cheers,

-- Richard
