Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2008/01/28 17:51:25 UTC

Clarifying language subsumption in Result Specifications

Language specifications are in a hierarchy.  For example, from most 
inclusive to finer subsets, we have:

x-unspecified
   en
     en-us

A result spec's most common use is in a negative sense: an annotator can 
check a result spec and, if it doesn't contain the type or feature, 
skip producing that type or feature.

For simplicity, let's assume we have only one type or feature, called TF.

If the annotator thinks it produces TF for language en-us only, and 
wants to check whether it should skip producing it, it calls 
containsType/Feature(TF, "en-us").  The current implementation is defined 
to return true if the result spec has any of the languages x-unspecified, 
en, or en-us.
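
A minimal sketch of the lookup just described, in plain Java and 
independent of the UIMA classes.  The class name, the helper names, and 
the rule of deriving a parent language by stripping the last "-" subtag 
are assumptions made for illustration; this is not the actual 
ResultSpecification implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of language subsumption; not the UIMA implementation.
class LanguageSubsumptionSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Returns the queried language followed by its more inclusive ancestors,
    // e.g. "en-us" -> [en-us, en, x-unspecified].
    static List<String> selfAndAncestors(String language) {
        List<String> chain = new ArrayList<>();
        String current = language.toLowerCase(Locale.ROOT);
        while (!current.equals(UNSPECIFIED)) {
            chain.add(current);
            int dash = current.lastIndexOf('-');
            if (dash < 0) {
                break;
            }
            current = current.substring(0, dash);
        }
        chain.add(UNSPECIFIED);
        return chain;
    }

    // True if the spec lists the queried language or any ancestor of it.
    // Assumes the languages in the spec are already lower-cased.
    static boolean contains(Set<String> specLanguages, String queryLanguage) {
        for (String candidate : selfAndAncestors(queryLanguage)) {
            if (specLanguages.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(Set.of("x-unspecified"), "en-us")); // true
        System.out.println(contains(Set.of("en"), "en-us"));            // true
        System.out.println(contains(Set.of("en-us"), "en-us"));         // true
        System.out.println(contains(Set.of("de"), "en-us"));            // false
    }
}
```

Walking upward from the queried language keeps the check one-directional: 
a more inclusive entry in the spec covers a finer query, but not the 
other way around.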

Let's consider the opposite case.  Suppose we have an annotator that can 
produce TF for "en".  Suppose the result-spec has an entry for TF only 
for the language "en-us".  Should that annotator produce results?  If it 
calls containsType/Feature(TF, "en"), it will get a "false" (current 
implementation).

After some thinking about this and some discussion (because I don't 
think I got it right just by myself :-) ), it seems that this is 
correct.  Consider the following case: the language of the document is 
"en", and the containing (top-most) aggregate explicitly specified that 
it wanted output only for en-us.  In that case, the annotator should not 
produce any results, because the language of this doc is not en-us, and 
the assembler put together components that they said should output only 
en-us results.

This same logic seems to apply to x-unspecified:

Suppose we have an annotator that can produce TF for "x-unspecified".  
Suppose the result-spec has an entry for TF only for the language "en".  
Should that annotator produce results?  If it calls 
containsType/Feature(TF, "x-unspecified"), it should get a "false" 
(broken in the current implementation!, but was true I think in the 
previous one).

Assume the language of the document is "x-unspecified", and the 
containing (top-most) aggregate explicitly specified that it wanted 
output only for en.  In that case, the annotator should not produce any 
results, because the language of this doc is not "en", and the assembler 
put together components that they said should output only "en" results.
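
To make the two document-language cases above concrete, here is a rough 
sketch of the decision an annotator would make under the proposed rule.  
It is plain Java with hypothetical names (AnnotatorSkipSketch, 
shouldProduce) and a simplified "strip the last subtag" notion of parent 
language; it is not code from the UIMA code base.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the proposed rule; not the UIMA implementation.
class AnnotatorSkipSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Produce TF only if the spec lists the document language itself or a
    // more inclusive language. Assumes spec languages are lower-cased.
    static boolean shouldProduce(Set<String> specLanguagesForTF, String documentLanguage) {
        String lang = documentLanguage.toLowerCase(Locale.ROOT);
        while (true) {
            if (specLanguagesForTF.contains(lang)) {
                return true;
            }
            if (lang.equals(UNSPECIFIED)) {
                return false; // top of the hierarchy reached, nothing matched
            }
            int dash = lang.lastIndexOf('-');
            lang = (dash < 0) ? UNSPECIFIED : lang.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        // Document is "en", aggregate asked only for en-us: skip.
        System.out.println(shouldProduce(Set.of("en-us"), "en"));         // false
        // Document is "x-unspecified", aggregate asked only for en: skip.
        System.out.println(shouldProduce(Set.of("en"), "x-unspecified")); // false
        // Document is "en-us", aggregate asked for en: produce.
        System.out.println(shouldProduce(Set.of("en"), "en-us"));         // true
    }
}
```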

Do others agree with this?

-Marshall

Re: Clarifying language subsumption in Result Specifications

Posted by Marshall Schor <ms...@schor.com>.

Michael Baessler wrote:
> Marshall Schor wrote:
>> Michael Baessler wrote:
>>> Marshall Schor wrote:
>>>> I tried implementing this change, and 2 test cases fail.  They look 
>>>> like they are failing exactly in the case where the result 
>>>> specification has a TypeOrFeature with a specified language other 
>>>> than "x-unspecified", and the containsTypeOrFeature method is being 
>>>> called using the form which doesn't pass in an explicit language, so 
>>>> is being treated as if x-unspecified was passed in.
>>>> As discussed below, this should give "false", but the test cases 
>>>> expect true.
>>>>
>>>> Should I change the test cases?  The failing ones are:
>>>>
>>>> ResultSpecification_implTest:  It defines a result spec containing 
>>>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but 
>>>> not "x-unspecified".  So the call rs.containsType("FakeType") now 
>>>> returns false (because the set of languages for FakeType is missing 
>>>> x-unspecified), but the test says it should return true.
>>>>
>>> Which test method are you talking about? I would like to look at it.
>> The call is on line 332 of class ResultSpecification_implTest.  This 
>> changed behavior arises from the proposed change to how the 
>> containsType method works.  The changed logic is:  if the language 
>> x-unspecified is given (or if no language is given, as in this case), 
>> return true only if the result specification for this type or feature 
>> includes the language "x-unspecified".  In this test, the result 
>> specification for the type "FakeType" is set from the component's 
>> capabilities specification, which said this component outputs FakeType 
>> for languages "en", "de", "en-US", "en-GB", but not "x-unspecified".  
>> So with the proposed change to how containsType works, it returns 
>> false.  But the test case expects true.
> I don't know that test, but it is fine with me to change the behavior 
> since it seems to be wrong!
>>
>>>> The other test is the PearRuntimeTest.
>>>> This test loads two Pears, runs them and then looks at the CAS result.
>>>> The descriptor for one of the tests, the TutorialDateTime 
>>>> descriptor says it output 3 types, *but for language "en"* (only, 
>>>> and not for x-unspecified in particular).
>>>>
>>>> The result spec built for the aggregate is empty (the test case has 
>>>> nothing specified here).
>>>> When it is passed down to the delegates, the setResultSpecification 
>>>> for the Pear descriptor in PearAnalysisEngineWrapper is called.  
>>>> This is not implemented, so it inherits from its super, which is 
>>>> AnalysisEngineImplBase - and this impl does nothing (expecting to 
>>>> be overridden).  I'll write this up as a Jira issue. But even if 
>>>> this were "fixed", because the outer Aggregate had nothing 
>>>> specified in its capability, the inner primitive analysis engine is 
>>>> set up initially with a "default" result spec, which is its own 
>>>> output capabilities.  This spec says it should produce results just 
>>>> for "en", and in particular it should *not* produce output for 
>>>> x-unspecified.  This annotator is written to respect the result 
>>>> spec, so it doesn't produce anything.
>>>>
>>> The PearRuntimeTest does not use the capabilityLanguageFlow so we 
>>> have a different behavior there!
>> This test is just testing the component's behavior with respect to 
>> using the result specification; I don't think it has anything to do 
>> with the capabilityLanguageFlow?
> So you mean that the computation of the default result spec does not 
> work correctly, since it is not implemented correctly? If that is 
> true, please go ahead and fix it. I was not aware of that. Thanks for 
> catching it!
This has been entered as Jira-727.  Not fixed yet (or assigned).
-Marshall
>
> -- Michael
>
>
>

Re: Clarifying language subsumption in Result Specifications

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> I tried implementing this change, and 2 test cases fail.  They look 
>>> like they are failing exactly in the case where the result 
>>> specification has a TypeOrFeature with a specified language other 
>>> than "x-unspecified", and the containsTypeOrFeature method is being 
>>> called using the form which doesn't pass in an explicit language, so 
>>> is being treated as if x-unspecified was passed in.
>>> As discussed below, this should give "false", but the test cases 
>>> expect true.
>>>
>>> Should I change the test cases?  The failing ones are:
>>>
>>> ResultSpecification_implTest:  It defines a result spec containing 
>>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but 
>>> not "x-unspecified".  So the call rs.containsType("FakeType") now 
>>> returns false (because the set of languages for FakeType is missing 
>>> x-unspecified), but the test says it should return true.
>>>
>> Which test method are you talking about? I would like to look at it.
> The call is on line 332 of class ResultSpecification_implTest.  This 
> changed behavior arises from the proposed change to how the 
> containsType method works.  The changed logic is:  if the language 
> x-unspecified is given (or if no language is given, as in this case), 
> return true only if the result specification for this type or feature 
> includes the language "x-unspecified".  In this test, the result 
> specification for the type "FakeType" is set from the component's 
> capabilities specification, which said this component outputs FakeType 
> for languages "en", "de", "en-US", "en-GB", but not "x-unspecified".  
> So with the proposed change to how containsType works, it returns 
> false.  But the test case expects true.
I don't know that test, but it is fine with me to change the behavior 
since it seems to be wrong!
>
>>> The other test is the PearRuntimeTest.
>>> This test loads two Pears, runs them and then looks at the CAS result.
>>> The descriptor for one of the tests, the TutorialDateTime descriptor 
>>> says it output 3 types, *but for language "en"* (only, and not for 
>>> x-unspecified in particular).
>>>
>>> The result spec built for the aggregate is empty (the test case has 
>>> nothing specified here).
>>> When it is passed down to the delegates, the setResultSpecification 
>>> for the Pear descriptor in PearAnalysisEngineWrapper is called.  
>>> This is not implemented, so it inherits from its super, which is 
>>> AnalysisEngineImplBase - and this impl does nothing (expecting to be 
>>> overridden).  I'll write this up as a Jira issue. But even if this 
>>> were "fixed", because the outer Aggregate had nothing specified in 
>>> its capability, the inner primitive analysis engine is set up 
>>> initially with a "default" result spec, which is its own output 
>>> capabilities.  This spec says it should produce results just for 
>>> "en", and in particular it should *not* produce output for 
>>> x-unspecified.  This annotator is written to respect the result 
>>> spec, so it doesn't produce anything.
>>>
>> The PearRuntimeTest does not use the capabilityLanguageFlow so we have 
>> a different behavior there!
> This test is just testing the component's behavior with respect to 
> using the result specification; I don't think it has anything to do 
> with the capabilityLanguageFlow?
So you mean that the computation of the default result spec does not 
work correctly, since it is not implemented correctly? If that is true, 
please go ahead and fix it. I was not aware of that. Thanks for catching it!

-- Michael

Re: Clarifying language subsumption in Result Specifications

Posted by Marshall Schor <ms...@schor.com>.

Michael Baessler wrote:
> Marshall Schor wrote:
>> I tried implementing this change, and 2 test cases fail.  They look 
>> like they are failing exactly in the case where the result 
>> specification has a TypeOrFeature with a specified language other than 
>> "x-unspecified", and the containsTypeOrFeature method is being called 
>> using the form which doesn't pass in an explicit language, so is being 
>> treated as if x-unspecified was passed in.
>> As discussed below, this should give "false", but the test cases 
>> expect true.
>>
>> Should I change the test cases?  The failing ones are:
>>
>> ResultSpecification_implTest:  It defines a result spec containing 
>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but 
>> not "x-unspecified".  So the call rs.containsType("FakeType") now 
>> returns false (because the set of languages for FakeType is missing 
>> x-unspecified), but the test says it should return true.
>>
> Which test method are you talking about? I would like to look at it.
The call is on line 332 of class ResultSpecification_implTest.  This 
changed behavior arises from the proposed change to how the containsType 
method works.  The changed logic is:  if the language x-unspecified is 
given (or if no language is given, as in this case), return true only if 
the result specification for this type or feature includes the language 
"x-unspecified".  In this test, the result specification for the type 
"FakeType" is set from the component's capabilities specification, which 
said this component outputs FakeType for languages "en", "de", "en-US", 
"en-GB", but not "x-unspecified".  So with the proposed change to how 
containsType works, it returns false.  But the test case expects true.

>> The other test is the PearRuntimeTest.
>> This test loads two Pears, runs them and then looks at the CAS result.
>> The descriptor for one of the tests, the TutorialDateTime descriptor 
>> says it output 3 types, *but for language "en"* (only, and not for 
>> x-unspecified in particular).
>>
>> The result spec built for the aggregate is empty (the test case has 
>> nothing specified here).
>> When it is passed down to the delegates, the setResultSpecification 
>> for the Pear descriptor in PearAnalysisEngineWrapper is called.  This 
>> is not implemented, so it inherits from its super, which is 
>> AnalysisEngineImplBase - and this impl does nothing (expecting to be 
>> overridden).  I'll write this up as a Jira issue. But even if this 
>> were "fixed", because the outer Aggregate had nothing specified in 
>> its capability, the inner primitive analysis engine is set up 
>> initially with a "default" result spec, which is its own output 
>> capabilities.  This spec says it should produce results just for 
>> "en", and in particular it should *not* produce output for 
>> x-unspecified.  This annotator is written to respect the result spec, 
>> so it doesn't produce anything.
>>
> The PearRuntimeTest does not use the capabilityLanguageFlow so we have 
> a different behavior there!
This test is just testing the component's behavior with respect to 
using the result specification; I don't think it has anything to do with 
the capabilityLanguageFlow?

-Marshall
>
> -- Michael
>
>

Re: Clarifying language subsumption in Result Specifications

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> I tried implementing this change, and 2 test cases fail.  They look 
> like they are failing exactly in the case where the result 
> specification has a TypeOrFeature with a specified language other than 
> "x-unspecified", and the containsTypeOrFeature method is being called 
> using the form which doesn't pass in an explicit language, so is being 
> treated as if x-unspecified was passed in.
> As discussed below, this should give "false", but the test cases 
> expect true.
>
> Should I change the test cases?  The failing ones are:
>
> ResultSpecification_implTest:  It defines a result spec containing the 
> type "FakeType" for languages "en", "de", "en-US", "en-GB", but not 
> "x-unspecified".  So the call rs.containsType("FakeType") now returns 
> false (because the set of languages for FakeType is missing 
> x-unspecified), but the test says it should return true.
>
Which test method are you talking about? I would like to look at it.
> The other test is the PearRuntimeTest.
> This test loads two Pears, runs them and then looks at the CAS result.
> The descriptor for one of the tests, the TutorialDateTime descriptor 
> says it output 3 types, *but for language "en"* (only, and not for 
> x-unspecified in particular).
>
> The result spec built for the aggregate is empty (the test case has 
> nothing specified here).
> When it is passed down to the delegates, the setResultSpecification 
> for the Pear descriptor in PearAnalysisEngineWrapper is called.  This 
> is not implemented, so it inherits from its super, which is 
> AnalysisEngineImplBase - and this impl does nothing (expecting to be 
> overridden).  I'll write this up as a Jira issue. 
> But even if this were "fixed", because the outer Aggregate had nothing 
> specified in its capability, the inner primitive analysis engine is 
> set up initially with a "default" result spec, which is its own output 
> capabilities.  This spec says it should produce results just for "en", 
> and in particular it should *not* produce output for x-unspecified.  
> This annotator is written to respect the result spec, so it doesn't 
> produce anything.
>
The PearRuntimeTest does not use the capabilityLanguageFlow so we have a 
different behavior there!

-- Michael

Re: Clarifying language subsumption in Result Specifications

Posted by Marshall Schor <ms...@schor.com>.

I tried implementing this change, and 2 test cases fail.  They look like 
they are failing exactly in the case where the result specification has 
a TypeOrFeature with a specified language other than "x-unspecified", and 
the containsTypeOrFeature method is being called using the form which 
doesn't pass in an explicit language, so is being treated as if 
x-unspecified was passed in. 

As discussed below, this should give "false", but the test cases expect 
true.

Should I change the test cases?  The failing ones are:

ResultSpecification_implTest:  It defines a result spec containing the 
type "FakeType" for languages "en", "de", "en-US", "en-GB", but not 
"x-unspecified".  So the call rs.containsType("FakeType") now returns 
false (because the set of languages for FakeType is missing 
x-unspecified), but the test says it should return true.
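
As a rough illustration of the FakeType case, here is a small stand-in 
for a result spec in plain Java.  ResultSpecSketch and its addType method 
are hypothetical simplifications (the real ResultSpecification has a 
richer API); the point is only that the form without a language 
delegates to an "x-unspecified" query, which under the proposed change 
must be listed explicitly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result spec; not the UIMA class.
class ResultSpecSketch {
    static final String UNSPECIFIED = "x-unspecified";
    private final Map<String, Set<String>> languagesByType = new HashMap<>();

    // Registers a type for the given languages (stored lower-cased).
    void addType(String typeName, String... languages) {
        Set<String> listed = languagesByType.computeIfAbsent(typeName, k -> new HashSet<>());
        for (String language : languages) {
            listed.add(language.toLowerCase(Locale.ROOT));
        }
    }

    // Form without a language: treated as if x-unspecified was passed in.
    boolean containsType(String typeName) {
        return containsType(typeName, UNSPECIFIED);
    }

    // True if the type is listed for the language or a more inclusive one.
    boolean containsType(String typeName, String language) {
        Set<String> listed = languagesByType.getOrDefault(typeName, Set.of());
        String lang = language.toLowerCase(Locale.ROOT);
        while (true) {
            if (listed.contains(lang)) {
                return true;
            }
            if (lang.equals(UNSPECIFIED)) {
                return false;
            }
            int dash = lang.lastIndexOf('-');
            lang = (dash < 0) ? UNSPECIFIED : lang.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        ResultSpecSketch rs = new ResultSpecSketch();
        rs.addType("FakeType", "en", "de", "en-US", "en-GB");
        System.out.println(rs.containsType("FakeType"));          // false
        System.out.println(rs.containsType("FakeType", "en-GB")); // true
    }
}
```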

The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime descriptor, 
says it outputs 3 types, *but for language "en"* (only, and not for 
x-unspecified in particular).

The result spec built for the aggregate is empty (the test case has 
nothing specified here). 

When it is passed down to the delegates, the setResultSpecification for 
the Pear descriptor in PearAnalysisEngineWrapper is called.  This is not 
implemented, so it inherits from its super, which is 
AnalysisEngineImplBase - and this impl does nothing (expecting to be 
overridden).  I'll write this up as a Jira issue.  

But even if this were "fixed", because the outer Aggregate had nothing 
specified in its capability, the inner primitive analysis engine is set 
up initially with a "default" result spec, which is its own output 
capabilities.  This spec says it should produce results just for "en", 
and in particular it should *not* produce output for x-unspecified.  
This annotator is written to respect the result spec, so it doesn't 
produce anything.
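
The two effects described above can be modelled roughly as follows.  All 
class names here are hypothetical stand-ins (they are not 
PearAnalysisEngineWrapper or AnalysisEngineImplBase), a result spec is 
reduced to a set of languages for a single type, and the forwarding 
wrapper is only one guess at what a fix might look like.

```java
import java.util.Set;

// Hypothetical stand-ins; none of these are the real UIMA classes.
class EngineBaseSketch {
    // Default behavior: ignore the call, expecting subclasses to override.
    void setResultSpecification(Set<String> languagesForTF) {
    }
}

class PrimitiveEngineSketch extends EngineBaseSketch {
    private Set<String> resultLanguages;

    // Starts with a "default" result spec: its own output capabilities.
    PrimitiveEngineSketch(Set<String> outputCapabilityLanguages) {
        this.resultLanguages = outputCapabilityLanguages;
    }

    @Override
    void setResultSpecification(Set<String> languagesForTF) {
        this.resultLanguages = languagesForTF;
    }

    // An x-unspecified query has no more inclusive language to fall back
    // on, so under the proposed rule it must be listed itself.
    boolean wantsResultsForUnspecified() {
        return resultLanguages.contains("x-unspecified");
    }
}

class WrapperSketch extends EngineBaseSketch {
    final PrimitiveEngineSketch delegate;

    WrapperSketch(PrimitiveEngineSketch delegate) {
        this.delegate = delegate;
    }
    // No override here: the inherited no-op silently drops the spec.
}

class ForwardingWrapperSketch extends WrapperSketch {
    ForwardingWrapperSketch(PrimitiveEngineSketch delegate) {
        super(delegate);
    }

    @Override
    void setResultSpecification(Set<String> languagesForTF) {
        delegate.setResultSpecification(languagesForTF); // pass it down
    }
}
```

With a delegate whose capabilities list only "en", calling 
setResultSpecification on WrapperSketch leaves the default spec in 
place, so wantsResultsForUnspecified() stays false no matter what was 
passed in.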

Anyone object to my changing the test cases?

-Marshall

Marshall Schor wrote:
> Language specifications are in a hierarchy.  For example, from most 
> inclusive to finer subsets, we have:
>
> x-unspecified
>   en
>     en-us
>
> A result spec's most common use is in a negative sense - Annotators 
> can check a result spec and if it doesn't contain the type or feature, 
> it can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature, 
> called TF.
>
> If the annotator thinks it produces TF for language en-us only, and 
> wants to check whether it should skip producing it, it calls 
> containsType/Feature(TF, "en-us").  This is defined in the current 
> impl to return true, if the result spec has languages x-unspecified, 
> en, or en-us.
>
> Let's consider the opposite case.  Suppose we have an annotator that 
> can produce TF for "en".  Suppose the result-spec has an entry for TF 
> only for the language "en-us".  Should that annotator produce 
> results?  If it calls containsType/Feature(TF, "en"), it will get a 
> "false" (current implementation).
>
> After some thinking about this and some discussion (because I don't 
> think I got it right, just by myself :-) ),
> it seems that this is correct.  Consider the following case:
>  The language of the document is "en", and the containing (top-most) 
> aggregate specified explicitly it wanted
>  output only for en-us.  In that case, the annotator should not 
> produce any results, because the language
>  of this doc is not en-us, and the assembler put together things that 
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".  
> Suppose the result-spec has an entry for TF only for the language 
> "en".  Should that annotator produce results?  If it calls 
> containsType/Feature(TF, "x-unspecified"), it should get a "false" 
> (broken in the current implementation!, but was true I think in the 
> previous one).
>
> Assume the language of the document is "x-unspecified", and the 
> containing (top-most) aggregate specified explicitly it wanted
> output only for en.  In that case, the annotator should not produce 
> any results, because the language
> of this doc is not "en", and the assembler put together things that 
> they said should only output "en" results.
>
> Do others agree with this?
>
> -Marshall
>
>

Re: Clarifying language subsumption in Result Specifications

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Michael Baessler wrote:
>>>> Marshall Schor wrote:
>>>>> Language specifications are in a hierarchy.  For example, from 
>>>>> most inclusive to finer subsets, we have:
>>>>>
>>>>> x-unspecified
>>>>>   en
>>>>>     en-us
>>>>>
>>>>> A result spec's most common use is in a negative sense - 
>>>>> Annotators can check a result spec and if it doesn't contain the 
>>>>> type or feature, it can skip producing that type or feature.
>>>>>
>>>>> For simplicity, let's consider we have only one type or feature, 
>>>>> called TF.
>>>>>
>>>>> If the annotator thinks it produces TF for language en-us only, 
>>>>> and wants to check whether it should skip producing it, it calls 
>>>>> containsType/Feature(TF, "en-us").  This is defined in the current 
>>>>> impl to return true, if the result spec has languages 
>>>>> x-unspecified, en, or en-us.
>>>>>
>>>>> Let's consider the opposite case.  Suppose we have an annotator 
>>>>> that can produce TF for "en".  Suppose the result-spec has an 
>>>>> entry for TF only for the language "en-us".  Should that annotator 
>>>>> produce results?  If it calls containsType/Feature(TF, "en"), it 
>>>>> will get a "false" (current implementation).
>>>>>
>>>>> After some thinking about this and some discussion (because I 
>>>>> don't think I got it right, just by myself :-) ),
>>>>> it seems that this is correct.  Consider the following case:
>>>>>  The language of the document is "en", and the containing 
>>>>> (top-most) aggregate specified explicitly it wanted
>>>>>  output only for en-us.  In that case, the annotator should not 
>>>>> produce any results, because the language
>>>>>  of this doc is not en-us, and the assembler put together things 
>>>>> that they said should only output en-us results.
>>>>>
>>>>> This same logic seems to apply to x-unspecified:
>>>>>
>>>>> Suppose we have an annotator that can produce TF for 
>>>>> "x-unspecified".  Suppose the result-spec has an entry for TF only 
>>>>> for the language "en".  Should that annotator produce results?  If 
>>>>> it calls containsType/Feature(TF, "x-unspecified"), it should get 
>>>>> a "false" (broken in the current implementation!, but was true I 
>>>>> think in the previous one).
>>>> I'm not sure you are right here. I think if an annotator can 
>>>> produce TF for "x-unspecified" that means that it can produce TF 
>>>> for all languages. So if an "en" document comes in the annotator 
>>>> should produce a result.
>>> hmmm, this seems to contradict your statement below, saying "That 
>>> case is correct".
>>>
>>> In the example below, the result-spec passed in to the annotator has 
>>> only "en", not "x-unspecified".  This is the case proposed in my 
>>> paragraph.  Below you say it is right for the annotator to *not* 
>>> produce results, while above you say it should produce results.  
>>> This is inconsistent, unless I've mangled something...   Can you 
>>> clarify?
>>>
>>> -Marshall
>>>>>
>>>>> Assume the language of the document is "x-unspecified", and the 
>>>>> containing (top-most) aggregate specified explicitly it wanted
>>>>> output only for en.  In that case, the annotator should not 
>>>>> produce any results, because the language
>>>>> of this doc is not "en", and the assembler put together things 
>>>>> that they said should only output "en" results.
>>>>>
>>>> That case is correct.
>>>>
>>>> -- Michael
>>>>
>>>>
>>>
>> Maybe the confusion comes from the different treatment of 
>> "x-unspecified". If "x-unspecified" is specified in the output spec 
>> of an annotator it means that it can produce results for all languages. 
> True - and that works.  But that wasn't the case I was trying to 
> describe - I was trying to describe the opposite case:  The case  
> where the *output spec* of an annotator is *missing* the "x-unspecified".
> To restate the case:  The output spec has "en" (only), and the 
> annotator, when running, queries the result spec with 
> "x-unspecified".  This proposal says in that case, containsType should 
> return false.  Do you agree this should be the result in this case?  
> It seems you do above when you say "That case is correct", but 
> disagree in the paragraph where you say "I'm not sure you are right 
> here.".
> Perhaps I have not clearly described the two cases, but I think they 
> are the same case (and therefore need to have the same answer ;-) )
OK seems I did not understand the two cases correctly. :-)
Yes it is true that no results should be produced when the output spec 
for the annotator has only "en" and the document language is 
"x-unspecified".

-- Michael

Re: Clarifying language subsumption in Result Specifications

Posted by Marshall Schor <ms...@schor.com>.

Michael Baessler wrote:
> Marshall Schor wrote:
>> Michael Baessler wrote:
>>> Marshall Schor wrote:
>>>> Language specifications are in a hierarchy.  For example, from most 
>>>> inclusive to finer subsets, we have:
>>>>
>>>> x-unspecified
>>>>   en
>>>>     en-us
>>>>
>>>> A result spec's most common use is in a negative sense - Annotators 
>>>> can check a result spec and if it doesn't contain the type or 
>>>> feature, it can skip producing that type or feature.
>>>>
>>>> For simplicity, let's consider we have only one type or feature, 
>>>> called TF.
>>>>
>>>> If the annotator thinks it produces TF for language en-us only, and 
>>>> wants to check whether it should skip producing it, it calls 
>>>> containsType/Feature(TF, "en-us").  This is defined in the current 
>>>> impl to return true, if the result spec has languages 
>>>> x-unspecified, en, or en-us.
>>>>
>>>> Let's consider the opposite case.  Suppose we have an annotator 
>>>> that can produce TF for "en".  Suppose the result-spec has an entry 
>>>> for TF only for the language "en-us".  Should that annotator 
>>>> produce results?  If it calls containsType/Feature(TF, "en"), it 
>>>> will get a "false" (current implementation).
>>>>
>>>> After some thinking about this and some discussion (because I don't 
>>>> think I got it right, just by myself :-) ),
>>>> it seems that this is correct.  Consider the following case:
>>>>  The language of the document is "en", and the containing 
>>>> (top-most) aggregate specified explicitly it wanted
>>>>  output only for en-us.  In that case, the annotator should not 
>>>> produce any results, because the language
>>>>  of this doc is not en-us, and the assembler put together things 
>>>> that they said should only output en-us results.
>>>>
>>>> This same logic seems to apply to x-unspecified:
>>>>
>>>> Suppose we have an annotator that can produce TF for 
>>>> "x-unspecified".  Suppose the result-spec has an entry for TF only 
>>>> for the language "en".  Should that annotator produce results?  If 
>>>> it calls containsType/Feature(TF, "x-unspecified"), it should get a 
>>>> "false" (broken in the current implementation!, but was true I 
>>>> think in the previous one).
>>> I'm not sure you are right here. I think if an annotator can produce 
>>> TF for "x-unspecified" that means that it can produce TF for all 
>>> languages. So if an "en" document comes in the annotator should 
>>> produce a result.
>> hmmm, this seems to contradict your statement below, saying "That 
>> case is correct".
>>
>> In the example below, the result-spec passed in to the annotator has 
>> only "en", not "x-unspecified".  This is the case proposed in my 
>> paragraph.  Below you say it is right for the annotator to *not* 
>> produce results, while above you say it should produce results.  This 
>> is inconsistent, unless I've mangled something...   Can you clarify?
>>
>> -Marshall
>>>>
>>>> Assume the language of the document is "x-unspecified", and the 
>>>> containing (top-most) aggregate specified explicitly it wanted
>>>> output only for en.  In that case, the annotator should not produce 
>>>> any results, because the language
>>>> of this doc is not "en", and the assembler put together things that 
>>>> they said should only output "en" results.
>>>>
>>> That case is correct.
>>>
>>> -- Michael
>>>
>>>
>>
> Maybe the confusion comes from the different treatment of 
> "x-unspecified". If "x-unspecified" is specified in the output spec of 
> an annotator it means that it can produce results for all languages. 
True - and that works.  But that wasn't the case I was trying to 
describe - I was trying to describe the opposite case:  The case  where 
the *output spec* of an annotator is *missing* the "x-unspecified". 

To restate the case:  The output spec has "en" (only), and the 
annotator, when running, queries the result spec with "x-unspecified".  
This proposal says in that case, containsType should return false.  Do 
you agree this should be the result in this case?  It seems you do above 
when you say "That case is correct", but disagree in the paragraph where 
you say "I'm not sure you are right here.". 

Perhaps I have not clearly described the two cases, but I think they are 
the same case (and therefore need to have the same answer ;-) ) 

-Marshall

>
> -- Michael
>
>
>

Re: Clarifying language subsumption in Result Specifications

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Language specifications are in a hierarchy.  For example, from most 
>>> inclusive to finer subsets, we have:
>>>
>>> x-unspecified
>>>   en
>>>     en-us
>>>
>>> A result spec's most common use is in a negative sense - Annotators 
>>> can check a result spec and if it doesn't contain the type or 
>>> feature, it can skip producing that type or feature.
>>>
>>> For simplicity, let's consider we have only one type or feature, 
>>> called TF.
>>>
>>> If the annotator thinks it produces TF for language en-us only, and 
>>> wants to check whether it should skip producing it, it calls 
>>> containsType/Feature(TF, "en-us").  This is defined in the current 
>>> impl to return true, if the result spec has languages x-unspecified, 
>>> en, or en-us.
>>>
>>> Let's consider the opposite case.  Suppose we have an annotator that 
>>> can produce TF for "en".  Suppose the result-spec has an entry for 
>>> TF only for the language "en-us".  Should that annotator produce 
>>> results?  If it calls containsType/Feature(TF, "en"), it will get a 
>>> "false" (current implementation).
>>>
>>> After some thinking about this and some discussion (because I don't 
>>> think I got it right, just by myself :-) ),
>>> it seems that this is correct.  Consider the following case:
>>>  The language of the document is "en", and the containing (top-most) 
>>> aggregate specified explicitly it wanted
>>>  output only for en-us.  In that case, the annotator should not 
>>> produce any results, because the language
>>>  of this doc is not en-us, and the assembler put together things 
>>> that they said should only output en-us results.
>>>
>>> This same logic seems to apply to x-unspecified:
>>>
>>> Suppose we have an annotator that can produce TF for 
>>> "x-unspecified".  Suppose the result-spec has an entry for TF only 
>>> for the language "en".  Should that annotator produce results?  If 
>>> it calls containsType/Feature(TF, "x-unspecified"), it should get a 
>>> "false" (broken in the current implementation!, but was true I think 
>>> in the previous one).
>> I'm not sure you are right here. I think if an annotator can produce 
>> TF for "x-unspecified" that means that it can produce TF for all 
>> languages. So if an "en" document comes in the annotator should 
>> produce a result.
> hmmm, this seems to contradict your statement below, saying "That case 
> is correct".
>
> In the example below, the result-spec passed in to the annotator has 
> only "en", not "x-unspecified".  This is the case proposed in my 
> paragraph.  Below you say it is right for the annotator to *not* 
> produce results, while above you say it should produce results.  This 
> is inconsistent, unless I've mangled something...   Can you clarify?
>
> -Marshall
>>>
>>> Assume the language of the document is "x-unspecified", and the 
>>> containing (top-most) aggregate specified explicitly it wanted
>>> output only for en.  In that case, the annotator should not produce 
>>> any results, because the language
>>> of this doc is not "en", and the assembler put together things that 
>>> they said should only output "en" results.
>>>
>> That case is correct.
>>
>> -- Michael
>>
>>
>
Maybe the confusion comes from the different treatment of
"x-unspecified". If "x-unspecified" is specified in the output spec of
an annotator, it means that the annotator can produce results for all
languages. So if the document language is "en" or "de" or
"x-unspecified", the annotator produces results.
If the output spec of an annotator has only "en" for type TF, the
annotator produces results only if the document language is "en" or
"en-US", but not if it is "x-unspecified".
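
The rule described above can be sketched in a few lines of Java. This is
only an illustration of the matching logic under discussion, not the UIMA
implementation: the class LanguageMatch and the methods subsumes and
producesFor are hypothetical names, and it assumes that language codes
are compared case-insensitively with "-" separating the sub-tags.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the UIMA API.
final class LanguageMatch {
    static final String UNSPECIFIED = "x-unspecified";

    // True if `broader` is the same as, or more inclusive than, `narrower`.
    // "x-unspecified" subsumes everything; "en" subsumes "en-us".
    static boolean subsumes(String broader, String narrower) {
        String b = normalize(broader);
        String n = normalize(narrower);
        if (b.equals(UNSPECIFIED) || b.equals(n)) {
            return true;
        }
        return n.startsWith(b + "-");
    }

    // True if an annotator whose output spec lists `outputSpecLanguages`
    // for a type should produce results for a document in `documentLanguage`.
    static boolean producesFor(Set<String> outputSpecLanguages, String documentLanguage) {
        for (String lang : outputSpecLanguages) {
            if (subsumes(lang, documentLanguage)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: codes are compared in lower case, with '_' treated as '-'.
    private static String normalize(String lang) {
        return lang.toLowerCase(Locale.ROOT).replace('_', '-');
    }
}
```

With this sketch, an output spec of "x-unspecified" matches "en", "de"
and "x-unspecified" documents, while an output spec of "en" matches "en"
and "en-US" documents but not "x-unspecified" ones.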

Does this help?

-- Michael

Re: Clarifying language subsumption in Result Specifications

Posted by Marshall Schor <ms...@schor.com>.

Michael Baessler wrote:
> Marshall Schor wrote:
>> Language specifications are in a hierarchy.  For example, from most 
>> inclusive to finer subsets, we have:
>>
>> x-unspecified
>>   en
>>     en-us
>>
>> A result spec's most common use is in a negative sense - Annotators 
>> can check a result spec and if it doesn't contain the type or 
>> feature, it can skip producing that type or feature.
>>
>> For simplicity, let's consider we have only one type or feature, 
>> called TF.
>>
>> If the annotator thinks it produces TF for language en-us only, and 
>> wants to check if should skip producing this, it calls 
>> containsType/Feature(TF, "en-us").  This is defined in the current 
>> impl to return true, if the result spec has languages x-unspecified, 
>> en, or en-us.
>>
>> Let's consider the opposite case.  Suppose we have an annotator that 
>> can produce TF for "en".  Suppose the result-spec has an entry for TF 
>> only for the language "en-us".  Should that annotator produce 
>> results?  If it calls containsType/Feature(TF, "en"), it will get a 
>> "false" (current implementation).
>>
>> After some thinking about this and some discussion (because I don't 
>> think I got it right, just by myself :-) ),
>> it seems that this is correct.  Consider the following case:
>>  The language of the document is "en", and the containing (top-most) 
>> aggregate specified explicitly it wanted
>>  output only for en-us.  In that case, the annotator should not 
>> produce any results, because the language
>>  of this doc is not en-us, and the assembler put together things that 
>> they said should only output en-us results.
>>
>> This same logic seems to apply to x-unspecified:
>>
>> Suppose we have an annotator that can produce TF for 
>> "x-unspecified".  Suppose the result-spec has an entry for TF only 
>> for the language "en".  Should that annotator produce results?  If it 
>> calls containsType/Feature(TF, "x-unspecified"), it should get a 
>> "false" (broken in the current implementation!, but was true I think 
>> in the previous one).
> I'm not sure you are right here. I think if an annotator can produce 
> TF for "x-unspecified" that means that it can produce TF for all 
> languages. So if an "en" document comes in the annotator should 
> produce a result.
hmmm, this seems to contradict your statement below, saying "That case 
is correct".

In the example below, the result-spec passed in to the annotator has 
only "en", not "x-unspecified".  This is the case proposed in my 
paragraph.  Below you say it is right for the annotator to *not* produce 
results, while above you say it should produce results.  This is 
inconsistent, unless I've mangled something...   Can you clarify?

-Marshall
>>
>> Assume the language of the document is "x-unspecified", and the 
>> containing (top-most) aggregate specified explicitly it wanted
>> output only for en.  In that case, the annotator should not produce 
>> any results, because the language
>> of this doc is not "en", and the assembler put together things that 
>> they said should only output "en" results.
>>
> That case is correct.
>
> -- Michael
>
>

Re: Clarifying language subsumption in Result Specifications

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> Language specifications are in a hierarchy.  For example, from most 
> inclusive to finer subsets, we have:
>
> x-unspecified
>   en
>     en-us
>
> A result spec's most common use is in a negative sense - Annotators 
> can check a result spec and if it doesn't contain the type or feature, 
> it can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature, 
> called TF.
>
> If the annotator thinks it produces TF for language en-us only, and 
> wants to check if should skip producing this, it calls 
> containsType/Feature(TF, "en-us").  This is defined in the current 
> impl to return true, if the result spec has languages x-unspecified, 
> en, or en-us.
>
> Let's consider the opposite case.  Suppose we have an annotator that 
> can produce TF for "en".  Suppose the result-spec has an entry for TF 
> only for the language "en-us".  Should that annotator produce 
> results?  If it calls containsType/Feature(TF, "en"), it will get a 
> "false" (current implementation).
>
> After some thinking about this and some discussion (because I don't 
> think I got it right, just by myself :-) ),
> it seems that this is correct.  Consider the following case:
>  The language of the document is "en", and the containing (top-most) 
> aggregate specified explicitly it wanted
>  output only for en-us.  In that case, the annotator should not 
> produce any results, because the language
>  of this doc is not en-us, and the assembler put together things that 
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".  
> Suppose the result-spec has an entry for TF only for the language 
> "en".  Should that annotator produce results?  If it calls 
> containsType/Feature(TF, "x-unspecified"), it should get a "false" 
> (broken in the current implementation!, but was true I think in the 
> previous one).
I'm not sure you are right here. I think if an annotator can produce TF
for "x-unspecified", that means that it can produce TF for all languages.
So if an "en" document comes in, the annotator should produce a result.
>
> Assume the language of the document is "x-unspecified", and the 
> containing (top-most) aggregate specified explicitly it wanted
> output only for en.  In that case, the annotator should not produce 
> any results, because the language
> of this doc is not "en", and the assembler put together things that 
> they said should only output "en" results.
>
That case is correct.

-- Michael

Re: Clarifying language subsumption in Result Specifications

Posted by Adam Lally <al...@alum.rpi.edu>.

Seems right to me.
  -Adam

On Jan 28, 2008 11:51 AM, Marshall Schor <ms...@schor.com> wrote:
> Language specifications are in a hierarchy.  For example, from most
> inclusive to finer subsets, we have:
>
> x-unspecified
>    en
>      en-us
>
> A result spec's most common use is in a negative sense - Annotators can
> check a result spec and if it doesn't contain the type or feature, it
> can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature, called TF.
>
> If the annotator thinks it produces TF for language en-us only, and
> wants to check if should skip producing this, it calls
> containsType/Feature(TF, "en-us").  This is defined in the current impl
> to return true, if the result spec has languages x-unspecified, en, or
> en-us.
>
> Let's consider the opposite case.  Suppose we have an annotator that can
> produce TF for "en".  Suppose the result-spec has an entry for TF only
> for the language "en-us".  Should that annotator produce results?  If it
> calls containsType/Feature(TF, "en"), it will get a "false" (current
> implementation).
>
> After some thinking about this and some discussion (because I don't
> think I got it right, just by myself :-) ),
> it seems that this is correct.  Consider the following case:
>   The language of the document is "en", and the containing (top-most)
> aggregate specified explicitly it wanted
>   output only for en-us.  In that case, the annotator should not produce
> any results, because the language
>   of this doc is not en-us, and the assembler put together things that
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".
> Suppose the result-spec has an entry for TF only for the language "en".
> Should that annotator produce results?  If it calls
> containsType/Feature(TF, "x-unspecified"), it should get a "false"
> (broken in the current implementation!, but was true I think in the
> previous one).
>
> Assume the language of the document is "x-unspecified", and the
> containing (top-most) aggregate specified explicitly it wanted
> output only for en.  In that case, the annotator should not produce any
> results, because the language
> of this doc is not "en", and the assembler put together things that they
> said should only output "en" results.
>
> Do others agree with this?
>
> -Marshall
>