Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2008/01/28 17:51:25 UTC
Clarifying language subsumption in Result Specifications
Language specifications are in a hierarchy. For example, from most
inclusive to finer subsets, we have:
x-unspecified
en
en-us
A result spec's most common use is in a negative sense - Annotators can
check a result spec and if it doesn't contain the type or feature, it
can skip producing that type or feature.
For simplicity, let's consider we have only one type or feature, called TF.
If the annotator thinks it produces TF for language en-us only, and
wants to check if it should skip producing this, it calls
containsType/Feature(TF, "en-us"). This is defined in the current impl
to return true if the result spec has languages x-unspecified, en, or
en-us.
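As a minimal sketch of the lookup described above, assuming the result
spec's languages for TF are held as a set of lower-case strings: the
check walks from the queried language up through its more inclusive
ancestors and returns true on the first hit. The class and method names
here are hypothetical and are not the UIMA API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the UIMA ResultSpecification implementation.
final class LanguageSubsumptionSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Returns the queried language followed by each more inclusive
    // language, e.g. "en-us" -> [en-us, en, x-unspecified].
    static List<String> selfAndAncestors(String language) {
        List<String> chain = new ArrayList<>();
        String lang = language.toLowerCase(Locale.ROOT);
        if (!lang.equals(UNSPECIFIED)) {
            chain.add(lang);
            int dash = lang.lastIndexOf('-');
            while (dash > 0) {
                lang = lang.substring(0, dash);
                chain.add(lang);
                dash = lang.lastIndexOf('-');
            }
        }
        chain.add(UNSPECIFIED);
        return chain;
    }

    // True if the spec lists the queried language or a language that
    // subsumes it. Assumes specLanguages holds lower-case entries.
    static boolean contains(Set<String> specLanguages, String queryLanguage) {
        for (String candidate : selfAndAncestors(queryLanguage)) {
            if (specLanguages.contains(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains(Set.of("en"), "en-us"));
        System.out.println(contains(Set.of("x-unspecified"), "en-us"));
        System.out.println(contains(Set.of("de"), "en-us"));
    }
}
```

Because the walk only ever moves toward more inclusive languages, a spec
entry that is narrower than the query never matches.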
Let's consider the opposite case. Suppose we have an annotator that can
produce TF for "en". Suppose the result-spec has an entry for TF only
for the language "en-us". Should that annotator produce results? If it
calls containsType/Feature(TF, "en"), it will get a "false" (current
implementation).
After some thinking about this and some discussion (because I don't
think I got it right, just by myself :-) ), it seems that this is
correct. Consider the following case: The language of the document is
"en", and the containing (top-most) aggregate specified explicitly it
wanted output only for en-us. In that case, the annotator should not
produce any results, because the language of this doc is not en-us, and
the assembler put together things that they said should only output
en-us results.
This same logic seems to apply to x-unspecified:
Suppose we have an annotator that can produce TF for "x-unspecified".
Suppose the result-spec has an entry for TF only for the language "en".
Should that annotator produce results? If it calls
containsType/Feature(TF, "x-unspecified"), it should get a "false"
(broken in the current implementation!, but was true I think in the
previous one).
Assume the language of the document is "x-unspecified", and the
containing (top-most) aggregate specified explicitly it wanted output
only for en. In that case, the annotator should not produce any results,
because the language of this doc is not "en", and the assembler put
together things that they said should only output "en" results.
Do others agree with this?
-Marshall
Re: Clarifying language subsumption in Result Specifications
Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Marshall Schor wrote:
>> Michael Baessler wrote:
>>> Marshall Schor wrote:
>>>> I tried implementing this change, and 2 test cases fail. They look
>>>> like they are failing exactly in the case where the result
>>>> specification has a TypeOrFeature with a specified language other
>>>> than "x-unspecified", and the containsTypeOrFeature method is being
>>>> called using the form which doesn't pass in an explicit language, so
>>>> is being treated as if x-unspecified was passed in.
>>>> As discussed below, this should give "false", but the test cases
>>>> expect true.
>>>>
>>>> Should I change the test cases? The failing ones are:
>>>>
>>>> ResultSpecification_implTest: It defines a result spec containing
>>>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but
>>>> not "x-unspecified". So the call rs.containsType("FakeType")
>>>> returns false (because the set of languages for FakeType is missing
>>>> x-unspecified), but the test says it should return true.
>>>>
>>> Which test method are you talking about? I would like to look at it.
>> The call is on line 332 of class ResultSpecification_implTest. This
>> changed behavior arises from the proposed change to how the
>> containsType method works. The changed logic is: if the language
>> x-unspecified is given (or if no language is given, as in this case),
>> return true only if the result specification for this type or feature
>> includes the language "x-unspecified". In this test, the result
>> specification for the type "FakeType" is set from the component's
>> capabilities specification, which said this component outputs
>> FakeType for languages "en", "de", "en-US", "en-GB", but not
>> "x-unspecified". So with the proposed change to how containsType
>> works, it returns false. But the test case expects true.
> I don't know that test, but it is fine with me to change the behavior
> since it seems to be wrong!
>>
>>>> The other test is the PearRuntimeTest.
>>>> This test loads two Pears, runs them and then looks at the CAS result.
>>>> The descriptor for one of the tests, the TutorialDateTime
>>>> descriptor, says it outputs 3 types, *but for language "en"* (only,
>>>> and in particular not for x-unspecified).
>>>>
>>>> The result spec built for the aggregate is empty (the test case has
>>>> nothing specified here).
>>>> When it is passed down to the delegates, the setResultSpecification
>>>> for the Pear descriptor in PearAnalysisEngineWrapper is called.
>>>> This is not implemented, so it inherits from its super, which is
>>>> AnalysisEngineImplBase - and this impl does nothing (expecting to
>>>> be overridden). I'll write this up as a Jira issue. But even if
>>>> this were "fixed", because the outer Aggregate had nothing
>>>> specified in its capability, the inner primitive analysis engine is
>>>> set up initially with a "default" result spec, which is its own
>>>> output capabilities. This spec says it should produce results just
>>>> for "en", and in particular it should *not* produce output for
>>>> x-unspecified. This annotator is written to respect the result
>>>> spec, so it doesn't produce anything.
>>>>
>>> The PearRuntimeTest does not use the capabilityLanguageFlow so we
>>> have a different behavior there!
>> This test is just testing the component's behavior with respect to
>> using the result specification; I don't think it has anything to do
>> with the capabilityLanguageFlow?
> So you mean that the computation of the default result spec does not
> work correctly, since it is not implemented correctly? If that is
> true, please go ahead and fix it. I was not aware of that. Thanks for
> catching it!
This has been entered as Jira-727. Not fixed yet (or assigned).
-Marshall
>
> -- Michael
>
>
>
Re: Clarifying language subsumption in Result Specifications
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> I tried implementing this change, and 2 test cases fail. They look
>>> like they are failing exactly in the case where the result
>>> specification has a TypeOrFeature with a specified language other
>>> than "x-unspecified", and the containsTypeOrFeature method is being
>>> called using the form which doesn't pass in an explicit language, so
>>> is being treated as if x-unspecified was passed in.
>>> As discussed below, this should give "false", but the test cases
>>> expect true.
>>>
>>> Should I change the test cases? The failing ones are:
>>>
>>> ResultSpecification_implTest: It defines a result spec containing
>>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but
>>> not "x-unspecified". So the call rs.containsType("FakeType")
>>> returns false (because the set of languages for FakeType is missing
>>> x-unspecified), but the test says it should return true.
>>>
>> Which test method are you talking about? I would like to look at it.
> The call is on line 332 of class ResultSpecification_implTest. This
> changed behavior arises from the proposed change to how the
> containsType method works. The changed logic is: if the language
> x-unspecified is given (or if no language is given, as in this case),
> return true only if the result specification for this type or feature
> includes the language "x-unspecified". In this test, the result
> specification for the type "FakeType" is set from the component's
> capabilities specification, which said this component outputs FakeType
> for languages "en", "de", "en-US", "en-GB", but not "x-unspecified".
> So with the proposed change to how containsType works, it returns
> false. But the test case expects true.
I don't know that test, but it is fine with me to change the behavior
since it seems to be wrong!
>
>>> The other test is the PearRuntimeTest.
>>> This test loads two Pears, runs them and then looks at the CAS result.
>>> The descriptor for one of the tests, the TutorialDateTime descriptor,
>>> says it outputs 3 types, *but for language "en"* (only, and in
>>> particular not for x-unspecified).
>>>
>>> The result spec built for the aggregate is empty (the test case has
>>> nothing specified here).
>>> When it is passed down to the delegates, the setResultSpecification
>>> for the Pear descriptor in PearAnalysisEngineWrapper is called.
>>> This is not implemented, so it inherits from its super, which is
>>> AnalysisEngineImplBase - and this impl does nothing (expecting to be
>>> overridden). I'll write this up as a Jira issue. But even if this
>>> were "fixed", because the outer Aggregate had nothing specified in
>>> its capability, the inner primitive analysis engine is set up
>>> initially with a "default" result spec, which is its own output
>>> capabilities. This spec says it should produce results just for
>>> "en", and in particular it should *not* produce output for
>>> x-unspecified. This annotator is written to respect the result
>>> spec, so it doesn't produce anything.
>>>
>> The PearRuntimeTest does not use the capabilityLanguageFlow so we have
>> a different behavior there!
> This test is just testing the component's behavior with respect to
> using the result specification; I don't think it has anything to do
> with the capabilityLanguageFlow?
So you mean that the computation of the default result spec does not
work correctly, since it is not implemented correctly? If that is true,
please go ahead and fix it. I was not aware of that. Thanks for catching it!
-- Michael
Re: Clarifying language subsumption in Result Specifications
Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Marshall Schor wrote:
>> I tried implementing this change, and 2 test cases fail. They look
>> like they are failing exactly in the case where the result
>> specification has a TypeOrFeature with a specified language other than
>> "x-unspecified", and the containsTypeOrFeature method is being called
>> using the form which doesn't pass in an explicit language, so is being
>> treated as if x-unspecified was passed in.
>> As discussed below, this should give "false", but the test cases
>> expect true.
>>
>> Should I change the test cases? The failing ones are:
>>
>> ResultSpecification_implTest: It defines a result spec containing
>> the type "FakeType" for languages "en", "de", "en-US", "en-GB", but
>> not "x-unspecified". So the call rs.containsType("FakeType") returns
>> false (because the set of languages for FakeType is missing
>> x-unspecified), but the test says it should return true.
>>
> Which test method are you talking about? I would like to look at it.
The call is on line 332 of class ResultSpecification_implTest. This
changed behavior arises from the proposed change to how the containsType
method works. The changed logic is: if the language x-unspecified is
given (or if no language is given, as in this case), return true only if
the result specification for this type or feature includes the language
"x-unspecified". In this test, the result specification for the type
"FakeType" is set from the component's capabilities specification, which
said this component outputs FakeType for languages "en", "de", "en-US",
"en-GB", but not "x-unspecified". So with the proposed change to how
containsType works, it returns false. But the test case expects true.
>> The other test is the PearRuntimeTest.
>> This test loads two Pears, runs them and then looks at the CAS result.
>> The descriptor for one of the tests, the TutorialDateTime descriptor,
>> says it outputs 3 types, *but for language "en"* (only, and in
>> particular not for x-unspecified).
>>
>> The result spec built for the aggregate is empty (the test case has
>> nothing specified here).
>> When it is passed down to the delegates, the setResultSpecification
>> for the Pear descriptor in PearAnalysisEngineWrapper is called. This
>> is not implemented, so it inherits from its super, which is
>> AnalysisEngineImplBase - and this impl does nothing (expecting to be
>> overridden). I'll write this up as a Jira issue. But even if this
>> were "fixed", because the outer Aggregate had nothing specified in
>> its capability, the inner primitive analysis engine is set up
>> initially with a "default" result spec, which is its own output
>> capabilities. This spec says it should produce results just for
>> "en", and in particular it should *not* produce output for
>> x-unspecified. This annotator is written to respect the result spec,
>> so it doesn't produce anything.
>>
> The PearRuntimeTest does not use the capabilityLanguageFlow so we have
> a different behavior there!
This test is just testing the component's behavior with respect to
using the result specification; I don't think it has anything to do with
the capabilityLanguageFlow?
-Marshall
>
> -- Michael
>
>
Re: Clarifying language subsumption in Result Specifications
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> I tried implementing this change, and 2 test cases fail. They look
> like they are failing exactly in the case where the result
> specification has a TypeOrFeature with a specified language other than
> "x-unspecified", and the containsTypeOrFeature method is being called
> using the form which doesn't pass in an explicit language, so is being
> treated as if x-unspecified was passed in.
> As discussed below, this should give "false", but the test cases
> expect true.
>
> Should I change the test cases? The failing ones are:
>
> ResultSpecification_implTest: It defines a result spec containing the
> type "FakeType" for languages "en", "de", "en-US", "en-GB", but not
> "x-unspecified". So the call rs.containsType("FakeType") returns
> false (because the set of languages for FakeType is missing
> x-unspecified), but the test says it should return true.
>
Which test method are you talking about? I would like to look at it.
> The other test is the PearRuntimeTest.
> This test loads two Pears, runs them and then looks at the CAS result.
> The descriptor for one of the tests, the TutorialDateTime descriptor,
> says it outputs 3 types, *but for language "en"* (only, and in
> particular not for x-unspecified).
>
> The result spec built for the aggregate is empty (the test case has
> nothing specified here).
> When it is passed down to the delegates, the setResultSpecification
> for the Pear descriptor in PearAnalysisEngineWrapper is called. This
> is not implemented, so it inherits from its super, which is
> AnalysisEngineImplBase - and this impl does nothing (expecting to be
> overridden). I'll write this up as a Jira issue.
> But even if this were "fixed", because the outer Aggregate had nothing
> specified in its capability, the inner primitive analysis engine is
> set up initially with a "default" result spec, which is its own output
> capabilities. This spec says it should produce results just for "en",
> and in particular it should *not* produce output for x-unspecified.
> This annotator is written to respect the result spec, so it doesn't
> produce anything.
>
The PearRuntimeTest does not use the capabilityLanguageFlow so we have a
different behavior there!
-- Michael
Re: Clarifying language subsumption in Result Specifications
Posted by Marshall Schor <ms...@schor.com>.
I tried implementing this change, and 2 test cases fail. They look like
they are failing exactly in the case where the result specification has
a TypeOrFeature with a specified language other than "x-unspecified",
and the containsTypeOrFeature method is being called using the form
which doesn't pass in an explicit language, so is being treated as if
x-unspecified was passed in.
As discussed below, this should give "false", but the test cases expect
true.
Should I change the test cases? The failing ones are:
ResultSpecification_implTest: It defines a result spec containing the
type "FakeType" for languages "en", "de", "en-US", "en-GB", but not
"x-unspecified". So the call rs.containsType("FakeType") returns false
(because the set of languages for FakeType is missing x-unspecified),
but the test says it should return true.
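To make the failing assertion concrete, here is a minimal sketch of the
proposed semantics, in which the one-argument form is treated as a query
for x-unspecified. ResultSpecSketch and its varargs addResultType are
hypothetical stand-ins written for this illustration; they are not the
real ResultSpecification API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result spec: type name -> languages.
final class ResultSpecSketch {
    static final String UNSPECIFIED = "x-unspecified";

    private final Map<String, Set<String>> languagesByType = new HashMap<>();

    void addResultType(String typeName, String... languages) {
        Set<String> set = languagesByType.computeIfAbsent(typeName, k -> new HashSet<>());
        for (String language : languages) {
            set.add(language.toLowerCase(Locale.ROOT));
        }
    }

    // No language given: treated as a query for x-unspecified.
    boolean containsType(String typeName) {
        return containsType(typeName, UNSPECIFIED);
    }

    boolean containsType(String typeName, String language) {
        Set<String> languages = languagesByType.get(typeName);
        if (languages == null) {
            return false;
        }
        if (languages.contains(UNSPECIFIED)) {
            return true; // an x-unspecified entry covers every query
        }
        String lang = language.toLowerCase(Locale.ROOT);
        if (lang.equals(UNSPECIFIED)) {
            return false; // proposed rule: only an x-unspecified entry matches
        }
        // Walk from the query toward more inclusive languages (en-us -> en).
        while (true) {
            if (languages.contains(lang)) {
                return true;
            }
            int dash = lang.lastIndexOf('-');
            if (dash < 0) {
                return false;
            }
            lang = lang.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        ResultSpecSketch rs = new ResultSpecSketch();
        rs.addResultType("FakeType", "en", "de", "en-US", "en-GB");
        System.out.println(rs.containsType("FakeType"));
        System.out.println(rs.containsType("FakeType", "en-US"));
    }
}
```

Under this rule the no-language query on FakeType is false, while
language-specific queries such as "en-US" still succeed.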
The other test is the PearRuntimeTest.
This test loads two Pears, runs them and then looks at the CAS result.
The descriptor for one of the tests, the TutorialDateTime descriptor,
says it outputs 3 types, *but for language "en"* (only, and in
particular not for x-unspecified).
The result spec built for the aggregate is empty (the test case has
nothing specified here).
When it is passed down to the delegates, the setResultSpecification for
the Pear descriptor in PearAnalysisEngineWrapper is called. This is not
implemented, so it inherits from its super, which is
AnalysisEngineImplBase - and this impl does nothing (expecting to be
overridden). I'll write this up as a Jira issue.
But even if this were "fixed", because the outer Aggregate had nothing
specified in its capability, the inner primitive analysis engine is set
up initially with a "default" result spec, which is its own output
capabilities. This spec says it should produce results just for "en",
and in particular it should *not* produce output for x-unspecified.
This annotator is written to respect the result spec, so it doesn't
produce anything.
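The delegation gap described above can be sketched with a few stand-in
classes. This only illustrates the pattern (a wrapper inheriting a no-op
setter versus one that forwards to its delegate). All class names are
hypothetical, the result spec is modelled as a plain set of languages,
and how the real wrapper should be fixed is an assumption here.

```java
import java.util.Set;

// Hypothetical stand-ins; none of these classes are the real UIMA types.
final class ResultSpecDelegationSketch {

    // Inner engine: starts with a default result spec equal to its own
    // output capabilities (modelled here as a set of languages).
    static class PrimitiveEngine {
        private Set<String> resultLanguages;

        PrimitiveEngine(Set<String> capabilityLanguages) {
            this.resultLanguages = capabilityLanguages;
        }

        void setResultSpecification(Set<String> languages) {
            this.resultLanguages = languages;
        }

        Set<String> resultLanguages() {
            return resultLanguages;
        }
    }

    // Base class whose setter does nothing, expecting to be overridden.
    static class EngineBase {
        void setResultSpecification(Set<String> languages) {
            // intentionally empty
        }
    }

    // Wrapper that inherits the no-op: the spec never reaches the delegate.
    static class NonForwardingWrapper extends EngineBase {
        final PrimitiveEngine delegate;

        NonForwardingWrapper(PrimitiveEngine delegate) {
            this.delegate = delegate;
        }
    }

    // Wrapper that overrides the setter and passes the spec down.
    static class ForwardingWrapper extends EngineBase {
        final PrimitiveEngine delegate;

        ForwardingWrapper(PrimitiveEngine delegate) {
            this.delegate = delegate;
        }

        @Override
        void setResultSpecification(Set<String> languages) {
            delegate.setResultSpecification(languages);
        }
    }

    public static void main(String[] args) {
        NonForwardingWrapper a = new NonForwardingWrapper(new PrimitiveEngine(Set.of("en")));
        a.setResultSpecification(Set.of("x-unspecified"));
        System.out.println(a.delegate.resultLanguages());

        ForwardingWrapper b = new ForwardingWrapper(new PrimitiveEngine(Set.of("en")));
        b.setResultSpecification(Set.of("x-unspecified"));
        System.out.println(b.delegate.resultLanguages());
    }
}
```

In the non-forwarding case the delegate keeps its default "en" spec, so
an annotator that respects the spec still produces nothing for
x-unspecified.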
Anyone object to my changing the test cases?
-Marshall
Marshall Schor wrote:
> Language specifications are in a hierarchy. For example, from most
> inclusive to finer subsets, we have:
>
> x-unspecified
> en
> en-us
>
> A result spec's most common use is in a negative sense - Annotators
> can check a result spec and if it doesn't contain the type or feature,
> it can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature,
> called TF.
>
> If the annotator thinks it produces TF for language en-us only, and
> wants to check if it should skip producing this, it calls
> containsType/Feature(TF, "en-us"). This is defined in the current
> impl to return true, if the result spec has languages x-unspecified,
> en, or en-us.
>
> Let's consider the opposite case. Suppose we have an annotator that
> can produce TF for "en". Suppose the result-spec has an entry for TF
> only for the language "en-us". Should that annotator produce
> results? If it calls containsType/Feature(TF, "en"), it will get a
> "false" (current implementation).
>
> After some thinking about this and some discussion (because I don't
> think I got it right, just by myself :-) ),
> it seems that this is correct. Consider the following case:
> The language of the document is "en", and the containing (top-most)
> aggregate specified explicitly it wanted
> output only for en-us. In that case, the annotator should not
> produce any results, because the language
> of this doc is not en-us, and the assembler put together things that
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".
> Suppose the result-spec has an entry for TF only for the language
> "en". Should that annotator produce results? If it calls
> containsType/Feature(TF, "x-unspecified"), it should get a "false"
> (broken in the current implementation!, but was true I think in the
> previous one).
>
> Assume the language of the document is "x-unspecified", and the
> containing (top-most) aggregate specified explicitly it wanted
> output only for en. In that case, the annotator should not produce
> any results, because the language
> of this doc is not "en", and the assembler put together things that
> they said should only output "en" results.
>
> Do others agree with this?
>
> -Marshall
>
>
Re: Clarifying language subsumption in Result Specifications
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Michael Baessler wrote:
>>>> Marshall Schor wrote:
>>>>> Language specifications are in a hierarchy. For example, from
>>>>> most inclusive to finer subsets, we have:
>>>>>
>>>>> x-unspecified
>>>>> en
>>>>> en-us
>>>>>
>>>>> A result spec's most common use is in a negative sense -
>>>>> Annotators can check a result spec and if it doesn't contain the
>>>>> type or feature, it can skip producing that type or feature.
>>>>>
>>>>> For simplicity, let's consider we have only one type or feature,
>>>>> called TF.
>>>>>
>>>>> If the annotator thinks it produces TF for language en-us only,
>>>>> and wants to check if it should skip producing this, it calls
>>>>> containsType/Feature(TF, "en-us"). This is defined in the current
>>>>> impl to return true, if the result spec has languages
>>>>> x-unspecified, en, or en-us.
>>>>>
>>>>> Let's consider the opposite case. Suppose we have an annotator
>>>>> that can produce TF for "en". Suppose the result-spec has an
>>>>> entry for TF only for the language "en-us". Should that annotator
>>>>> produce results? If it calls containsType/Feature(TF, "en"), it
>>>>> will get a "false" (current implementation).
>>>>>
>>>>> After some thinking about this and some discussion (because I
>>>>> don't think I got it right, just by myself :-) ),
>>>>> it seems that this is correct. Consider the following case:
>>>>> The language of the document is "en", and the containing
>>>>> (top-most) aggregate specified explicitly it wanted
>>>>> output only for en-us. In that case, the annotator should not
>>>>> produce any results, because the language
>>>>> of this doc is not en-us, and the assembler put together things
>>>>> that they said should only output en-us results.
>>>>>
>>>>> This same logic seems to apply to x-unspecified:
>>>>>
>>>>> Suppose we have an annotator that can produce TF for
>>>>> "x-unspecified". Suppose the result-spec has an entry for TF only
>>>>> for the language "en". Should that annotator produce results? If
>>>>> it calls containsType/Feature(TF, "x-unspecified"), it should get
>>>>> a "false" (broken in the current implementation!, but was true I
>>>>> think in the previous one).
>>>> I'm not sure you are right here. I think if an annotator can
>>>> produce TF for "x-unspecified" that means that it can produce TF
>>>> for all languages. So if an "en" document comes in the annotator
>>>> should produce a result.
>>> hmmm, this seems to contradict your statement below, saying "That
>>> case is correct".
>>>
>>> In the example below, the result-spec passed in to the annotator has
>>> only "en", not "x-unspecified". This is the case proposed in my
>>> paragraph. Below you say it is right for the annotator to *not*
>>> produce results, while above you say it should produce results.
>>> This is inconsistent, unless I've mangled something... Can you
>>> clarify?
>>>
>>> -Marshall
>>>>>
>>>>> Assume the language of the document is "x-unspecified", and the
>>>>> containing (top-most) aggregate specified explicitly it wanted
>>>>> output only for en. In that case, the annotator should not
>>>>> produce any results, because the language
>>>>> of this doc is not "en", and the assembler put together things
>>>>> that they said should only output "en" results.
>>>>>
>>>> That case is correct.
>>>>
>>>> -- Michael
>>>>
>>>>
>>>
>> Maybe the confusion comes from the different treatment of
>> "x-unspecified". If "x-unspecified" is specified in the output spec
>> of an annotator it means that it can produce results for all languages.
> True - and that works. But that wasn't the case I was trying to
> describe - I was trying to describe the opposite case: The case
> where the *output spec* of an annotator is *missing* the "x-unspecified".
> To restate the case: The output spec has "en" (only), and the
> annotator, when running, queries the result spec with
> "x-unspecified". This proposal says in that case, containsType should
> return false. Do you agree this should be the result in this case?
> It seems you do above when you say "That case is correct", but
> disagree in the paragraph where you say "I'm not sure you are right
> here.".
> Perhaps I have not clearly described the two cases, but I think they
> are the same case (and therefore need to have the same answer ;-) )
OK seems I did not understand the two cases correctly. :-)
Yes it is true that no results should be produced when the output spec
for the annotator has only "en" and the document language is
"x-unspecified".
-- Michael
Re: Clarifying language subsumption in Result Specifications
Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Marshall Schor wrote:
>> Michael Baessler wrote:
>>> Marshall Schor wrote:
>>>> Language specifications are in a hierarchy. For example, from most
>>>> inclusive to finer subsets, we have:
>>>>
>>>> x-unspecified
>>>> en
>>>> en-us
>>>>
>>>> A result spec's most common use is in a negative sense - Annotators
>>>> can check a result spec and if it doesn't contain the type or
>>>> feature, it can skip producing that type or feature.
>>>>
>>>> For simplicity, let's consider we have only one type or feature,
>>>> called TF.
>>>>
>>>> If the annotator thinks it produces TF for language en-us only, and
>>>> wants to check if it should skip producing this, it calls
>>>> containsType/Feature(TF, "en-us"). This is defined in the current
>>>> impl to return true, if the result spec has languages
>>>> x-unspecified, en, or en-us.
>>>>
>>>> Let's consider the opposite case. Suppose we have an annotator
>>>> that can produce TF for "en". Suppose the result-spec has an entry
>>>> for TF only for the language "en-us". Should that annotator
>>>> produce results? If it calls containsType/Feature(TF, "en"), it
>>>> will get a "false" (current implementation).
>>>>
>>>> After some thinking about this and some discussion (because I don't
>>>> think I got it right, just by myself :-) ),
>>>> it seems that this is correct. Consider the following case:
>>>> The language of the document is "en", and the containing
>>>> (top-most) aggregate specified explicitly it wanted
>>>> output only for en-us. In that case, the annotator should not
>>>> produce any results, because the language
>>>> of this doc is not en-us, and the assembler put together things
>>>> that they said should only output en-us results.
>>>>
>>>> This same logic seems to apply to x-unspecified:
>>>>
>>>> Suppose we have an annotator that can produce TF for
>>>> "x-unspecified". Suppose the result-spec has an entry for TF only
>>>> for the language "en". Should that annotator produce results? If
>>>> it calls containsType/Feature(TF, "x-unspecified"), it should get a
>>>> "false" (broken in the current implementation!, but was true I
>>>> think in the previous one).
>>> I'm not sure you are right here. I think if an annotator can produce
>>> TF for "x-unspecified" that means that it can produce TF for all
>>> languages. So if an "en" document comes in the annotator should
>>> produce a result.
>> hmmm, this seems to contradict your statement below, saying "That
>> case is correct".
>>
>> In the example below, the result-spec passed in to the annotator has
>> only "en", not "x-unspecified". This is the case proposed in my
>> paragraph. Below you say it is right for the annotator to *not*
>> produce results, while above you say it should produce results. This
>> is inconsistent, unless I've mangled something... Can you clarify?
>>
>> -Marshall
>>>>
>>>> Assume the language of the document is "x-unspecified", and the
>>>> containing (top-most) aggregate specified explicitly it wanted
>>>> output only for en. In that case, the annotator should not produce
>>>> any results, because the language
>>>> of this doc is not "en", and the assembler put together things that
>>>> they said should only output "en" results.
>>>>
>>> That case is correct.
>>>
>>> -- Michael
>>>
>>>
>>
> Maybe the confusion comes from the different treatment of
> "x-unspecified". If "x-unspecified" is specified in the output spec of
> an annotator it means that it can produce results for all languages.
True - and that works. But that wasn't the case I was trying to
describe - I was trying to describe the opposite case: The case where
the *output spec* of an annotator is *missing* the "x-unspecified".
To restate the case: The output spec has "en" (only), and the
annotator, when running, queries the result spec with "x-unspecified".
This proposal says in that case, containsType should return false. Do
you agree this should be the result in this case? It seems you do above
when you say "That case is correct", but disagree in the paragraph where
you say "I'm not sure you are right here.".
Perhaps I have not clearly described the two cases, but I think they are
the same case (and therefore need to have the same answer ;-) )
-Marshall
>
> -- Michael
>
>
>
Re: Clarifying language subsumption in Result Specifications
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Language specifications are in a hierarchy. For example, from most
>>> inclusive to finer subsets, we have:
>>>
>>> x-unspecified
>>> en
>>> en-us
>>>
>>> A result spec's most common use is in a negative sense - Annotators
>>> can check a result spec and if it doesn't contain the type or
>>> feature, it can skip producing that type or feature.
>>>
>>> For simplicity, let's consider we have only one type or feature,
>>> called TF.
>>>
>>> If the annotator thinks it produces TF for language en-us only, and
>>> wants to check if it should skip producing this, it calls
>>> containsType/Feature(TF, "en-us"). This is defined in the current
>>> impl to return true, if the result spec has languages x-unspecified,
>>> en, or en-us.
>>>
>>> Let's consider the opposite case. Suppose we have an annotator that
>>> can produce TF for "en". Suppose the result-spec has an entry for
>>> TF only for the language "en-us". Should that annotator produce
>>> results? If it calls containsType/Feature(TF, "en"), it will get a
>>> "false" (current implementation).
>>>
>>> After some thinking about this and some discussion (because I don't
>>> think I got it right, just by myself :-) ),
>>> it seems that this is correct. Consider the following case:
>>> The language of the document is "en", and the containing (top-most)
>>> aggregate specified explicitly it wanted
>>> output only for en-us. In that case, the annotator should not
>>> produce any results, because the language
>>> of this doc is not en-us, and the assembler put together things
>>> that they said should only output en-us results.
>>>
>>> This same logic seems to apply to x-unspecified:
>>>
>>> Suppose we have an annotator that can produce TF for
>>> "x-unspecified". Suppose the result-spec has an entry for TF only
>>> for the language "en". Should that annotator produce results? If
>>> it calls containsType/Feature(TF, "x-unspecified"), it should get a
>>> "false" (broken in the current implementation!, but was true I think
>>> in the previous one).
>> I'm not sure you are right here. I think if an annotator can produce
>> TF for "x-unspecified" that means that it can produce TF for all
>> languages. So if an "en" document comes in the annotator should
>> produce a result.
> hmmm, this seems to contradict your statement below, saying "That case
> is correct".
>
> In the example below, the result-spec passed in to the annotator has
> only "en", not "x-unspecified". This is the case proposed in my
> paragraph. Below you say it is right for the annotator to *not*
> produce results, while above you say it should produce results. This
> is inconsistent, unless I've mangled something... Can you clarify?
>
> -Marshall
>>>
>>> Assume the language of the document is "x-unspecified", and the
>>> containing (top-most) aggregate specified explicitly it wanted
>>> output only for en. In that case, the annotator should not produce
>>> any results, because the language
>>> of this doc is not "en", and the assembler put together things that
>>> they said should only output "en" results.
>>>
>> That case is correct.
>>
>> -- Michael
>>
>>
>
Maybe the confusion comes from the different treatment of
"x-unspecified". If "x-unspecified" is specified in the output spec of
an annotator, it means that the annotator can produce results for all
languages. So if the document language is "en" or "de" or
"x-unspecified", the annotator produces results.
If the output spec of an annotator only has "en" for type TF, the
annotator only produces results if the document language is "en" or
"en-US", but not if it is "x-unspecified".
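The rule above can be written down as a small Java sketch. This is only
an illustration of the rule as stated here, not the UIMA implementation;
the class and method names (OutputSpecLanguageCheck, producesFor) are
made up for the example, and it assumes language tags are compared
case-insensitively with "-" as the subtag separator.

```java
import java.util.Locale;

// Hypothetical helper, not part of the UIMA API.
final class OutputSpecLanguageCheck {
    static final String UNSPECIFIED = "x-unspecified";

    // Assumption: tags are compared in lower case, with '_' treated as '-'.
    static String normalize(String lang) {
        return lang.toLowerCase(Locale.ROOT).replace('_', '-');
    }

    // True if an annotator whose output spec lists specLang for TF
    // should produce TF for a document whose language is docLang.
    static boolean producesFor(String specLang, String docLang) {
        String spec = normalize(specLang);
        String doc = normalize(docLang);
        if (spec.equals(UNSPECIFIED)) {
            return true; // "x-unspecified" in the output spec covers all languages
        }
        // "en" covers "en" and "en-us", but not "de" or "x-unspecified"
        return doc.equals(spec) || doc.startsWith(spec + "-");
    }

    public static void main(String[] args) {
        System.out.println(producesFor("x-unspecified", "en"));  // true
        System.out.println(producesFor("en", "en-US"));          // true
        System.out.println(producesFor("en", "x-unspecified"));  // false
    }
}
```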
Does this help?
-- Michael
Re: Clarifying language subsumption in Result Specifications
Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Marshall Schor wrote:
>> Language specifications are in a hierarchy. For example, from most
>> inclusive to finer subsets, we have:
>>
>> x-unspecified
>> en
>> en-us
>>
>> A result spec's most common use is in a negative sense - Annotators
>> can check a result spec and if it doesn't contain the type or
>> feature, it can skip producing that type or feature.
>>
>> For simplicity, let's consider we have only one type or feature,
>> called TF.
>>
>> If the annotator thinks it produces TF for language en-us only, and
>> wants to check if should skip producing this, it calls
>> containsType/Feature(TF, "en-us"). This is defined in the current
>> impl to return true, if the result spec has languages x-unspecified,
>> en, or en-us.
>>
>> Let's consider the opposite case. Suppose we have an annotator that
>> can produce TF for "en". Suppose the result-spec has an entry for TF
>> only for the language "en-us". Should that annotator produce
>> results? If it calls containsType/Feature(TF, "en"), it will get a
>> "false" (current implementation).
>>
>> After some thinking about this and some discussion (because I don't
>> think I got it right, just by myself :-) ),
>> it seems that this is correct. Consider the following case:
>> The language of the document is "en", and the containing (top-most)
>> aggregate specified explicitly it wanted
>> output only for en-us. In that case, the annotator should not
>> produce any results, because the language
>> of this doc is not en-us, and the assembler put together things that
>> they said should only output en-us results.
>>
>> This same logic seems to apply to x-unspecified:
>>
>> Suppose we have an annotator that can produce TF for
>> "x-unspecified". Suppose the result-spec has an entry for TF only
>> for the language "en". Should that annotator produce results? If it
>> calls containsType/Feature(TF, "x-unspecified"), it should get a
>> "false" (broken in the current implementation!, but was true I think
>> in the previous one).
> I'm not sure you are right here. I think if an annotator can produce
> TF for "x-unspecified" that means that it can produce TF for all
> languages. So if an "en" document comes in the annotator should
> produce a result.
hmmm, this seems to contradict your statement below, saying "That case
is correct".
In the example below, the result-spec passed in to the annotator has
only "en", not "x-unspecified". This is the case proposed in my
paragraph. Below you say it is right for the annotator to *not* produce
results, while above you say it should produce results. This is
inconsistent, unless I've mangled something... Can you clarify?
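To make the case concrete, here is a rough Java sketch of the check as
my original note proposes it: a query for a language matches only if
the result spec lists that language or a more inclusive one. The names
ResultSpecSketch and containsTF are hypothetical stand-ins for the real
containsType/Feature calls, and the single-type set is a simplification;
this is not the actual implementation.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of a result spec holding languages for one type/feature TF.
final class ResultSpecSketch {
    static final String UNSPECIFIED = "x-unspecified";
    private final Set<String> languagesForTF = new HashSet<>();

    ResultSpecSketch(String... languages) {
        for (String l : languages) {
            languagesForTF.add(l.toLowerCase(Locale.ROOT));
        }
    }

    // Walks from the queried language up the hierarchy
    // (en-us -> en -> x-unspecified) and reports whether the
    // result spec has an entry at any level along the way.
    boolean containsTF(String language) {
        String lang = language.toLowerCase(Locale.ROOT);
        while (true) {
            if (languagesForTF.contains(lang)) {
                return true;
            }
            if (lang.equals(UNSPECIFIED)) {
                return false; // reached the top of the hierarchy
            }
            int dash = lang.lastIndexOf('-');
            lang = dash > 0 ? lang.substring(0, dash) : UNSPECIFIED;
        }
    }

    public static void main(String[] args) {
        ResultSpecSketch onlyEn = new ResultSpecSketch("en");
        System.out.println(onlyEn.containsTF("en-us"));          // true
        System.out.println(onlyEn.containsTF("x-unspecified"));  // false
    }
}
```

Under this reading, a result spec with only "en" answers false to a
query for "x-unspecified", which is the case in my paragraph.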
-Marshall
>>
>> Assume the language of the document is "x-unspecified", and the
>> containing (top-most) aggregate specified explicitly it wanted
>> output only for en. In that case, the annotator should not produce
>> any results, because the language
>> of this doc is not "en", and the assembler put together things that
>> they said should only output "en" results.
>>
> That case is correct.
>
> -- Michael
>
>
Re: Clarifying language subsumption in Result Specifications
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Language specifications are in a hierarchy. For example, from most
> inclusive to finer subsets, we have:
>
> x-unspecified
> en
> en-us
>
> A result spec's most common use is in a negative sense - Annotators
> can check a result spec and if it doesn't contain the type or feature,
> it can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature,
> called TF.
>
> If the annotator thinks it produces TF for language en-us only, and
> wants to check if should skip producing this, it calls
> containsType/Feature(TF, "en-us"). This is defined in the current
> impl to return true, if the result spec has languages x-unspecified,
> en, or en-us.
>
> Let's consider the opposite case. Suppose we have an annotator that
> can produce TF for "en". Suppose the result-spec has an entry for TF
> only for the language "en-us". Should that annotator produce
> results? If it calls containsType/Feature(TF, "en"), it will get a
> "false" (current implementation).
>
> After some thinking about this and some discussion (because I don't
> think I got it right, just by myself :-) ),
> it seems that this is correct. Consider the following case:
> The language of the document is "en", and the containing (top-most)
> aggregate specified explicitly it wanted
> output only for en-us. In that case, the annotator should not
> produce any results, because the language
> of this doc is not en-us, and the assembler put together things that
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".
> Suppose the result-spec has an entry for TF only for the language
> "en". Should that annotator produce results? If it calls
> containsType/Feature(TF, "x-unspecified"), it should get a "false"
> (broken in the current implementation!, but was true I think in the
> previous one).
I'm not sure you are right here. I think if an annotator can produce TF
for "x-unspecified", that means it can produce TF for all languages.
So if an "en" document comes in, the annotator should produce a result.
>
> Assume the language of the document is "x-unspecified", and the
> containing (top-most) aggregate specified explicitly it wanted
> output only for en. In that case, the annotator should not produce
> any results, because the language
> of this doc is not "en", and the assembler put together things that
> they said should only output "en" results.
>
That case is correct.
-- Michael
Re: Clarifying language subsumption in Result Specifications
Posted by Adam Lally <al...@alum.rpi.edu>.
Seems right to me.
-Adam
On Jan 28, 2008 11:51 AM, Marshall Schor <ms...@schor.com> wrote:
> Language specifications are in a hierarchy. For example, from most
> inclusive to finer subsets, we have:
>
> x-unspecified
> en
> en-us
>
> A result spec's most common use is in a negative sense - Annotators can
> check a result spec and if it doesn't contain the type or feature, it
> can skip producing that type or feature.
>
> For simplicity, let's consider we have only one type or feature, called TF.
>
> If the annotator thinks it produces TF for language en-us only, and
> wants to check if should skip producing this, it calls
> containsType/Feature(TF, "en-us"). This is defined in the current impl
> to return true, if the result spec has languages x-unspecified, en, or
> en-us.
>
> Let's consider the opposite case. Suppose we have an annotator that can
> produce TF for "en". Suppose the result-spec has an entry for TF only
> for the language "en-us". Should that annotator produce results? If it
> calls containsType/Feature(TF, "en"), it will get a "false" (current
> implementation).
>
> After some thinking about this and some discussion (because I don't
> think I got it right, just by myself :-) ),
> it seems that this is correct. Consider the following case:
> The language of the document is "en", and the containing (top-most)
> aggregate specified explicitly it wanted
> output only for en-us. In that case, the annotator should not produce
> any results, because the language
> of this doc is not en-us, and the assembler put together things that
> they said should only output en-us results.
>
> This same logic seems to apply to x-unspecified:
>
> Suppose we have an annotator that can produce TF for "x-unspecified".
> Suppose the result-spec has an entry for TF only for the language "en".
> Should that annotator produce results? If it calls
> containsType/Feature(TF, "x-unspecified"), it should get a "false"
> (broken in the current implementation!, but was true I think in the
> previous one).
>
> Assume the language of the document is "x-unspecified", and the
> containing (top-most) aggregate specified explicitly it wanted
> output only for en. In that case, the annotator should not produce any
> results, because the language
> of this doc is not "en", and the assembler put together things that they
> said should only output "en" results.
>
> Do others agree with this?
>
> -Marshall
>