Posted to user@uima.apache.org by "Sommer, Scott (Contractor)" <Sc...@dsto.defence.gov.au> on 2007/09/13 09:00:54 UTC

SIAPI Indexing

Hi,

I'm looking into writing an indexer for the semantic search CAS consumer
from the UIMA-examples.

The main reason I want to do this is because our CAS passed to the
indexer does not have a SourceDocumentInformation annotation in our
current setup. I have had a look at the API docs and can see that the
UimaJuruDocument class has 5 different constructors, which basically
shows that most of the arguments are optional (if this is incorrect,
please let me know). I am assuming that in the example, the
SourceDocumentInformation annotation contains information that is passed
to the UimaJuruDocument constructor, but I don't have the
SourceDocumentInformation annotation in the CAS so of course the
SemanticSearchCasIndexer in the example fails. I would love to be able
to simply get the source code for the Indexer and modify it so that it
doesn't require the SourceDocumentInformation annotation, but there is
no source code included in the jar containing this class.

My only other options would appear to be to write an Indexer of my own
which does not require the SourceDocumentInformation annotation, or to
put in a SourceDocumentInformation annotation just for the indexer which
contains nothing useful.

I would like to avoid putting useless annotations into the CAS, so I've
been reading through the SIAPI.pdf and the javadoc. The SIAPI pdf seems
largely focused on the searching with next to nothing about how to write
an indexer. Is there a better source of information on the SIAPI from an
indexing perspective? And overall, is there a better solution to this
problem?

Cheers,
Scott Sommer.


IMPORTANT: This email remains the property of the Australian Defence Organisation and is subject to the jurisdiction of section 70 of the CRIMES ACT 1914.  If you have received this email in error, you are requested to contact the sender and delete the email.



RE: SIAPI Indexing

Posted by "Sommer, Scott (Contractor)" <Sc...@dsto.defence.gov.au>.
I am using the semantic search package mentioned. I was considering
trying to create a wrapper which extends SemanticSearchCasIndexer and
adds/removes the SourceDocumentInformation annotation before and after
calling the super.processCas method, but for now I have written a simple
"type system mapper". I might have a go at the wrapper later.
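
For reference, the wrapper idea can be sketched roughly as follows. This
is only an illustration of the add/delegate/remove pattern: MiniCas,
StubIndexer and WrappingIndexer are hypothetical stand-ins written for
this sketch, not classes from UIMA or the semantic search package, and
the real SemanticSearchCasIndexer may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CAS: it only tracks the names of the
// feature structures it currently holds.
class MiniCas {
    final List<String> featureStructures = new ArrayList<String>();
}

// Hypothetical stand-in for the closed-source indexer: it fails when the
// SourceDocumentInformation entry is missing, as described in this thread.
class StubIndexer {
    static final String REQUIRED = "SourceDocumentInformation";
    final List<MiniCas> indexed = new ArrayList<MiniCas>();

    void processCas(MiniCas cas) {
        if (!cas.featureStructures.contains(REQUIRED)) {
            throw new IllegalStateException(REQUIRED + " not found in CAS");
        }
        indexed.add(cas);
    }
}

// The wrapper: add the required entry if absent, delegate to the parent,
// then remove it again so the CAS leaves in the state it arrived in.
class WrappingIndexer extends StubIndexer {
    @Override
    void processCas(MiniCas cas) {
        boolean added = !cas.featureStructures.contains(REQUIRED);
        if (added) {
            cas.featureStructures.add(REQUIRED);
        }
        try {
            super.processCas(cas);
        } finally {
            // Clean up even if the parent throws.
            if (added) {
                cas.featureStructures.remove(REQUIRED);
            }
        }
    }
}
```

The try/finally is the main design point: the temporary entry is removed
even when indexing fails, so downstream components never see it.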

Cheers,
Scott.




Re: SIAPI Indexing

Posted by Adam Lally <al...@alum.rpi.edu>.
On 9/13/07, Thilo Goetz <tw...@gmx.de> wrote:
> You may wish to look at the semantic search
> package on the IBM alphaworks site.

I assume that's what Scott's using.  The CAS Consumer requires a
SourceDocumentInformation object as input, which contains the URL of
the document.  That's what gets added to the index so that when a
search is done we can send back the URLs of the documents that match.

Unfortunately this code was never open sourced and I myself don't have
it.  I think your best bet is to create the SourceDocumentInformation
object, and then if you want you can delete it afterwards.  When
integrating components from different sources it is common to have to
put in a "type system mapper" (in this case a very simple one) that
translates the FSs in the CAS to a format the downstream component can
accept.  Ideally there would be a standard UIMA type system for some
things, such as the location of the source document, but that's not
the case at the moment.
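
A minimal sketch of such a mapper, under stated assumptions: the
SourceInfo class below is a hypothetical stand-in whose fields are
modeled on the features I believe the examples' SourceDocumentInformation
type declares (uri, offsetInSource, documentSize, lastSegment), and
UpstreamDocMeta is an invented placeholder for whatever document metadata
your own pipeline carries. Check the names against your type system
descriptor before relying on them.

```java
// Hypothetical upstream record: what our own pipeline knows about a document.
class UpstreamDocMeta {
    final String path;   // assumed to be an absolute, Unix-style path
    final int length;    // document size in characters

    UpstreamDocMeta(String path, int length) {
        this.path = path;
        this.length = length;
    }
}

// Hypothetical stand-in for the structure the downstream indexer expects.
class SourceInfo {
    String uri;
    int offsetInSource;
    int documentSize;
    boolean lastSegment;
}

// The "type system mapper": translates upstream metadata into the
// format the downstream component can accept.
class SourceInfoMapper {
    static SourceInfo map(UpstreamDocMeta meta) {
        SourceInfo info = new SourceInfo();
        // The URI is what a search would hand back for a matching document.
        info.uri = "file://" + meta.path;
        // The whole document is treated as a single segment.
        info.offsetInSource = 0;
        info.documentSize = meta.length;
        info.lastSegment = true;
        return info;
    }
}
```

In a real pipeline the mapped object would be added to the CAS indexes
before the indexer runs and could be removed again afterwards.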

-Adam

Re: SIAPI Indexing

Posted by Thilo Goetz <tw...@gmx.de>.
Hi Scott,

the Apache UIMA release does not actually come
with a semantic search capability out of the box.
All we give you are the indexing interfaces, but
no implementation.

You may wish to look at the semantic search
package on the IBM alphaworks site.  Go to
http://www.alphaworks.ibm.com/tech/uima/download
and get SemanticSearch_2.1.zip.  It was built with
Apache UIMA 2.1, but I have no reason to believe
it won't work with 2.2.  I have not actually tried
it myself, though.  Let us know how it goes.

--Thilo


RE: SIAPI Indexing

Posted by "Sommer, Scott (Contractor)" <Sc...@dsto.defence.gov.au>.
Hi Thilo,

I'm using version 2.2.


Re: SIAPI Indexing

Posted by Thilo Goetz <tw...@gmx.de>.
Hi Scott,

what version of UIMA are you referring to?

--Thilo
