Posted to user@uima.apache.org by Erik Fäßler <er...@uni-jena.de> on 2011/04/13 14:00:15 UTC

Sandbox: LuCas Lucene update

  Hey all,

Back in January, I needed the CAS Lucene indexer (LuCas, a UIMA Sandbox
component) to work with Lucene 2.9.x. So I checked it out from the
Sandbox SVN, updated the libraries and fixed the compile errors. The
result is a LuCas component that works with Lucene 2.9.3. At least, all
tests pass, and I have used the component successfully together with
Solr (which is why I needed Lucene 2.9.x).
The required changes were not too big, as I did not take the leap to
Lucene 3.x. Some filters have been updated to the new Token API, and one
or two classes required a more or less complete rewrite before the tests
would pass again.

So, my question: would it be desirable to commit these changes back to
the Sandbox SVN? Which steps would I have to take for this? Or should I
just send my sources to a developer? The component was originally
created in my lab, but its developer moved to another workplace quite a
while ago.

Best regards,

     Erik

Re: Sandbox: LuCas Lucene update

Posted by Tommaso Teofili <to...@gmail.com>.

Hello again Erik,
from what I've seen and tested, your patch looks good; if Jörn's tests behave
as expected, I think we can commit it.
Regards,
Tommaso

2011/4/13 Tommaso Teofili <to...@gmail.com>

> Thanks Erik,
> I'm going to review your patch now :)
> The capabilities you mentioned are not yet implemented in Solrcas, but I
> hope we can bring them there as well.
> Regards,
> Tommaso

Re: Sandbox: LuCas Lucene update

Posted by Tommaso Teofili <to...@gmail.com>.

Thanks Erik,
I'm going to review your patch now :)
The capabilities you mentioned are not yet implemented in Solrcas, but I hope
we can bring them there as well.
Regards,
Tommaso

2011/4/13 Erik Fäßler <er...@uni-jena.de>

> Great - as it's best to get things done quickly, I just updated my copy
> to the latest trunk revision, made sure the tests still pass and
> created the patch. The corresponding Jira issue can be found at
> https://issues.apache.org/jira/browse/UIMA-2126
>
> If something is wrong with the issue, please let me know - it's the first
> one I've created (e.g. I suspect the "Affects Version/s" field does not
> match the issue, but I could be wrong).
>
> Erik
>

Re: Sandbox: LuCas Lucene update

Posted by Erik Fäßler <er...@uni-jena.de>.

On 13.04.2011 14:49, Jörn Kottmann wrote:
> I might have time next week to work on the Lucas component, because I 
> also need it for a project.
> Maybe that would be a good chance to apply and test your patch.
>
> Jörn
Great - as it's best to get things done quickly, I just updated my copy
to the latest trunk revision, made sure the tests still pass and created
the patch. The corresponding Jira issue can be found at
https://issues.apache.org/jira/browse/UIMA-2126

If something is wrong with the issue, please let me know - it's the
first one I've created (e.g. I suspect the "Affects Version/s" field
does not match the issue, but I could be wrong).

Erik

Re: Sandbox: LuCas Lucene update

Posted by Jörn Kottmann <ko...@gmail.com>.

On 4/13/11 2:44 PM, Erik Fäßler wrote:
>  Hello Tommaso,
>
> thanks a lot for your reply :) I will follow the steps you gave me as 
> soon as there is a little time for this.
>
> Also thanks for the Solrcas hint. I think we already talked about
> this. As far as I understand, both Solrcas and the Solr-UIMA
> integration lack some of the features offered by LuCas, for example
> the alignment of TokenStreams, which lets you merge multiple CAS
> indexes into a single Lucene field, with position increments
> adjusted so that Lucene tokens with the same offsets are stacked.
> Please (!!) tell me if I'm wrong here, as I am still working out my
> own ways to use UIMA together with Solr.

I might have time next week to work on the Lucas component, because I 
also need it for a project.
Maybe that would be a good chance to apply and test your patch.

Jörn

Re: Sandbox: LuCas Lucene update

Posted by Erik Fäßler <er...@uni-jena.de>.

  Hello Tommaso,

thanks a lot for your reply :) I will follow the steps you gave me as 
soon as there is a little time for this.

Also thanks for the Solrcas hint. I think we already talked about this.
As far as I understand, both Solrcas and the Solr-UIMA integration lack
some of the features offered by LuCas, for example the alignment of
TokenStreams, which lets you merge multiple CAS indexes into a single
Lucene field, with position increments adjusted so that Lucene tokens
with the same offsets are stacked. Please (!!) tell me if I'm wrong
here, as I am still working out my own ways to use UIMA together with Solr.
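The TokenStream alignment described above can be pictured with a small, self-contained Java sketch. It is not the LuCas implementation. The class `TokenAlignment`, the stand-in `Tok` type and the stacking rule are all hypothetical. The rule assumed here is to stack a token when its begin and end offsets equal those of the previous token. The sketch shows how a position increment of 0 puts several annotation layers (for example words and lemmas) at the same Lucene position. Real code would read and set Lucene's OffsetAttribute and PositionIncrementAttribute instead of using this toy class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not LuCas code: merges several token layers into one
// field-level list and stacks tokens whose offsets are identical.
final class TokenAlignment {

    /** Minimal stand-in for a Lucene token: term text, character offsets, position increment. */
    static final class Tok {
        final String text;
        final int begin;
        final int end;
        final int posInc;

        Tok(String text, int begin, int end) {
            this(text, begin, end, 1);
        }

        Tok(String text, int begin, int end, int posInc) {
            this.text = text;
            this.begin = begin;
            this.end = end;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "[" + begin + "-" + end + "]+" + posInc;
        }
    }

    /**
     * Merges all layers into a single list ordered by (begin, end) offsets.
     * A token whose offsets equal those of the previous token gets a position
     * increment of 0, so both tokens end up at the same position.
     */
    static List<Tok> align(List<List<Tok>> layers) {
        List<Tok> merged = new ArrayList<>();
        for (List<Tok> layer : layers) {
            merged.addAll(layer);
        }
        // List.sort is stable, so tokens with equal offsets keep their layer order.
        merged.sort(Comparator.comparingInt((Tok t) -> t.begin).thenComparingInt(t -> t.end));

        List<Tok> aligned = new ArrayList<>(merged.size());
        Tok prev = null;
        for (Tok t : merged) {
            boolean sameOffsets = prev != null && prev.begin == t.begin && prev.end == t.end;
            aligned.add(new Tok(t.text, t.begin, t.end, sameOffsets ? 0 : 1));
            prev = t;
        }
        return aligned;
    }

    public static void main(String[] args) {
        // Two hypothetical layers over the text "Cells divided".
        List<Tok> words = List.of(new Tok("Cells", 0, 5), new Tok("divided", 6, 13));
        List<Tok> lemmas = List.of(new Tok("cell", 0, 5), new Tok("divide", 6, 13));
        // prints [Cells[0-5]+1, cell[0-5]+0, divided[6-13]+1, divide[6-13]+0]
        System.out.println(align(List.of(words, lemmas)));
    }
}
```

Each lemma gets an increment of 0, so it shares a position with the word it annotates. A phrase query on the word layer therefore still matches, and the lemma is searchable at the same place. Tokens that only share a begin offset are deliberately not stacked in this sketch. That is a design choice made for illustration only.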

Thanks again and warm regards,

     Erik

On 13.04.2011 14:16, Tommaso Teofili wrote:
> Hello Erik,
> that would be a very valuable contribution indeed!
>
> The usual way to contribute code is to create a patch file containing
> the differences between your working copy and the latest revision in
> SVN; you can find details on how to do this at
> http://www.apache.org/dev/contributors.html#patches .
> Then you create a Jira issue under the UIMA project [1] and attach the
> patch file to the issue.
> At that point a committer will review your patch and commit it if
> everything is fine :)
>
> As a side note, if you want to use Solr within a UIMA pipeline, you might
> be interested in Solrcas [2] or in the Solr-UIMA integration available in
> the Solr 3.1.0 release [3].
>
> Hope this helps,
> Tommaso
>
> [1] : https://issues.apache.org/jira/browse/UIMA
> [2] : http://uima.apache.org/sandbox.html#solrcas.consumer
> [3] : http://wiki.apache.org/solr/SolrUIMA

Re: Sandbox: LuCas Lucene update

Posted by Tommaso Teofili <to...@gmail.com>.

Hello Erik,
that would be a very valuable contribution indeed!

The usual way to contribute code is to create a patch file containing
the differences between your working copy and the latest revision in
SVN; you can find details on how to do this at
http://www.apache.org/dev/contributors.html#patches .
Then you create a Jira issue under the UIMA project [1] and attach the
patch file to the issue.
At that point a committer will review your patch and commit it if
everything is fine :)

As a side note, if you want to use Solr within a UIMA pipeline, you might be
interested in Solrcas [2] or in the Solr-UIMA integration available in the
Solr 3.1.0 release [3].

Hope this helps,
Tommaso

[1] : https://issues.apache.org/jira/browse/UIMA
[2] : http://uima.apache.org/sandbox.html#solrcas.consumer
[3] : http://wiki.apache.org/solr/SolrUIMA

2011/4/13 Erik Fäßler <er...@uni-jena.de>

>  Hey all,
>
> Back in January, I needed the CAS Lucene indexer (LuCas, a UIMA Sandbox
> component) to work with Lucene 2.9.x. So I checked it out from the Sandbox
> SVN, updated the libraries and fixed the compile errors. The result is a
> LuCas component that works with Lucene 2.9.3. At least, all tests pass, and
> I have used the component successfully together with Solr (which is why I
> needed Lucene 2.9.x).
> The required changes were not too big, as I did not take the leap to Lucene
> 3.x. Some filters have been updated to the new Token API, and one or two
> classes required a more or less complete rewrite before the tests would pass
> again.
>
> So, my question: would it be desirable to commit these changes back to the
> Sandbox SVN? Which steps would I have to take for this? Or should I just
> send my sources to a developer? The component was originally created in my
> lab, but its developer moved to another workplace quite a while ago.
>
> Best regards,
>
>    Erik
>