You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Giovanni Novelli <gi...@gmail.com> on 2005/04/05 19:30:18 UTC

Re: PHP-Lucene Integration

As Lucene native language is Java it should be more natural to access its 
functionalities through JSP; anyway the idea of accessing Lucene 
functionalities seems interesting as PHP is perhaps most widely deployed 
server side scripting language. 
I think that the way to provide access to Lucene API in PHP development 
should be more general and clean as possible, so in my opinion the natural 
way should be based on a single layer that interoperates with Lucene API: an 
Apache module. Then should be needed a PHP API to call such module from PHP. 

Having an Apache module for Lucene as a component of Lucene project should 
allow the spread of Lucene not only in PHP development arena.

On Mar 27, 2005 5:49 AM, Owen Densmore <ow...@backspaces.net> wrote:
> 
> Thanks all for the interesting responses. Sorry for being a bit late
> in responding!
> 
> -- Owen
> 
> Owen Densmore - http://backspaces.net - http://redfish.com -
> owen@backspaces.net
> 
> Begin forwarded message:
> 
> > From: "Philippe Ombredanne" <po...@nexb.com>
> > Subject: RE: PHP-Lucene Integration
> >
> > Owen,
> > very interesting!
> > Anything (code) you can share?
> 
> Hi Philippe. We will definitely make our code available. I suspect,
> however, it is not terribly interesting. But if simply useful as a
> "case study" that would still be good.
> 
> > From: Dawid Weiss <da...@cs.put.poznan.pl>
> > Subject: Re: PHP-Lucene Integration
> >
> > Your implementation and ideas sound very interesting, Owen. Can we see
> > the system anywhere in public (and play with it?)
> 
> We'll send a link to the site fairly soon. We're having our final
> review tomorrow, and should have a good idea when we can let folks look
> at it.
> 
> >> We are hoping the institute can afford to have us work on true
> >> clustering techniques such as Carrot2 uses. (Thanks to Dawid and all
> >> the Poznan University folks who's papers were so stimulating!)
> >
> > You are very welcome. We are also academic, so in the feeling of
> > brotherhood we might help you set up a demo on-line clustering server
> > free of charge. There really is not better clustering technique than
> > the one devised to a particular problem and it seems like you found
> > that niche. Although it's always worth experimenting with other stuff
> > just for the sake of comparison. Just let me know if you're interested
> > (if we can access the 'feed' of those plain search results I can set
> > up the clustering demo in a few minutes, really).
> 
> This would be really great! Indeed, we'd like to help SFI to be a bit
> more involved with exploring their collection with innovative, research
> oriented methods.
> 
> Some of the staff at SFI are excited by DSpace, for example, and we'd
> be interested in helping them explore its use in the lucene/clustering
> context. That, and their use of Dublin Core for cataloging their
> future work might be of general interest here in the mail list too.
> 
> > > We did do a
> >> quick LSA SVD on a random set of the papers to see what the
> >> performance (both CPU and good clustering) would be like. Our
> >> results are encouraging, and I think the frequent phrases approach
> >> would be best for this collection.
> >
> > It is always going to be challanging if you attempt to cluster the
> > entire collection, you know. I'm (or rather: I will be) working on
> > algorithm's extensions to deal with full text documents.
> 
> We're mainly using Abstracts and other meta data (Title, Authors, Key
> phrases, Abstracts, Dates, and so on). These are reasonably small:
> Abstracts are 150 words on the average over the current 1122 document
> collection. If we include the title and key phrases, we get 172
> words/doc.
> 
> I suspect we could safely limit the abstracts to the first few
> sentences too, getting us to a much smaller number. Indeed, if we
> tossed the abstracts altogether, and used just titles and key phrases,
> we're down to less than 20 words/doc! I bet simply using reasonable
> preprocessing we could get small enough "snippets" as to be workable.
> 
> > Dawid
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
>

Re: PHP-Lucene Integration

Posted by Andi Vajda <an...@osafoundation.org>.

As an alternative, you could also take the approach taken for PyLucene: 
compile the Java code with GCJ and generate bindings for Python with SWIG.
SWIG supports a number of languages in addition to Python such as Ruby, PHP, 
Perl, and a bunch more.

For more information, see:
   http://pylucene.osafoundation.org
   http://www.python.org/pycon/2005/papers/27/paper.txt
   http://www.swig.org
   http://gcc.gnu.org/java

As a matter of fact, a team of people is working on such a construction for 
Ruby at the moment.

Andi..


On Tue, 5 Apr 2005, Giovanni Novelli wrote:

> As Lucene native language is Java it should be more natural to access its
> functionalities through JSP; anyway the idea of accessing Lucene
> functionalities seems interesting as PHP is perhaps most widely deployed
> server side scripting language.
> I think that the way to provide access to Lucene API in PHP development
> should be more general and clean as possible, so in my opinion the natural
> way should be based on a single layer that interoperates with Lucene API: an
> Apache module. Then should be needed a PHP API to call such module from PHP.
>
> Having an Apache module for Lucene as a component of Lucene project should
> allow the spread of Lucene not only in PHP development arena.
>
> On Mar 27, 2005 5:49 AM, Owen Densmore <ow...@backspaces.net> wrote:
>>
>> Thanks all for the interesting responses. Sorry for being a bit late
>> in responding!
>>
>> -- Owen
>>
>> Owen Densmore - http://backspaces.net - http://redfish.com -
>> owen@backspaces.net
>>
>> Begin forwarded message:
>>
>>> From: "Philippe Ombredanne" <po...@nexb.com>
>>> Subject: RE: PHP-Lucene Integration
>>>
>>> Owen,
>>> very interesting!
>>> Anything (code) you can share?
>>
>> Hi Philippe. We will definitely make our code available. I suspect,
>> however, it is not terribly interesting. But if simply useful as a
>> "case study" that would still be good.
>>
>>> From: Dawid Weiss <da...@cs.put.poznan.pl>
>>> Subject: Re: PHP-Lucene Integration
>>>
>>> Your implementation and ideas sound very interesting, Owen. Can we see
>>> the system anywhere in public (and play with it?)
>>
>> We'll send a link to the site fairly soon. We're having our final
>> review tomorrow, and should have a good idea when we can let folks look
>> at it.
>>
>>>> We are hoping the institute can afford to have us work on true
>>>> clustering techniques such as Carrot2 uses. (Thanks to Dawid and all
>>>> the Poznan University folks who's papers were so stimulating!)
>>>
>>> You are very welcome. We are also academic, so in the feeling of
>>> brotherhood we might help you set up a demo on-line clustering server
>>> free of charge. There really is not better clustering technique than
>>> the one devised to a particular problem and it seems like you found
>>> that niche. Although it's always worth experimenting with other stuff
>>> just for the sake of comparison. Just let me know if you're interested
>>> (if we can access the 'feed' of those plain search results I can set
>>> up the clustering demo in a few minutes, really).
>>
>> This would be really great! Indeed, we'd like to help SFI to be a bit
>> more involved with exploring their collection with innovative, research
>> oriented methods.
>>
>> Some of the staff at SFI are excited by DSpace, for example, and we'd
>> be interested in helping them explore its use in the lucene/clustering
>> context. That, and their use of Dublin Core for cataloging their
>> future work might be of general interest here in the mail list too.
>>
>>>> We did do a
>>>> quick LSA SVD on a random set of the papers to see what the
>>>> performance (both CPU and good clustering) would be like. Our
>>>> results are encouraging, and I think the frequent phrases approach
>>>> would be best for this collection.
>>>
>>> It is always going to be challanging if you attempt to cluster the
>>> entire collection, you know. I'm (or rather: I will be) working on
>>> algorithm's extensions to deal with full text documents.
>>
>> We're mainly using Abstracts and other meta data (Title, Authors, Key
>> phrases, Abstracts, Dates, and so on). These are reasonably small:
>> Abstracts are 150 words on the average over the current 1122 document
>> collection. If we include the title and key phrases, we get 172
>> words/doc.
>>
>> I suspect we could safely limit the abstracts to the first few
>> sentences too, getting us to a much smaller number. Indeed, if we
>> tossed the abstracts altogether, and used just titles and key phrases,
>> we're down to less than 20 words/doc! I bet simply using reasonable
>> preprocessing we could get small enough "snippets" as to be workable.
>>
>>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org