Posted to dev@nutch.apache.org by Otis Gospodnetic <og...@yahoo.com> on 2011/01/06 09:19:39 UTC

Re: The Constellio team is proud to release its version 1.1

I think this is a good question and I'd be curious what the answer is, too.
Rida, could you please shed some light on this crawler side of Constellio?

This is also interesting because LWE chose Aperture's crawler instead of Nutch, 
even though Andrzej works for Lucid.  How come?  Is Nutch simply too big and 
complex, while Aperture's stuff is more suitable for typical non-Web-scale 
crawling needs of a typical enterprise/LWE customer?


Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>
>From: Davide Cavalaglio <da...@desktopsrl.com>
>To: dev@nutch.apache.org
>Sent: Tue, December 28, 2010 7:08:27 AM
>Subject: Re: The Constellio team is proud to release its version 1.1
>
>Hi,
>but is the crawler used by Constellio Nutch?
>
>
>2010/12/20 Rida Benjelloun <ri...@doculibre.com>
>
>>The Constellio team is proud to release its version 1.1
>>
>>Constellio Open Source Enterprise Search is based on Apache Solr and uses the
>>Google Search Appliance connector architecture. With a single click, it lets
>>you find all relevant content in your organization (Web, email, ECM, CRM,
>>etc.).
>>
>>Please be advised that the Constellio licence has been changed from GPL v.3.0
>>to LGPL v.3.0.
>>
>>The new LGPL v.3.0 licence gives more flexibility to developers interested in
>>plug-in/module development or in integrating Constellio with other
>>solutions. The SVN repository (svn.constellio.com) and the issue tracker
>>(issues.constellio.com) are now also open.
>>
>>
>>Many important changes have been made in this new version.
>>
>>Here are some of the new features developed in version 1.1:
>>
>>   - Constellio multi-platform installer
>>   - Federated search
>>   - Document security
>>   - Autocomplete for simple search, based on the most popular queries
>>   - Configurable advanced search interface and autocomplete based on field
>>     content
>>   - Solr connector (upload your schema.xml and content files, XML and binary)
>>   - Activation of Solr HTTP Web services, with the Constellio spell checker
>>     available through these services
>>   - Implementation of multiselect faceting
>>   - Configuration of display fields
>>   - Document consultation used in the relevance calculation of search results
>>   - Field boost, document boost, and Solr dismax (relevance)
>>   - Carrot2 for faceting
>>   - Web crawler improvements
>>   - New theme
>>   - and more ...
>>
>>Your comments/suggestions are also welcome!
>>
>>
>>
>>-- 
>>---------------------------------------------------------
>>Rida Benjelloun
>>Constellio -  Doculibre
>>ridabenjelloun@apache.org
>>rida.benjelloun@doculibre.com
>>---------------------------------------------------------
>>
>

Re: The Constellio team is proud to release its version 1.1

Posted by work only <vo...@gmail.com>.
Love Constellio admin interface, easy to use :)



On Thu, Jan 6, 2011 at 9:13 AM, Rida Benjelloun <
rida.benjelloun@doculibre.com> wrote:

> Hi,
>
> We developed our own crawler.
>
> It's a lightweight crawler that conforms to the Google Connector Manager
> architecture.
>
> Still, it has some neat features:
> - Near real-time indexing. New pages are indexed seconds after they are
> crawled.
> - On-demand pages. These pages are crawled with higher priority.
> - Depth control between recrawls (prevents loops).
> - Based on HtmlUnit, which supports JavaScript.
> Regards.
>
> --
> ---------------------------------------------------------
> Rida Benjelloun
> Constellio -  Doculibre
> ridabenjelloun@apache.org
> rida.benjelloun@doculibre.com
> ---------------------------------------------------------
>

Re: The Constellio team is proud to release its version 1.1

Posted by Rida Benjelloun <ri...@doculibre.com>.
Hi,

We developed our own crawler.

It's a lightweight crawler that conforms to the Google Connector Manager
architecture.

Still, it has some neat features:
- Near real-time indexing. New pages are indexed seconds after they are
crawled.
- On-demand pages. These pages are crawled with higher priority.
- Depth control between recrawls (prevents loops).
- Based on HtmlUnit, which supports JavaScript.

Regards.


-- 
---------------------------------------------------------
Rida Benjelloun
Constellio -  Doculibre
ridabenjelloun@apache.org
rida.benjelloun@doculibre.com
---------------------------------------------------------
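
Two of the crawler features mentioned in Rida's reply, on-demand pages
crawled with higher priority and depth control that prevents loops, can be
illustrated with a small crawl frontier. This is only a sketch under stated
assumptions, not Constellio's actual code: the class name CrawlFrontier, its
methods, the maxDepth parameter, and the example.com URLs are all
hypothetical, and "depth control" is read here simply as "drop links found
deeper than a fixed limit, and never queue the same URL twice".

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical crawl frontier (illustration only, not Constellio's code). */
class CrawlFrontier {

    /** One queued URL together with the link depth at which it was found. */
    static final class Entry {
        final String url;
        final int depth;
        final boolean onDemand;
        final long sequence; // insertion order, used as a FIFO tie-breaker

        Entry(String url, int depth, boolean onDemand, long sequence) {
            this.url = url;
            this.depth = depth;
            this.onDemand = onDemand;
            this.sequence = sequence;
        }
    }

    private final int maxDepth; // assumed fixed depth limit
    private final Set<String> seen = new HashSet<>();
    private long counter = 0;

    // On-demand entries sort first (Boolean false < true); ties are FIFO.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            Comparator.comparing((Entry e) -> !e.onDemand)
                      .thenComparingLong(e -> e.sequence));

    CrawlFrontier(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Queues a URL unless it is too deep or has already been queued. */
    boolean offer(String url, int depth, boolean onDemand) {
        if (depth > maxDepth || !seen.add(url)) {
            return false; // depth limit and de-duplication prevent loops
        }
        queue.add(new Entry(url, depth, onDemand, counter++));
        return true;
    }

    /** Returns the next URL to crawl, or empty when the frontier is drained. */
    Optional<Entry> poll() {
        return Optional.ofNullable(queue.poll());
    }

    public static void main(String[] args) {
        CrawlFrontier frontier = new CrawlFrontier(2);
        frontier.offer("http://example.com/", 0, false);
        frontier.offer("http://example.com/news", 1, false);
        frontier.offer("http://example.com/urgent", 1, true);  // on-demand
        frontier.offer("http://example.com/a/b/c", 3, false);  // too deep
        frontier.offer("http://example.com/", 1, false);       // duplicate
        for (Optional<Entry> e = frontier.poll(); e.isPresent(); e = frontier.poll()) {
            System.out.println(e.get().url);
        }
    }
}
```

In this sketch the on-demand URL is returned before the regular ones, and
the too-deep and duplicate URLs are never queued. A real crawler would also
need politeness delays, robots.txt handling, and page fetching (for example
via HtmlUnit, as mentioned above), none of which is shown here.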