Posted to dev@lucene.apache.org by Robert Selvaraj <rs...@searchblox.com> on 2004/02/16 12:49:41 UTC

SearchBlox J2EE Search Component Version 1.2 released

SearchBlox is a J2EE Search Component that delivers out-of-the-box 
search functionality for quick integration with your websites, 
applications, intranets and portals. SearchBlox uses the Lucene Search 
API and incorporates integrated HTTP and File System crawlers, support 
for various document formats, support for indexing and searching content 
in 17 languages and customizable search results, all controlled from a 
browser-based Admin Console.


Main features in this release:
==============================
- Support for PowerPoint documents
- Support for Polish and Russian language content
- Enhanced HTML document parsing
- The FREE and BASIC Editions of SearchBlox now support all document 
formats including Word, PDF, Excel and PowerPoint

SearchBlox is available as a Web Archive (WAR) and is deployable on any 
Servlet 2.3/JSP 1.2 compliant server. Features in the product include:

Content Features
================
- Crawlers: can index both HTTP and File-system based content
- Languages: supports English, Spanish, French, German, Italian, Danish, 
Dutch, Finnish, Norwegian, Polish, Portuguese, Russian, Swedish, 
Japanese, Korean, Chinese (Simplified), Chinese (Traditional)
- Stopwords: separate stopword list for each supported language
- File Types: supports HTML, Word, Excel, PDF, Text, RTF, PowerPoint
- MetaTags: supports standard meta tag fields (title, description, keyword)


Administrator Features
======================
- Web-based Admin Console: easy to use and intuitive console to manage 
all aspects of the Search application
- Collections: create up to 5 document collections with customized settings
- Look & Feel: search results customizable using XSLT stylesheets. Can 
also be delivered as XML
- Reporting: real-time reporting with weekly, daily and hourly result 
sets, top queries and zero match queries
- Schedulers: flexible scheduling of indexing and index-refresh operations


End User Features
=================
- Advanced Search: supports Boolean AND, OR and NOT searches, as well as 
fuzzy and fielded queries
- Sort: search results can be sorted by date, relevance or alphabetically
- Hit Highlighting: query terms are highlighted on content title and 
description
- Collections: users can limit search to specific collections


SearchBlox Getting-Started Guides are available for the following servers:

JBoss - http://www.searchblox.com/gettingstarted_jboss.html
Jetty - http://www.searchblox.com/gettingstarted_jetty.html
JRun - http://www.searchblox.com/gettingstarted_jrun.html
Pramati - http://www.searchblox.com/gettingstarted_pramati.html
Resin - http://www.searchblox.com/gettingstarted_resin.html
Sun - http://www.searchblox.com/gettingstarted_sun.html
Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
WebLogic - http://www.searchblox.com/gettingstarted_weblogic.html
WebSphere - http://www.searchblox.com/gettingstarted_websphere.html

The SearchBlox FREE Edition is available free of charge and can index up 
to 1000 documents.

The software can be downloaded from http://www.searchblox.com






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Some results for the language guesser

Posted by Jean-Francois Halleux <ha...@skynet.be>.
Hello,

I found some time to do some qualitative testing with the language guesser
I contributed some time ago (available in the patch queue :)

I tested with language references for da, de, en, fr, nl and sv. I picked
strings of varying length at random from a reference document in a specific
language and measured the probability that the guesser identifies that
language correctly. Here are some results.
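
The procedure described above can be sketched roughly as follows. This is
only an illustration, not the code used to produce the figures below: the
class name GuesserAccuracy, the method perTenThousand, the number of trials
and the toy guesser in main are all hypothetical, and the contributed
language guesser is represented by a plain Function from a sample string to
a language code. It is written against a current Java version.

```java
import java.util.Random;
import java.util.function.Function;

public class GuesserAccuracy {

    /**
     * Draws `trials` random substrings of `length` characters from `reference`
     * and returns how often `guesser` answers `expected`, scaled to 10000
     * (the same "Probability X 10000" unit used in the tables).
     */
    public static int perTenThousand(String reference, int length, int trials,
                                     String expected,
                                     Function<String, String> guesser,
                                     Random rng) {
        if (length <= 0 || length > reference.length() || trials <= 0) {
            throw new IllegalArgumentException("bad length or trial count");
        }
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            // Pick a random start so the substring fits inside the reference.
            int start = rng.nextInt(reference.length() - length + 1);
            String sample = reference.substring(start, start + length);
            if (expected.equals(guesser.apply(sample))) {
                hits++;
            }
        }
        return (int) Math.round(10000.0 * hits / trials);
    }

    public static void main(String[] args) {
        // Toy stand-in guesser (an assumption, NOT the contributed guesser):
        // answers "fr" whenever the sample contains the letter 'e'.
        Function<String, String> toy = s -> s.indexOf('e') >= 0 ? "fr" : "en";
        String reference =
            "le petit chat dort sur le canape pendant que la pluie tombe dehors";
        Random rng = new Random(42);
        int[] lengths = {30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3};
        for (int length : lengths) {
            int score = perTenThousand(reference, length, 10000, "fr", toy, rng);
            System.out.println(length + ":" + score);
        }
    }
}
```

Plugging in a real guesser and a real reference document per language
would print one "Length:Probability X 10000" line per string length, in the
same layout as the tables below.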

For French
----------

Length:Probability X 10000

30:9954 (that is, for strings of length 30, the guesser returns French 99.54% of the time)
25:9926
20:9890
15:9789
10:9426
9:9209
8:9032
7:8852
6:8544
5:8085
4:7585
3:6732

For English
-----------

30:9960
25:9929
20:9848
10:8983
9:8801
8:8557
7:8240
6:7853
5:7356
4:6523
3:5733

For Danish
----------

30:9854
25:9853
20:9813
15:9664
10:9086
9:8924
8:8738
7:8340
6:7878
5:7374
4:6489
3:5630

For German
----------

30:9935
25:9922
20:9868
15:9715
10:9281
9:9117
8:8921
7:8582
6:8123
5:7545
4:6666
3:5568


Have fun,

Jean-Francois Halleux

