Posted to dev@lucene.apache.org by Robert Selvaraj <rs...@searchblox.com> on 2004/02/16 12:49:41 UTC
SearchBlox J2EE Search Component Version 1.2 released
SearchBlox is a J2EE Search Component that delivers out-of-the-box
search functionality for quick integration with your websites,
applications, intranets and portals. SearchBlox uses the Lucene Search
API and incorporates integrated HTTP and File System crawlers, support
for various document formats, support for indexing and searching content
in 17 languages and customizable search results, all controlled from a
browser-based Admin Console.
Main features in this release:
==============================
- Support for PowerPoint documents
- Support for Polish and Russian language content
- Enhanced HTML document parsing
- The FREE and BASIC Editions of SearchBlox now support all document
formats including Word, PDF, Excel and PowerPoint
SearchBlox is available as a Web Archive (WAR) and is deployable on any
Servlet 2.3/JSP 1.2 compliant server. Features in the product include:
Content Features
================
- Crawlers: can index both HTTP and File-system based content
- Languages: supports English, Spanish, French, German, Italian, Danish,
Dutch, Finnish, Norwegian, Polish, Portuguese, Russian, Swedish,
Japanese, Korean, Chinese(Simplified), Chinese(Traditional)
- Stopwords: separate stopword list for each supported language
- File Types: supports HTML, Word, Excel, PDF, Text, RTF, PowerPoint
- MetaTags: supports standard meta tag fields (title, description, keyword)
Administrator Features
======================
- Web-based Admin Console: easy to use and intuitive console to manage
all aspects of the Search application
- Collections: create up to 5 document collections with customized settings
- Look & Feel: search results are customizable using XSLT stylesheets and
can also be delivered as XML
- Reporting: real-time reporting with weekly, daily and hourly result
sets, top queries and zero match queries
- Schedulers: flexible scheduling of indexing and index-refresh operations
End User Features
=================
- Advanced Search: supports Boolean AND, OR and NOT searches, as well as
fuzzy and fielded queries
- Sort: search results can be sorted by date, by relevance or alphabetically
- Hit Highlighting: query terms are highlighted in the content title and
description
- Collections: users can limit search to specific collections
SearchBlox Getting-Started Guides are available for the following servers:
JBoss - http://www.searchblox.com/gettingstarted_jboss.html
Jetty - http://www.searchblox.com/gettingstarted_jetty.html
JRun - http://www.searchblox.com/gettingstarted_jrun.html
Pramati - http://www.searchblox.com/gettingstarted_pramati.html
Resin - http://www.searchblox.com/gettingstarted_resin.html
Sun - http://www.searchblox.com/gettingstarted_sun.html
Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
Weblogic - http://www.searchblox.com/gettingstarted_weblogic.html
Websphere - http://www.searchblox.com/gettingstarted_websphere.html
The SearchBlox FREE Edition is available free of charge and can index up
to 1000 documents.
The software can be downloaded from http://www.searchblox.com
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Some results for the language guesser
Posted by Jean-Francois Halleux <ha...@skynet.be>.
Hello,
I found some time to test the language guesser I contributed a while ago
(available in the patch queue :)
I tried it with language references for da, de, en, fr, nl and sv. I picked
random strings of varying length from a reference document in a specific
language and measured the probability that the guesser identifies that
language correctly. Here are some results.
For French
----------
Length:Probability X 10000
30:9954 (i.e. for a string of length 30, a 99.54% chance that it returns French)
25:9926
20:9890
15:9789
10:9426
9:9209
8:9032
7:8852
6:8544
5:8085
4:7585
3:6732
For English
-----------
30:9960
25:9929
20:9848
10:8983
9:8801
8:8557
7:8240
6:7853
5:7356
4:6523
3:5733
For Danish
----------
30:9854
25:9853
20:9813
15:9664
10:9086
9:8924
8:8738
7:8340
6:7878
5:7374
4:6489
3:5630
For German
----------
30:9935
25:9922
20:9868
15:9715
10:9281
9:9117
8:8921
7:8582
6:8123
5:7545
4:6666
3:5568
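For anyone who wants to run a similar measurement, here is a minimal sketch
of the procedure described above: draw random substrings of a given length
from a reference text, ask a guesser for the language, and report the share
of correct answers scaled by 10000. It is written in Python for brevity and
is not the code used to produce the numbers above. The names
sample_substrings, accuracy_x10000 and toy_guess are made up for this
illustration, and toy_guess is only a stand-in for a real language guesser.

```python
import random


def sample_substrings(text, length, count, rng):
    """Yield `count` random substrings of exactly `length` characters from `text`."""
    if length > len(text):
        raise ValueError("reference text is shorter than the requested length")
    for _ in range(count):
        start = rng.randrange(len(text) - length + 1)
        yield text[start:start + length]


def accuracy_x10000(guess, text, expected, length, trials=10000, seed=0):
    """Return the fraction of correct guesses, scaled by 10000 and rounded."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    hits = sum(
        1
        for sample in sample_substrings(text, length, trials, rng)
        if guess(sample) == expected
    )
    return round(10000 * hits / trials)


def toy_guess(sample):
    """Hypothetical stand-in guesser: looks for a few common French bigrams."""
    french_hints = ("le", "la", "ue", "ou", "an")
    return "fr" if any(hint in sample for hint in french_hints) else "en"


# Placeholder reference text (an assumption; any long text in one language works).
reference = "le petit chat dort sur le canape pendant que la pluie tombe " * 20

# Print one "length:score" row per sample length, in the same layout as above.
for length in (30, 20, 10, 5, 3):
    score = accuracy_x10000(toy_guess, reference, "fr", length, trials=1000)
    print(f"{length}:{score}")
```

To read a row as a percentage, divide the score by 100, so 9954 corresponds
to 99.54%. Replacing toy_guess with a call into a real guesser and reference
with a real document is all that is needed to reuse the loop.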
Have fun,
Jean-Francois Halleux