Posted to dev@manifoldcf.apache.org by Simon Willnauer <si...@googlemail.com> on 2011/09/09 11:02:42 UTC

Check out SearchWorkings.org - it just went live!

Hey folks,

As some of you might have heard, a small group of other passionate
search technology professionals and I have been working hard over the
last few months to launch a community site called SearchWorkings.org
[1]. The initiative gives search professionals a single point of
contact and a comprehensive resource where they can learn and talk
about all the exciting new developments in the world of open source
search.

Anyone like yourselves familiar with open source search knows that
technologies like Lucene and Solr have grown tremendously in
popularity over the years, but with this growth there have also come a
number of challenges, such as limited support and education. With the
launch of SearchWorkings.org we are convinced we will overcome and
resolve some of these challenges.

Covering open source search technologies from Apache Lucene and Apache
Solr to Apache Mahout, one of the key objectives for the community is
to create a place where search specialists can engage with one another
and enjoy a single point of contact for various resources, downloads
and documentation.

Like any other community website, content will be added on a regular
basis and community members can also make their own contributions and
stay on top of everything search related too. For now, there is access
to an extensive resource centre offering online tutorials, downloads,
white papers and access to a host of search specialists in the forum.
With the ability to post blog items and keep up to date with relevant
news, the site is a search specialist's dream come true and addresses
what we felt was a clear need in the market.

SearchWorkings.org starts off with an initial focus on Lucene, Solr &
Friends but aims to be much broader. Each of you can & should
contribute and tell us your search, data-processing, setup or
optimization story. I am looking forward to more and more blogs,
articles and tutorials about smaller projects like Apache Lucy,
real-world case studies or 3rd-party extensions for OSS search
components.

have fun,

Simon

[1] http://www.searchworkings.org
[2] Trademark Acknowledgement: Apache Lucene, Apache Solr, Apache
Mahout and Apache Lucy respective logos are trademarks of The Apache
Software Foundation. All other marks mentioned may be trademarks or
registered trademarks of their respective owners.

RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

Posted by "Jaeger, Jay - DOT" <Ja...@dot.wi.gov>.
Looking at the Wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters), it looks like solr.StandardTokenizerFactory changed with Solr 3.1.

We use solr.KeywordTokenizerFactory for our middle names (and then also apply solr.LowerCaseFilterFactory to normalize to lower case).  It treats the entire field as a single token and in general doesn't "futz" with what came in.
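
As a rough sketch, such a field type could look like the following. The field type name "name_exact" is made up for illustration and is not taken from any actual schema in this thread; the two factory classes are the ones named above.

    <!-- Sketch only: "name_exact" is a hypothetical field type name -->
    <fieldtype name="name_exact" class="solr.TextField">
      <analyzer>
        <!-- emits the whole field value as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- lower-cases that single token -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>

With a chain like this, a value such as "A" should be indexed as the single token "a", and nothing is dropped because no stop filter is involved.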

You might try the analyzer panel on the admin web page to see what exactly is happening during indexing and analysis.

JRJ

-----Original Message-----
From: Marc Des Garets [mailto:marc.desgarets@192.com] 
Sent: Friday, September 09, 2011 5:21 AM
To: solr-user@lucene.apache.org
Subject: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

Hi,

I have a simple field defined like this:
    <fieldtype name="text" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldtype>

Which I use here:
   <field name="middlename" type="text" indexed="true" stored="true" required="false" />

In solr 1.4, I could do:
?q=(middlename:a*)

And I was getting all documents where middlename = A or where middlename starts with the letter A.

In solr 3.3, I get only results where middlename starts with the letter A but not where middlename is equal to A.

This happens only with the letter A; with other letters it is fine, and I get both the ones starting with the letter and the ones equal to the letter. My guess is that it considers A as the English article, but I do not specify any filter with stopwords, so how come the behaviour with the letter A is different from the other letters? Is there a bug? How can I change my field to work with the letter A the same way it does with other letters?


Thanks,
Marc
----------------------------------------------------------
This transmission is strictly confidential, possibly legally privileged, and intended solely for the 
addressee.  Any views or opinions expressed within it are those of the author and do not necessarily 
represent those of 192.com, i-CD Publishing (UK) Ltd or any of it's subsidiary companies.  If you 
are not the intended recipient then you must not disclose, copy or take any action in reliance of this 
transmission. If you have received this transmission in error, please notify the sender as soon as 
possible.  No employee or agent is authorised to conclude any binding agreement on behalf of 
i-CD Publishing (UK) Ltd with another party by email without express written confirmation by an 
authorised employee of the Company. http://www.192.com (Tel: 08000 192 192).  i-CD Publishing (UK) Ltd 
is incorporated in England and Wales, company number 3148549, VAT No. GB 673128728.

RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

Posted by Marc Des Garets <ma...@192.com>.
OK thanks. I don't know why the behaviour is different from my 1.4 index then, but hopefully it will be the same once I do what you suggest.

Thanks again,

Marc


RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Marc,

StandardAnalyzer includes StopFilter.  See the Javadocs for Lucene 3.3 here: <http://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/analysis/standard/StandardAnalyzer.html>

This is not new behavior - StandardAnalyzer in Lucene 2.9.1 (the version of Lucene bundled with Solr 1.4) also includes a StopFilter: <http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/analysis/standard/StandardAnalyzer.html>

If you don't want a StopFilter configured, you can specify the individual components directly, e.g. to get the equivalent of StandardAnalyzer, but without the StopFilter:

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
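
For comparison, here is a rough sketch of what the single StandardAnalyzer line amounts to when the chain is spelled out with the StopFilter included. The field type name "text_with_stop" and the file name "stopwords.txt" are assumptions for illustration only, and this is not guaranteed to match StandardAnalyzer exactly in every version:

<!-- Sketch only: "text_with_stop" and "stopwords.txt" are assumed names -->
<fieldtype name="text_with_stop" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- this step removes stop words; "a" is dropped if the file lists it -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  </analyzer>
</fieldtype>

The only difference from the chain above is the final StopFilterFactory line, which is the step that can remove a middle name consisting solely of "A" at index time.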

Steve


question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

Posted by Marc Des Garets <ma...@192.com>.
Hi,

I have a simple field defined like this:
    <fieldtype name="text" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    </fieldtype>

Which I use here:
   <field name="middlename" type="text" indexed="true" stored="true" required="false" />

In solr 1.4, I could do:
?q=(middlename:a*)

And I was getting all documents where middlename = A or where middlename starts with the letter A.

In solr 3.3, I get only results where middlename starts with the letter A but not where middlename is equal to A.

This happens only with the letter A; with other letters it is fine, and I get both the ones starting with the letter and the ones equal to the letter. My guess is that it considers A as the English article, but I do not specify any filter with stopwords, so how come the behaviour with the letter A is different from the other letters? Is there a bug? How can I change my field to work with the letter A the same way it does with other letters?


Thanks,
Marc