Posted to solr-user@lucene.apache.org by "Smith,Devon" <sm...@oclc.org> on 2007/02/02 19:58:21 UTC

Custom Tokenizer

Hi,

I'm trying to get a custom tokenizer working, but I'm having some
problems. Per the instructions on various pages [1][2], I've been able
to develop and build the factory and tokenizer. However, when I start
Solr up, I get a stack trace that says "java.lang.NoClassDefFoundError:
org/apache/solr/analysis/BaseTokenizerFactory". That's really confusing.

Any thoughts on what I'm missing/doing wrong?

Devon

[1] http://wiki.apache.org/solr/SolrPlugins
[2] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

... 
Feb 2, 2007 1:40:53 PM org.apache.solr.schema.IndexSchema readConfig
INFO: Schema name=mapstore
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mortbay.start.Main.invokeMain(Main.java:151)
        at org.mortbay.start.Main.start(Main.java:476)
        at org.mortbay.start.Main.main(Main.java:94)
Caused by: java.lang.NoClassDefFoundError: org/apache/solr/analysis/BaseTokenizerFactory
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:56)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at org.mortbay.http.ContextLoader.loadClass(ContextLoader.java:233)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:299)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:594)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.solr.core.Config.findClass(Config.java:192)
        at org.apache.solr.core.Config.newInstance(Config.java:213)
        at org.apache.solr.schema.IndexSchema.readTokenizerFactory(IndexSchema.java:504)
        at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:478)
        at org.apache.solr.schema.IndexSchema.readConfig(IndexSchema.java:296)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:69)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:191)
        at org.apache.solr.core.SolrCore.getSolrCore(SolrCore.java:172)
        at org.apache.solr.servlet.SolrServlet.init(SolrServlet.java:72)
        at javax.servlet.GenericServlet.init(GenericServlet.java:168)
        at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:383)
        at org.mortbay.jetty.servlet.ServletHolder.start(ServletHolder.java:243)
        at org.mortbay.jetty.servlet.ServletHandler.initializeServlets(ServletHandler.java:446)
        at org.mortbay.jetty.servlet.WebApplicationHandler.initializeServlets(WebApplicationHandler.java:321)
        at org.mortbay.jetty.servlet.WebApplicationContext.doStart(WebApplicationContext.java:509)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.http.HttpServer.doStart(HttpServer.java:708)
        at org.mortbay.util.Container.start(Container.java:72)
        at org.mortbay.jetty.Server.main(Server.java:460)
        ... 7 more

--
Devon Smith <sm...@oclc.org>
Senior Software Engineer, Office of Research OCLC Online Computer
Library Center, Inc http://www.oclc.org/research/
http://www.oclc.org/research/staff/smith.htm

Re: Custom Tokenizer

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 3, 2007, at 11:18 AM, Yonik Seeley wrote:
> Hmmm, classloader hell...

Yeah, I had a bad feeling about that external lib thing.  It's a holy
grail to allow dynamic pluggability in Java, but it's much more
difficult than it perhaps should be.

> I assume you are putting your analyzer in solr/lib?
>
> Perhaps try to explode the solr webapp and put your custom analyzer
> directly in WEB-INF/lib/

I recommended this to Devon in the #code4lib room as well when he  
mentioned this to me.  I'm curious to see how this resolves, as it  
would be mighty handy to allow external classes but from past  
experiences with classloaders I'd be surprised if this works out as  
well as we'd like.

	Erik

RE: Custom Tokenizer

Posted by "Smith,Devon" <sm...@oclc.org>.

Based on Erik's suggestion, this is exactly what I did and it worked.
Good to know for the future that the solr/lib thing isn't working yet.

/dev 
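
To illustrate the kind of classloader visibility problem discussed in this
thread, here is a minimal, self-contained Java sketch. It is not Solr or
Jetty code; the class name LoaderVisibilityDemo and the helper canSee are
made up for illustration. It assumes an ordinary JVM launch, where the
loader that defines the demo class has a parent that knows nothing about
it. One plausible reading of the trace is the same situation: the custom
factory was defined by a loader that could not see the Solr classes inside
the webapp, which would explain why moving the jar into WEB-INF/lib helped.

```java
final class LoaderVisibilityDemo {

    /** Returns true if the given loader (null means bootstrap) can resolve the class. */
    public static boolean canSee(ClassLoader loader, String className) {
        try {
            // initialize=false: we only care whether the class can be found
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader child = LoaderVisibilityDemo.class.getClassLoader();
        ClassLoader parent = child.getParent();
        String self = LoaderVisibilityDemo.class.getName();

        // Delegation goes upward: the child resolves core classes via its parents.
        System.out.println("child sees java.lang.String: " + canSee(child, "java.lang.String"));
        // The child can see the class it defined itself.
        System.out.println("child sees " + self + ": " + canSee(child, self));
        // Delegation never goes downward: the parent cannot see the child's classes.
        System.out.println("parent sees " + self + ": " + canSee(parent, self));
    }
}
```

If a class is defined by the parent loader but extends a class that only
the child can load, defining it fails with NoClassDefFoundError, much like
the error reported above.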

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
Seeley
Sent: Saturday, February 03, 2007 11:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Custom Tokenizer

Hmmm, classloader hell...
I assume you are putting your analyzer in solr/lib?

Perhaps try to explode the solr webapp and put your custom analyzer
directly in WEB-INF/lib/

-Yonik


Re: Custom Tokenizer

Posted by Yonik Seeley <yo...@apache.org>.

Hmmm, classloader hell...
I assume you are putting your analyzer in solr/lib?

Perhaps try to explode the solr webapp and put your custom analyzer
directly in WEB-INF/lib/

-Yonik


Re: index browsing with solr

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

>>Lucene does it through the terms method of the IndexReader class, I think:
>>abstract TermEnum terms(Term t): Returns an enumeration of all terms after
>>a given term.
>>
>>Does an implementation of this method exist in Solr?
>
>You can get this functionality from the current faceting implementation,
>the downside is that it will be slower.
>
>-Yonik

OK, that sounds great. Time for me to look at faceted search more closely. Do
you have an idea of how much slower it will be? Is implementing an
index-browsing method worth it? (I would have to learn how to integrate a
new Java method into the Solr package, and I'm not a coding freak :(( )
Anyway, at least I now know I can do it the easy way using faceted search,
if necessary. Thanks!

Pierre-Yves Landron


Re: index browsing with solr

Posted by Yonik Seeley <yo...@apache.org>.

On 2/23/07, Pierre-Yves LANDRON <pl...@hotmail.com> wrote:
> I've used Solr for two weeks now, and so far it's a really neat solution.
> I've replaced my previous index searcher app with Solr in my current project,
> but cannot find a way to replace the browseIndex(field, startterm,
> numberoftermsreturned) function I've used so far. It's a very useful method
> and I'm sure it can be accomplished with Solr, but I can't figure out how.
> Lucene does it through the terms method of the IndexReader class, I think:
> abstract TermEnum terms(Term t): Returns an enumeration of all terms after
> a given term.
>
> Does an implementation of this method exist in Solr?

You can get this functionality from the current faceting implementation,
the downside is that it will be slower.

-Yonik

Re: index browsing with solr

Posted by Ryan McKinley <ry...@gmail.com>.

>
> I'm not sure I understand...
> Ryan, how can I get access to your contribution? (Is it a contribution to
> Luke or to the Solr REST interface?)
> Is it implemented yet, and if so, how can I use it?
> Thanks.
>

Solr has a pluggable request handling framework that lets you easily
write custom logic and takes care of the xml/json/etc writing for you.
 Check:

http://wiki.apache.org/solr/SolrPlugins#head-7c0d03515c496017f6c0116ebb096e34a872cb61
http://wiki.apache.org/solr/SolrRequestHandler

Since the exact term browsing mechanism you asked for is not
supported, I suggested writing your own and looking to the
IndexInfoRequestHandler as a simple starting place.

After more thought (and Yonik pointing it out), you are probably best
off if you can use faceted browsing to do what you need.
http://wiki.apache.org/solr/SolrFacetingOverview

The 'luke' comments are unrelated to your direct question, but you can
check that out here: http://issues.apache.org/jira/browse/SOLR-162

ryan

Re: No segments file after optimizing using Luke

Posted by Yonik Seeley <yo...@apache.org>.

On 4/11/07, Michael Levy <Lu...@gmail.com> wrote:
> After using Luke v0.7 to Optimize Index, with either "Using Standard
> Format" or "Using Compound Format", and restarting Solr using jetty, I
> get a java.io.FileNotFoundException indicating that it can't find the
> segments file.
>
> There's no segments file in the index directory, but Luke optimization
> created files named segments_n (n being 3 for standard format and 6 for
> compound format) and segments.gen.
>
> How can I resolve this?  Thanks in advance!

If you are going to use another program to modify a Solr-Lucene index,
make sure that program is using the same version of Lucene that Solr
is :-)

The very latest version of Luke uses Lucene 2.1, which is later than
the version Solr 1.1 uses.
Try using a nightly build of Solr.

-Yonik
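
Going by the file names mentioned in this thread (a plain segments file
expected by the older Lucene, segments_N plus segments.gen written by
Luke's newer Lucene), a quick check of which naming an index directory uses
might look like the sketch below. The class SegmentsFileCheck and its
Layout enum are hypothetical names; the code only inspects file names and
is not a Lucene or Solr API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SegmentsFileCheck {

    /** Which segments-file naming the directory appears to use. */
    public enum Layout { LEGACY_SEGMENTS, GENERATION_SEGMENTS, NONE }

    /** Heuristic based only on file names; it does not read the index itself. */
    public static Layout detect(Path indexDir) throws IOException {
        boolean legacy = false;
        boolean generational = false;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                if (name.equals("segments")) {
                    legacy = true;          // single "segments" file
                } else if (name.startsWith("segments_")) {
                    generational = true;    // "segments_N" style, as created by Luke here
                }
            }
        }
        if (generational) {
            return Layout.GENERATION_SEGMENTS;
        }
        return legacy ? Layout.LEGACY_SEGMENTS : Layout.NONE;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir + ": " + detect(dir));
    }
}
```

If the directory reports GENERATION_SEGMENTS while the Solr in use expects
a plain segments file, that matches the version mismatch described above.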

No segments file after optimizing using Luke

Posted by Michael Levy <Lu...@gmail.com>.

After using Luke v0.7 to Optimize Index, with either "Using Standard 
Format" or "Using Compound Format", and restarting Solr using jetty, I 
get a java.io.FileNotFoundException indicating that it can't find the 
segments file.

There's no segments file in the index directory, but Luke optimization 
created files named segments_n (n being 3 for standard format and 6 for 
compound format) and segments.gen.

How can I resolve this?  Thanks in advance!

Re: how to interpret the Solr score ?

Posted by Yonik Seeley <yo...@apache.org>.

On 3/1/07, Pierre-Yves LANDRON <pl...@hotmail.com> wrote:
> I hadn't seen the maxScore you're speaking about. It's exactly what I
> needed.

It's an attribute in <result> when you elect to return scores.

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="fl">score,*</str>
  <str name="q">ipod</str>
  <str name="indent">on</str>
 </lst>
</lst>
<result name="response" numFound="3" start="0" maxScore="2.4851787">
 <doc>
  <float name="score">2.4851787</float>
  <arr name="cat"><str>electronics</str><str>connector</str></arr>
  <arr name="features"><str>car power adapter for iPod, white</str></arr>
  <str name="id">IW-02</str>
  <bool name="inStock">false</bool>
  <str name="manu">Belkin</str>
  <str name="name">iPod &amp; iPod Mini USB 2.0 Cable</str>
  <int name="popularity">1</int>
  <float name="price">11.5</float>
  <str name="sku">IW-02</str>
  <date name="timestamp">2007-01-31T05:12:44.484Z</date>
  <float name="weight">2.0</float>
 </doc>
 <doc>
[...]

Re: how to interpret the Solr score ?

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

Thank you Yonik,

I hadn't seen the maxScore you're speaking about. It's exactly what I
needed.

Cheers,
P-Y L


>From: "Yonik Seeley" <yo...@apache.org>
>Reply-To: solr-user@lucene.apache.org
>To: solr-user@lucene.apache.org
>Subject: Re: how to interpret the Solr score ?
>Date: Wed, 28 Feb 2007 12:31:34 -0500
>
>On 2/28/07, Pierre-Yves LANDRON <pl...@hotmail.com> wrote:
>>A silly question, but I think the subject isn't covered on the Solr wiki:
>>I'd like to use the score returned by Solr to give the user an estimate of
>>how well each response matches the request (OK, that's what the score is
>>for...). But I cannot figure out the meaning of this score: is it a
>>percentage of the response's relevancy? Is it absolute, or relative to the
>>other responses? Is it the raw Lucene score (perhaps I'm wrong, but it
>>doesn't seem to be in my case)? Is there a way to obtain a meaningful
>>score (i.e., human-readable)?
>
>It's the raw lucene score.  Some lucene functions normalize the score
>by dividing all scores by the top score in matching documents, if that
>top score is greater than 1.  Solr also returns maxScore so you can do
>the same if you desire.
>
>-Yonik


Re: how to interpret the Solr score ?

Posted by Yonik Seeley <yo...@apache.org>.

On 2/28/07, Pierre-Yves LANDRON <pl...@hotmail.com> wrote:
> A silly question, but I think the subject isn't covered on the Solr wiki:
> I'd like to use the score returned by Solr to give the user an estimate of
> how well each response matches the request (OK, that's what the score is
> for...). But I cannot figure out the meaning of this score: is it a
> percentage of the response's relevancy? Is it absolute, or relative to the
> other responses? Is it the raw Lucene score (perhaps I'm wrong, but it
> doesn't seem to be in my case)? Is there a way to obtain a meaningful
> score (i.e., human-readable)?

It's the raw lucene score.  Some lucene functions normalize the score
by dividing all scores by the top score in matching documents, if that
top score is greater than 1.  Solr also returns maxScore so you can do
the same if you desire.

-Yonik
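
The normalization described above can be sketched in a few lines of Java.
This is only an illustration of the rule as stated in the reply (divide by
the top score when that score is greater than 1); the class ScoreNormalizer
is a made-up name, not part of Solr or Lucene, and the second raw score in
main is invented for the example.

```java
import java.util.Arrays;

final class ScoreNormalizer {

    /**
     * Divides every score by maxScore when maxScore is greater than 1;
     * otherwise returns an unchanged copy of the scores.
     */
    public static double[] normalize(double[] scores, double maxScore) {
        double[] out = Arrays.copyOf(scores, scores.length);
        if (maxScore > 1.0) {
            for (int i = 0; i < out.length; i++) {
                out[i] /= maxScore;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // maxScore as reported on the <result> element of the response
        double maxScore = 2.4851787;
        // first value is the top document's score; the second is invented
        double[] raw = {2.4851787, 1.2425894};
        System.out.println(Arrays.toString(normalize(raw, maxScore)));
    }
}
```

Note that the result is still relative to the best match for this one
query, so it is not an absolute percentage of relevancy.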

how to interpret the Solr score ?

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

Hello,

A silly question, but I think the subject isn't covered on the Solr wiki:
I'd like to use the score returned by Solr to give the user an estimate of
how well each response matches the request (OK, that's what the score is
for...). But I cannot figure out the meaning of this score: is it a
percentage of the response's relevancy? Is it absolute, or relative to the
other responses? Is it the raw Lucene score (perhaps I'm wrong, but it
doesn't seem to be in my case)? Is there a way to obtain a meaningful
score (i.e., human-readable)?

Thanks for reading !
Pierre-Yves Landron


Re: index browsing with solr

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

>>http://localhost:8983/solr/select?qt=indexinfo&wt=ruby&indent=on
>>
>>Though IndexInfoRequestHandler is practically obsolete with Ryan's
>>"Luke" contribution... isn't that so, Ryan?
>>
>
>functionality-wise, yes.
>
>I pointed to the IndexInfoRequestHandler because it is the simplest
>SolrRequestHandler that gets into Lucene internals.  Adding term
>browsing to it is really straightforward - that's how I stumbled into
>writing the "luke" thing!

I'm not sure I understand...
Ryan, how can I get access to your contribution? (Is it a contribution to
Luke or to the Solr REST interface?)
Is it implemented yet, and if so, how can I use it?
Thanks.

Pierre-Yves Landron


Re: index browsing with solr

Posted by Ryan McKinley <ry...@gmail.com>.

On 2/24/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Feb 24, 2007, at 6:26 AM, Erik Hatcher wrote:
>
> >
> > On Feb 24, 2007, at 3:36 AM, Pierre-Yves LANDRON wrote:
> >
> >>> it will be easy to add.  take a look at a simple SolrRequestHandler:
> >>>
> >>> http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/
> >>> apache/solr/handler/IndexInfoRequestHandler.java
> >>>
> >>> this gets the IndexReader and writes out some stuff.
> >>
> >> thanks ! i will look at it. This handler is not accessible from
> >> the "rest" interface ? if so, why does it exists : is it used for
> >> the implementation of other rest instruction ?
> >
> > Sure, this handler is available (as long as you have it configured
> > in your solrconfig.xml; it is configured in the example one):
> >
> >       http://localhost:8983/solr/select?qt=indexinfo&wt=ruby&indent=on
>
> Though IndexInfoRequestHandler is practically obsolete with Ryan's
> "Luke" contribution... isn't that so, Ryan?
>

functionality-wise, yes.

I pointed to the IndexInfoRequestHandler because it is the simplest
SolrRequestHandler that gets into Lucene internals.  Adding term
browsing to it is really straightforward - that's how I stumbled into
writing the "luke" thing!


> I can't wait for Flare to start getting its hands on the Luke
> handlers!  wow.
>

i can't wait to see it

Re: index browsing with solr

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 24, 2007, at 6:26 AM, Erik Hatcher wrote:

>
> On Feb 24, 2007, at 3:36 AM, Pierre-Yves LANDRON wrote:
>
>>> it will be easy to add.  take a look at a simple SolrRequestHandler:
>>>
>>> http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/ 
>>> apache/solr/handler/IndexInfoRequestHandler.java
>>>
>>> this gets the IndexReader and writes out some stuff.
>>
>> thanks ! i will look at it. This handler is not accessible from  
>> the "rest" interface ? if so, why does it exists : is it used for  
>> the implementation of other rest instruction ?
>
> Sure, this handler is available (as long as you have it configured  
> in your solrconfig.xml; it is configured in the example one):
>
> 	http://localhost:8983/solr/select?qt=indexinfo&wt=ruby&indent=on

Though IndexInfoRequestHandler is practically obsolete with Ryan's  
"Luke" contribution... isn't that so, Ryan?

I can't wait for Flare to start getting its hands on the Luke  
handlers!  wow.

	Erik

Re: index browsing with solr

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 24, 2007, at 3:36 AM, Pierre-Yves LANDRON wrote:

>> it will be easy to add.  take a look at a simple SolrRequestHandler:
>>
>> http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/ 
>> apache/solr/handler/IndexInfoRequestHandler.java
>>
>> this gets the IndexReader and writes out some stuff.
>
> thanks ! i will look at it. This handler is not accessible from the  
> "rest" interface ? if so, why does it exists : is it used for the  
> implementation of other rest instruction ?

Sure, this handler is available (as long as you have it configured in  
your solrconfig.xml; it is configured in the example one):

	http://localhost:8983/solr/select?qt=indexinfo&wt=ruby&indent=on

Erik

Re: index browsing with solr

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

>it will be easy to add.  take a look at a simple SolrRequestHandler:
>
>http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/IndexInfoRequestHandler.java
>
>this gets the IndexReader and writes out some stuff.

Thanks! I will look at it. Is this handler not accessible from the "REST"
interface? If so, why does it exist: is it used to implement other REST
instructions?


Re: index browsing with solr

Posted by Ryan McKinley <ry...@gmail.com>.

>
> Does an implementation of this method exists in solr ?
>

I don't think so.


> If not, is it difficult to develop new instructions for Solr? Where should I
> start?
>

it will be easy to add.  take a look at a simple SolrRequestHandler:

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/IndexInfoRequestHandler.java

this gets the IndexReader and writes out some stuff.

index browsing with solr

Posted by Pierre-Yves LANDRON <pl...@hotmail.com>.

Hello everybody,

I'm new to this mailing list, so excuse me if my question has already been
debated here (I've searched the web and found nothing about it).

I've used Solr for two weeks now, and so far it's a really neat solution.
I've replaced my previous index searcher app with Solr in my current project,
but cannot find a way to replace the browseIndex(field, startterm,
numberoftermsreturned) function I've used so far. It's a very useful method
and I'm sure it can be accomplished with Solr, but I can't figure out how.
Lucene does it through the terms method of the IndexReader class, I think:
abstract TermEnum terms(Term t): Returns an enumeration of all terms after
a given term.

Does an implementation of this method exist in Solr?

If not, is it difficult to develop new instructions for Solr? Where should I
start?

Thanks !
Pierre-Yves Landron
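
For readers unfamiliar with the kind of term browsing asked about here, the
idea can be sketched without Lucene at all. The example below uses a sorted
in-memory set as a stand-in for one field's term dictionary; TermBrowser,
addTerm and browse are hypothetical names, and treating the start term as
inclusive is an assumption. It is not the Lucene TermEnum API and not a
Solr request handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class TermBrowser {

    // Stand-in for the sorted term dictionary of a single field.
    private final NavigableSet<String> terms = new TreeSet<>();

    public void addTerm(String term) {
        terms.add(term);
    }

    /** Returns up to count terms in sorted order, starting at the first term >= startTerm. */
    public List<String> browse(String startTerm, int count) {
        List<String> page = new ArrayList<>();
        for (String t : terms.tailSet(startTerm, true)) {
            if (page.size() >= count) {
                break;
            }
            page.add(t);
        }
        return page;
    }

    public static void main(String[] args) {
        TermBrowser browser = new TermBrowser();
        for (String t : new String[] {"ipod", "adapter", "cable", "belkin", "usb"}) {
            browser.addTerm(t);
        }
        // Browse three terms starting from "b"
        System.out.println(browser.browse("b", 3));
    }
}
```

A real implementation would walk the index's own term enumeration instead
of a TreeSet, but the paging logic (seek to a start term, then read a
fixed number of terms) is the same shape.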


Re: Tagging

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 22, 2007, at 11:30 PM, Gmail Account wrote:
> I use solr for searching and facets and love it.. The performance  
> is awesome.
>
> However I am about to add tagging to my application and I'm having  
> a hard time deciding if I should just database my tags for now  
> until a better solr solution is worked out... Does anyone know what  
> technology some of the larger sites use for tagging? Database  
> (MySQL, SQL Server) with denormalized cache tables everywhere,  
> something similar to solr/lucene, or something else?

Simpy, Otis Gospodnetic's creation, uses Lucene.  I suspect most of  
the others use a relational database and lots and lots of caching...  
especially since the others use tags but not full-text search.  Simpy  
is special!

	Erik

Re: Tagging

Posted by Gmail Account <ma...@gmail.com>.

I use solr for searching and facets and love it.. The performance is 
awesome.

However I am about to add tagging to my application and I'm having a hard 
time deciding if I should just database my tags for now until a better solr 
solution is worked out... Does anyone know what technology some of the 
larger sites use for tagging? Database (MySQL, SQL Server) with denormalized 
cache tables everywhere, something similar to solr/lucene, or something 
else?

Thanks,
Mike

----- Original Message ----- 
From: "Mekin Maheshwari" <me...@gmail.com>
To: <so...@lucene.apache.org>
Sent: Thursday, February 22, 2007 7:39 AM
Subject: Re: Tagging


>> For a more general solution, I'm thinking a separate lucene index
>> might be ideal.
>>
>> -Yonik
>>
>
> I dont know if this will work for others, below is what we do. Also, if
> there are things I can improve, do let me know.
>
> All tag inserts go to a small DB table.
> And I reindex the docs that these tags belong to in a backup index that I
> keep, and swap the new Index in from time to time. I dont do this on the
> production index, as optimizing the index takes a long time.
>
> A hack that I need to do is when looking up for tags, I also look in this
> small table. For me exact matches suffice, hence a db table works, may not
> work for others. I understand that searches on this tag dont work, till it
> gets into the index.
>
> The solution can obviously be made much smarter.
>
> Basically use a queue, from which the indexUpdater can pick up documents to
> reindex & update them when search volumes are low.
>
> I am sure a small lucene index can be used as the queue, and while searching
> both the indices are looked at.
>
> Btw, we are still using lucene for our search, hope to move to solr soon.
>
> -mekin
>

Re: Tagging

Posted by Mekin Maheshwari <me...@gmail.com>.

> For a more general solution, I'm thinking a separate lucene index
> might be ideal.
>
> -Yonik
>

I don't know if this will work for others; below is what we do. Also, if
there are things I can improve, do let me know.

All tag inserts go to a small DB table.
I then reindex the docs that these tags belong to in a backup index that I
keep, and swap the new index in from time to time. I don't do this on the
production index, as optimizing the index takes a long time.

A hack that I need is that when looking up tags, I also look in this
small table. For me exact matches suffice, hence a DB table works; it may not
work for others. I understand that searches on a new tag don't work until it
gets into the index.

The solution can obviously be made much smarter.

Basically use a queue, from which the indexUpdater can pick up documents to
reindex & update them when search volumes are low.

I am sure a small lucene index can be used as the queue, and while searching
both the indices are looked at.
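
A minimal sketch of the scheme described above, assuming exact-match tag
lookups only. The class and method names (TagStore, add_tag, docs_for_tag,
flush) are made up for illustration; the "index" and the "DB table" are plain
in-memory dicts standing in for the real Lucene index and database.

```python
from collections import defaultdict, deque


class TagStore:
    """Toy model: an index that is rebuilt offline plus a small pending table."""

    def __init__(self):
        self.indexed = defaultdict(set)  # tag -> doc ids already in the search index
        self.pending = defaultdict(set)  # tag -> doc ids waiting to be reindexed
        self.queue = deque()             # doc ids the index updater still has to process

    def add_tag(self, doc_id, tag):
        # New tags only touch the small table and the reindex queue.
        self.pending[tag].add(doc_id)
        if doc_id not in self.queue:
            self.queue.append(doc_id)

    def docs_for_tag(self, tag):
        # Exact-match lookup consults both places, so fresh tags are visible at once.
        return self.indexed.get(tag, set()) | self.pending.get(tag, set())

    def flush(self):
        # Stand-in for the off-peak reindex and index swap.
        for tag, doc_ids in self.pending.items():
            self.indexed[tag] |= doc_ids
        self.pending.clear()
        self.queue.clear()
```

Calling flush() when search volume is low plays the role of the index swap;
until then, docs_for_tag() still returns newly tagged documents because it
also reads the pending table.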

Btw, we are still using lucene for our search, hope to move to solr soon.

-mekin

Re: Tagging

Posted by Mike Klaas <mi...@gmail.com>.

On 2/13/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:

> Sorry if I'm sending things mangled somehow - and if anyone has
> suggestions on correcting I'm all ears.

Unfortunately, no.

> There is some precedent for putting angle brackets around URLs in
> e-mails:  this mechanism was documented in Tim Berners-Lee's original
> URL format specification, RFC1738:

Absolutely.  The problem seems to be that Mail.app does not recognize
this specification, not that you are following it.

-Mike

Re: Tagging

Posted by Bertrand Delacretaz <bd...@apache.org>.

On 2/14/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:

> ...Sorry if I'm sending things mangled somehow - and if anyone has
> suggestions on correcting I'm all ears....

For long links I tend to use http://tinyurl.com/, but it's a bit
painful to do that for all links.

-Bertrand

Re: Tagging

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 13, 2007, at 9:23 PM, Yonik Seeley wrote:

> On 2/13/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>>
>> On Feb 13, 2007, at 9:01 PM, Yonik Seeley wrote:
>> >> And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff
>> >> with Lucene and Solr already: <http://peel.library.ualberta.ca/
>> >> search/?
>> >>  
>> search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstat
>> >> us=
>> >> on>
>> >
>> > FYI, your mailer is always breaking your links... I always have to
>> > cut-n-paste them back together again.
>>
>> The links are completely intact when viewing my own messages (and
>> others with long links that are surrounded by <brackets>) in that
>> same mailer (Mail.app on Mac OS X).  *shrugs*
>
> Nabble thinks they're broken too:
> http://www.nabble.com/Re%3A-Tagging-p8957261.html
> vs
> http://www.nabble.com/Re%3A-question-about-synonyms-p8954457.html

(sending this message as rich text instead of plain text - there are  
no wrapping options in Mail.app that I've found).

Sorry if I'm sending things mangled somehow - and if anyone has  
suggestions on correcting I'm all ears.

There is some precedent for putting angle brackets around URLs in
e-mails:  this mechanism was documented in Tim Berners-Lee's original
URL format specification, RFC1738:

APPENDIX: Recommendations for URLs in Context

    URIs, including URLs, are intended to be transmitted through
    protocols which provide a context for their interpretation.

    In some cases, it will be necessary to distinguish URLs from other
    possible data structures in a syntactic structure. In this case, is
    recommended that URLs be preceeded with a prefix consisting of the
    characters "URL:". For example, this prefix may be used to
    distinguish URLs from other kinds of URIs.

    In addition, there are many occasions when URLs are included in other
    kinds of text; examples include electronic mail, USENET news
    messages, or printed on paper. In such cases, it is convenient to
    have a separate syntactic wrapper that delimits the URL and separates
    it from the rest of the text, and in particular from punctuation
    marks that might be mistaken for part of the URL. For this purpose,
    is recommended that angle brackets ("<" and ">"), along with the
    prefix "URL:", be used to delimit the boundaries of the URL.  This
    wrapper does not form part of the URL and should not be used in
    contexts in which delimiters are already specified.

    In the case where a fragment/anchor identifier is associated with a
    URL (following a "#"), the identifier would be placed within the
    brackets as well.

    In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
    need to be added to break long URLs across lines.  The whitespace
    should be ignored when extracting the URL.

    No whitespace should be introduced after a hyphen ("-") character.
    Because some typesetters and printers may (erroneously) introduce a
    hyphen at the end of line when breaking a line, the interpreter of a
    URL containing a line break immediately after a hyphen should ignore
    all unencoded whitespace around the line break, and should be aware
    that the hyphen may or may not actually be part of the URL.
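
As a rough illustration of the appendix's advice, a receiving program could
pull URLs out of <...> or <URL:...> wrappers and drop the whitespace that was
added to break them across lines. This is only a sketch: the function name is
invented, it does not strip "> " quote prefixes, and it makes no attempt at
the hyphen-at-line-end ambiguity described in the last paragraph.

```python
import re

# Matches <...> or <URL:...> wrappers whose content starts with a URL scheme.
_WRAPPED = re.compile(r"<(?:URL:)?\s*([A-Za-z][A-Za-z0-9+.-]*:[^<>]*)>")


def extract_wrapped_urls(text):
    """Return URLs found inside angle-bracket wrappers, whitespace removed."""
    return ["".join(match.split()) for match in _WRAPPED.findall(text)]
```

Applied to the wrapped link earlier in this thread, it yields the single URL
with "digstatus=on" rejoined, while plain words in brackets such as <brackets>
are ignored because they contain no scheme.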

-----

Re: Tagging

Posted by Yonik Seeley <yo...@apache.org>.

On 2/13/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Feb 13, 2007, at 9:01 PM, Yonik Seeley wrote:
> >> And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff
> >> with Lucene and Solr already: <http://peel.library.ualberta.ca/
> >> search/?
> >> search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstat
> >> us=
> >> on>
> >
> > FYI, your mailer is always breaking your links... I always have to
> > cut-n-paste them back together again.
>
> The links are completely intact when viewing my own messages (and
> others with long links that are surrounded by <brackets>) in that
> same mailer (Mail.app on Mac OS X).  *shrugs*

Nabble thinks they're broken too:
http://www.nabble.com/Re%3A-Tagging-p8957261.html
vs
http://www.nabble.com/Re%3A-question-about-synonyms-p8954457.html

-Yonik

Re: Tagging

Posted by Yonik Seeley <yo...@apache.org>.

On 2/13/07, Yonik Seeley <yo...@apache.org> wrote:
> I think it's the spaces at the ends of your lines that mess up most
> other clients trying to put the URL back together again.

Yep, seems it's "delsp=yes" being used by Mail.app, but not being
supported by other mail clients.

http://www.macintouch.com/mail.app15.html
Jun. 3, 2005

    Long URLs

    David Duff
    Other posters correctly point out problems with sending url's in
mail...when sending, Mail.app uses a "Content-Type: text/plain;" with
the option "format=flowed", which seems to be fairly standard. It also
uses the option "delsp=yes". the semantics of the delsp option are
that if delsp=yes, then the space at the end of the line should be
removed when the lines are joined together into paragraphs. When doing
"normal" line wrapping between words, where the space should be
present, Mail.app ends the line with two spaces. When wrapping a line
with a URL, where the space should not be present, Mail.app uses a
single space. Thus, it should be possible, in theory, to correctly
reconstruct the URL.
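
A simplified sketch of that reconstruction, assuming the input is a list of
body lines from a format=flowed message. The function name is made up, and
quoting ("> " prefixes) and space-stuffing from RFC 3676 are deliberately left
out.

```python
def unflow(lines, delsp=False):
    """Join format=flowed lines into paragraphs (simplified RFC 3676 handling)."""
    paragraphs, current = [], ""
    for line in lines:
        # A trailing space marks a soft break; "-- " is the signature separator.
        flowed = line.endswith(" ") and line != "-- "
        if flowed:
            # With DelSp=yes the final space is only a marker and is dropped.
            current += line[:-1] if delsp else line
        else:
            paragraphs.append(current + line)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

With delsp=True the lines "see  ", "<http://example.org/ ", "long/path>" join
into "see <http://example.org/long/path>". A client that ignores the option
behaves like delsp=False and leaves a space inside the URL.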

    Unfortunately, this doesn't work. At least not when sending mail
among the most popular two mail clients on the Mac platform (namely
Mail.app and Microsoft entourage). the problem may be that the delsp
option is a fairly new (added between RFC2646 and RFC3676) and not yet
widely adopted.

    There is also widespread "internet wisdom" that suggests that
url's can be protected by surrounding them with angle-brackets or with
the strings "<URL:" and ">". I am not completely clear on whether this
is supposed to prevent the sending program from breaking the URL or to
assist the receiving program in reconstructing it. I have confirmed
empirically however, that Mail.app WILL break url's that are enclosed
in angle brackets. I have further confirmed that (at least) the
entourage mail program is unable to correctly reconstruct a URL that
has been split by Mail.app, with or without angle brackets. As of
Mail.app version 2 (Tiger), mail can send messages with "Content-Type:
text/html", which it (confusingly) refers to as "rich text" in the
format menu. Using this type of encoding, it is fairly easy to embed
url's reliably in a message. You simply select some text in the
message, right-click on it and select "edit link..." and paste or type
the URL. This will embed the link and sidestep any problems related to
line breaks and flowed text.

    Parenthetically, I discovered a bug in this feature that users may
want to be aware of: Mail.app is overly zealous in trying to encode
nonstandard characters. To demonstrate the bug, in Safari, go to any
URL that has a space in it. In the address bar, safari will display
the URL (correctly) with the space encoded as "%20". if you copy the
URL out of Safari and paste it as a link in an email message using
"edit link...", you will see that mail replaces the % symbol with the
string "%25" (you have to examine the raw source of the sent message
to discover this). thus, the link will not work. It is fine that
Mail.app is trying to catch "non-standard" characters and encode them
for you, but the app should be smart enough to not try to replace
percent characters that are already part of an extended encoding
sequence (such as "%20" in this case). this bug has been reported to
Apple (bug id 4126109).
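
For illustration, one way to avoid that kind of double encoding is to escape
only the percent signs that do not already start a %XX sequence. This is a
sketch with an invented function name, not a description of what Mail.app
does; it cannot tell a deliberate literal "%20" from an existing escape.

```python
import re
from urllib.parse import quote

# A '%' that is NOT followed by two hex digits is a stray percent sign.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def encode_once(url):
    """Percent-encode unsafe characters but leave existing %XX escapes alone."""
    # Escape only the '%' signs that do not start a valid %XX sequence...
    url = _STRAY_PERCENT.sub("%25", url)
    # ...then encode the rest, treating '%' and URL delimiters as safe.
    return quote(url, safe="%:/?#[]@!$&'()*+,;=~-._")
```

By contrast, quote("http://example.org/my%20file", safe=":/") turns the
existing "%20" into "%2520", which is the behaviour described above.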

Re: Tagging

Posted by Yonik Seeley <yo...@apache.org>.

On 2/13/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Feb 13, 2007, at 9:01 PM, Yonik Seeley wrote:
> >> And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff
> >> with Lucene and Solr already: <http://peel.library.ualberta.ca/
> >> search/?
> >> search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstat
> >> us=
> >> on>
> >
> > FYI, your mailer is always breaking your links... I always have to
> > cut-n-paste them back together again.
>
> The links are completely intact when viewing my own messages (and
> others with long links that are surrounded by <brackets>) in that
> same mailer (Mail.app on Mac OS X).  *shrugs*

I think it's the spaces at the ends of your lines that mess up most
other clients trying to put the URL back together again.  I
cut'n'pasted your message to myself, and gmail put it all back
together, except the last "on>", presumably because the previous line
ended with a "=".

I guess when your mail client wraps your lines, it breaks on a space,
but instead of replacing the space with a newline, it adds the newline
after the space.

-Yonik

Re: Tagging

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Feb 13, 2007, at 9:01 PM, Yonik Seeley wrote:
>> And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff
>> with Lucene and Solr already: <http://peel.library.ualberta.ca/
>> search/?
>> search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstat 
>> us=
>> on>
>
> FYI, your mailer is always breaking your links... I always have to
> cut-n-paste them back together again.

The links are completely intact when viewing my own messages (and  
others with long links that are surrounded by <brackets>) in that  
same mailer (Mail.app on Mac OS X).  *shrugs*

	Erik

Re: Tagging

Posted by Yonik Seeley <yo...@apache.org>.

On 2/13/07, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> There is also the possibility of keeping tags with the original
> documents and having them individually updated without having to
> resend the original full text as well: <https://issues.apache.org/
> jira/browse/SOLR-139>

But it does require having all original fields stored, and does
re-analyze and re-index.
Then there's the caching issue... you've changed the index and
internal docids, and all the filters needed for efficient faceting
need to be re-generated (document ones too, not just the tag related
ones, since they are on the same documents).

I do agree that tags-on-docs is desirable and simpler, if the
performance is acceptable from both a re-indexing perspective, and a
time-to-viewable perspective.  The latter will probably be a bigger
problem than the former unless you have a really popular site.

> And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff
> with Lucene and Solr already: <http://peel.library.ualberta.ca/
> search/?
> search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstatus=
> on>

FYI, your mailer is always breaking your links... I always have to
cut-n-paste them back together again.

-Yonik

Re: Tagging

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

There is also the possibility of keeping tags with the original  
documents and having them individually updated without having to  
resend the original full text as well: <https://issues.apache.org/ 
jira/browse/SOLR-139>

And yeah, Peter is a solr4lib kinda guy, doing some way cool stuff  
with Lucene and Solr already: <http://peel.library.ualberta.ca/ 
search/? 
search=raw&pageNumber=1&index=peelbib&field=body&rawQuery=dog&digstatus= 
on>

With separate indexes we're back to the relational model that adds a  
lot of complexity.  For example, I cannot use MoreLikeThis with tags  
to allow commonly tagged objects to be considered similar.  I'm sure  
there are other ways to implement that sort of thing, though I've not  
thought it through.

	Erik

On Feb 13, 2007, at 6:17 PM, Yonik Seeley wrote:

> On 2/13/07, Binkley, Peter <Pe...@ualberta.ca> wrote:
>> I still wonder if there's a good way of storing the tags outside the
>> Lucene index and using them via facets whose bitsets are manipulated
>> directly rather than being populated from the index. In my project,
>> reindexing a documents whenever a user adds a tag is very very bad,
>> since we're indexing potentially hundreds of pages of full text in the
>> body field of the document. A solution that gets the tag into the system
>> immediately without forcing a reindexing of the document is essential.
>
> Interesting... what are you indexing that is that large, the book contents?
> You could build a custom request handler and store tag info outside
> the index.  You could also store it inside the index in separate
> documents as Erik does with Collex.
>
> For a more general solution, I'm thinking a separate lucene index
> might be ideal.
>
> -Yonik

Re: Tagging

Posted by Yonik Seeley <yo...@apache.org>.

On 2/13/07, Binkley, Peter <Pe...@ualberta.ca> wrote:
> I still wonder if there's a good way of storing the tags outside the
> Lucene index and using them via facets whose bitsets are manipulated
> directly rather than being populated from the index. In my project,
> reindexing a documents whenever a user adds a tag is very very bad,
> since we're indexing potentially hundreds of pages of full text in the
> body field of the document. A solution that gets the tag into the system
> immediately without forcing a reindexing of the document is essential.

Interesting... what are you indexing that is that large, the book contents?
You could build a custom request handler and store tag info outside
the index.  You could also store it inside the index in separate
documents as Erik does with Collex.

For a more general solution, I'm thinking a separate lucene index
might be ideal.

 -Yonik

RE: Tagging

Posted by "Binkley, Peter" <Pe...@ualberta.ca>.

I still wonder if there's a good way of storing the tags outside the
Lucene index and using them via facets whose bitsets are manipulated
directly rather than being populated from the index. In my project,
reindexing a document whenever a user adds a tag is very very bad,
since we're indexing potentially hundreds of pages of full text in the
body field of the document. A solution that gets the tag into the system
immediately without forcing a reindexing of the document is essential.
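
To make the idea concrete, here is a toy sketch of tag facets held outside the
index as bitsets. Everything in it is hypothetical (the names TagBitsets and
to_bits, and the use of Python integers as bitsets); it assumes document ids
are stable small integers, which would need a separate mapping if the ids can
change when the index is rebuilt.

```python
class TagBitsets:
    """Tags kept outside the index as one bitset (a Python int) per tag."""

    def __init__(self):
        self.bits = {}  # tag -> int whose bit i is set when doc id i carries the tag

    def add(self, tag, doc_id):
        # Adding a tag flips one bit; the document itself is never reindexed.
        self.bits[tag] = self.bits.get(tag, 0) | (1 << doc_id)

    def facet_counts(self, result_bits):
        # Intersect each tag's bitset with the query's result set and count the overlap.
        counts = {tag: bin(b & result_bits).count("1") for tag, b in self.bits.items()}
        return {tag: n for tag, n in counts.items() if n}


def to_bits(doc_ids):
    """Build a bitset from an iterable of doc ids (e.g. a query's matches)."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits
```

A tag added with add() shows up in facet_counts() on the very next query,
which is the "immediately" requirement, at the cost of keeping the bitsets in
sync with the index by some other means.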

Peter

-----Original Message-----
From: Ryan McKinley [mailto:ryantxu@gmail.com] 
Sent: Monday, February 12, 2007 7:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Tagging

There is no good solution yet.  There has been discussion on possible
approaches:

http://www.nabble.com/convert-custom-facets-to-Solr-facets...-tf3163183.html#a8790179

http://wiki.apache.org/solr/UserTagDesign

On 2/12/07, Gmail Account <ma...@gmail.com> wrote:
> I know that I've seen this topic before.. Is there a guidline on the 
> best way to create tagging in solr?  For example, keeping track of 
> what user tagged what item in solr. And facetting based on tags?
>
> Thanks,
> Mike
>
>

Re: Tagging

Posted by Ryan McKinley <ry...@gmail.com>.

There is no good solution yet.  There has been discussion on possible approaches:

http://www.nabble.com/convert-custom-facets-to-Solr-facets...-tf3163183.html#a8790179

http://wiki.apache.org/solr/UserTagDesign



On 2/12/07, Gmail Account <ma...@gmail.com> wrote:
> I know that I've seen this topic before.. Is there a guidline on the best
> way to create tagging in solr?  For example, keeping track of what user
> tagged what item in solr. And facetting based on tags?
>
> Thanks,
> Mike
>
>

Tagging

Posted by Gmail Account <ma...@gmail.com>.

I know that I've seen this topic before.. Is there a guideline on the best
way to create tagging in solr?  For example, keeping track of what user
tagged what item in solr. And faceting based on tags?

Thanks,
Mike

RE: Custom Tokenizer

Posted by Chris Hostetter <ho...@fucit.org>.

: After reading the docs, I put it in example/solr/lib, but didn't remove
: it from example/ext. Whoops.
:
: Long story short, putting the custom.jar into example/solr/lib worked
: with java 5 and 6, so long as it wasn't also in example/ext.

That's excellent news Devon, thanks for the followup.

(having a class defined more than once in an application's classpath is
typically problematic ... i believe jetty loads /ext classes into the
system class loader, so it's no surprise that would have caused you
problems)
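
A quick way to spot this kind of duplication is to scan the jar directories
for class files that appear in more than one jar. The sketch below is an
illustration only (the function name is invented, and it is not part of Solr
or Jetty); it relies on the fact that a jar is a zip archive.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def duplicate_classes(jar_dirs):
    """Map each .class entry found in more than one jar to the jars holding it."""
    seen = defaultdict(list)
    for directory in jar_dirs:
        for jar in sorted(Path(directory).glob("*.jar")):
            with zipfile.ZipFile(jar) as archive:
                for name in archive.namelist():
                    if name.endswith(".class"):
                        seen[name].append(str(jar))
    return {name: jars for name, jars in seen.items() if len(jars) > 1}
```

For the layout in this thread it would be called as
duplicate_classes(["example/ext", "example/solr/lib"]); any class listed in
the result is a candidate for the problem described above.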



-Hoss

RE: Custom Tokenizer

Posted by "Smith,Devon" <sm...@oclc.org>.

While pursuing the check with java 5, I tried to recreate the original
problem.
It took me a while, and I've discovered that underlying it all was a
PEBKAC situation.

Before reading the docs, I tried putting my custom.jar into any and
every directory that had jars.
After reading the docs, I put it in example/solr/lib, but didn't remove
it from example/ext. Whoops.

Long story short, putting the custom.jar into example/solr/lib worked
with java 5 and 6, so long as it wasn't also in example/ext.

/dev

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Friday, February 09, 2007 3:34 AM
To: solr-user@lucene.apache.org
Subject: RE: Custom Tokenizer

Sorry, one other thing to verify... did you see an INFO message like
this logged at somepoint...

	Adding 'custom.jar' to Solr classloader

also be on the lookout for "Can't construct solr lib class loader"


: Date: Fri, 9 Feb 2007 00:30:33 -0800 (PST)
: From: Chris Hostetter <ho...@fucit.org>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Custom Tokenizer
:
:
: : Yes, this is with the Jetty that comes with Solr. Right now I'm just
: : familiarizing myself with everything.
:
: i ment to follow up on this earlier and it slipped through the cracks,
: just to clarify, what you attempted was:
:
: 1) wrote a new Tokenizer
: 2) wrote a new TokenizerFactory that subclassed BaseTokenizerFactory
: 3) compiled these classes, and put them in some "custom.jar"
: 4) put custom.jar in ./example/solr/lib/
: 5) started jetty from ./example using "java -jar start.jar"
:
: ...does that sound right?
:
: could you by anychance try this again using java 1.5 -- i'm wondering if
: something subtle changed in the 1.6 classloaders
:
: : smithde ~>java -version
: : java version "1.6.0"
: : Java(TM) SE Runtime Environment (build 1.6.0-b105)
: : Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
: :
: : /dev
: :
: :
: : -----Original Message-----
: : From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: : Sent: Monday, February 05, 2007 12:52 AM
: : To: solr-user@lucene.apache.org
: : Subject: Re: Custom Tokenizer
: :
: :
: : : to develop and build the factory and tokenizer. However, when I start
: : : solr up, I get a stack trace, that says "java.lang.NoClassDefFoundError:
: : : org/apache/solr/analysis/BaseTokenizerFactory" That's really confusing.
: : :
: : : Any thoughts on what I'm missing/doing wrong?
: :
: : based on your stack trace, this is with Jetty right? ... this is very
: : strange, i definitely tested the whole "plugin lib" thing under Jetty
: : when i worked on it...
: :
: : http://issues.apache.org/jira/browse/SOLR-68
: :
: : ..can you verify which version of jetty you are using (ie: is it the
: : example install from Solr?) and what OS and JVM you are running?
: :
: :
: : -Hoss
: :
:
:
:
: -Hoss
:



-Hoss

RE: Custom Tokenizer

Posted by Chris Hostetter <ho...@fucit.org>.


Sorry, one other thing to verify... did you see an INFO message like this
logged at some point...

	Adding 'custom.jar' to Solr classloader

also be on the lookout for "Can't construct solr lib class loader"




: Date: Fri, 9 Feb 2007 00:30:33 -0800 (PST)
: From: Chris Hostetter <ho...@fucit.org>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Custom Tokenizer
:
:
: : Yes, this is with the Jetty that comes with Solr. Right now I'm just
: : familiarizing myself with everything.
:
: i ment to follow up on this earlier and it slipped through the cracks,
: just to clarify, what you attempted was:
:
: 1) wrote a new Tokenizer
: 2) wrote a new TokenizerFactory that subclassed BaseTokenizerFactory
: 3) compiled these classes, and put them in some "custom.jar"
: 4) put custom.jar in ./example/solr/lib/
: 5) started jetty from ./example using "java -jar start.jar"
:
: ...does that sound right?
:
: could you by anychance try this again using java 1.5 -- i'm wondering if
: something subtle changed in the 1.6 classloaders
:
: : smithde ~>java -version
: : java version "1.6.0"
: : Java(TM) SE Runtime Environment (build 1.6.0-b105)
: : Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
: :
: : /dev
: :
: :
: : -----Original Message-----
: : From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: : Sent: Monday, February 05, 2007 12:52 AM
: : To: solr-user@lucene.apache.org
: : Subject: Re: Custom Tokenizer
: :
: :
: : : to develop and build the factory and tokenizer. However, when I start
: : : solr up, I get a stack trace, that says "java.lang.NoClassDefFoundError:
: : : org/apache/solr/analysis/BaseTokenizerFactory" That's really confusing.
: : :
: : : Any thoughts on what I'm missing/doing wrong?
: :
: : based on your stack trace, this is with Jetty right? ... this is very
: : strange, i definitely tested the whole "plugin lib" thing under Jetty
: : when i worked on it...
: :
: : http://issues.apache.org/jira/browse/SOLR-68
: :
: : ..can you verify which version of jetty you are using (ie: is it the
: : example install from Solr?) and what OS and JVM you are running?
: :
: :
: : -Hoss
: :
:
:
:
: -Hoss
:



-Hoss

RE: Custom Tokenizer

Posted by Chris Hostetter <ho...@fucit.org>.

: Yes, this is with the Jetty that comes with Solr. Right now I'm just
: familiarizing myself with everything.

i meant to follow up on this earlier and it slipped through the cracks,
just to clarify, what you attempted was:

1) wrote a new Tokenizer
2) wrote a new TokenizerFactory that subclassed BaseTokenizerFactory
3) compiled these classes, and put them in some "custom.jar"
4) put custom.jar in ./example/solr/lib/
5) started jetty from ./example using "java -jar start.jar"

...does that sound right?

could you by any chance try this again using java 1.5 -- i'm wondering if
something subtle changed in the 1.6 classloaders

: smithde ~>java -version
: java version "1.6.0"
: Java(TM) SE Runtime Environment (build 1.6.0-b105)
: Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)
:
: /dev
:
:
: -----Original Message-----
: From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: Sent: Monday, February 05, 2007 12:52 AM
: To: solr-user@lucene.apache.org
: Subject: Re: Custom Tokenizer
:
:
: : to develop and build the factory and tokenizer. However, when I start
: : solr up, I get a stack trace, that says "java.lang.NoClassDefFoundError:
: : org/apache/solr/analysis/BaseTokenizerFactory" That's really confusing.
: :
: : Any thoughts on what I'm missing/doing wrong?
:
: based on your stack trace, this is with Jetty right? ... this is very
: strange, i definitely tested the whole "plugin lib" thing under Jetty
: when i worked on it...
:
: http://issues.apache.org/jira/browse/SOLR-68
:
: ..can you verify which version of jetty you are using (ie: is it the
: example install from Solr?) and what OS and JVM you are running?
:
:
: -Hoss
:



-Hoss

RE: Custom Tokenizer

Posted by "Smith,Devon" <sm...@oclc.org>.

Yes, this is with the Jetty that comes with Solr. Right now I'm just
familiarizing myself with everything.

smithde ~>uname -a     
Linux smithde 2.6.19-ARCH #1 SMP PREEMPT Thu Jan 11 20:08:17 CET 2007
i686 Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux

smithde ~>java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode, sharing)

/dev
 

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Monday, February 05, 2007 12:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Custom Tokenizer


: to develop and build the factory and tokenizer. However, when I start
: solr up, I get a stack trace, that says "java.lang.NoClassDefFoundError:
: org/apache/solr/analysis/BaseTokenizerFactory" That's really confusing.
:
: Any thoughts on what I'm missing/doing wrong?

based on your stack trace, this is with Jetty right? ... this is very
strange, i definitely tested the whole "plugin lib" thing under Jetty
when i worked on it...

http://issues.apache.org/jira/browse/SOLR-68

..can you verify which version of jetty you are using (ie: is it the
example install from Solr?) and what OS and JVM you are running?


-Hoss

Re: Custom Tokenizer

Posted by Chris Hostetter <ho...@fucit.org>.

: to develop and build the factory and tokenizer. However, when I start
: solr up, I get a stack trace, that says "java.lang.NoClassDefFoundError:
: org/apache/solr/analysis/BaseTokenizerFactory" That's really confusing.
:
: Any thoughts on what I'm missing/doing wrong?

based on your stack trace, this is with Jetty right? ... this is very
strange, i definitely tested the whole "plugin lib" thing under Jetty when
i worked on it...

http://issues.apache.org/jira/browse/SOLR-68

..can you verify which version of jetty you are using (ie: is it the
example install from Solr?) and what OS and JVM you are running?


-Hoss