Posted to solr-user@lucene.apache.org by Jorge Luis Betancourt Gonzalez <jl...@uci.cu> on 2012/12/15 02:34:49 UTC

Dedup component

Hi all:

I'm trying to build a query suggestion system using Solr (which is also used to index all the data in the app). I have a separate core dedicated only to this purpose (along with some others for images, etc.). The main app, written in Symfony2 + Solarium Bundle, stores the queries in this core. To prevent duplicate queries from being indexed, I use the dedup component:

<!--
  Delete similar duplicate documents at index time, using fuzzy text similarity techniques
-->
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">signature</str>
    <str name="fields">textsuggest,textng</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

This configuration prevents very similar queries from being stored, but what I am really trying to accomplish is to increment a count (popularity) field when the same query is sent to Solr again.

Any thoughts on this?

Greetings!

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci

Re: Dedup component

Posted by Upayavira <uv...@odoko.co.uk>.
No, not that I am aware of. You would have to do it in your application.
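
One way the application-side logic could look on Solr 3.6.1 is a plain read, increment, and re-post of the full document. The Python sketch below is only an illustration under assumptions: `fetch_doc` and `post_doc` are hypothetical callables standing in for whatever client the app uses (Solarium in this thread), and the field names `textsuggest` and `popularity` as well as the ID normalization are guesses, not something confirmed here.

```python
def record_query(query_text, fetch_doc, post_doc):
    """Count one occurrence of a user query, doing the increment in the app.

    fetch_doc(doc_id) -> dict or None   (hypothetical: look up a document by ID)
    post_doc(doc)     -> None           (hypothetical: index the full document)
    """
    # Assumption: lowercasing and collapsing whitespace is enough to decide
    # that two queries are "the same".
    doc_id = " ".join(query_text.lower().split())

    existing = fetch_doc(doc_id)
    if existing is None:
        # First time this query is seen: create the document with a count of 1.
        doc = {"id": doc_id, "textsuggest": query_text.strip(), "popularity": 1}
    else:
        # Seen before: copy the stored document and bump its counter.
        doc = dict(existing)
        doc["popularity"] = int(doc.get("popularity", 0)) + 1

    # Re-send the whole document; with "id" as the unique key this replaces
    # the previous version instead of adding a duplicate.
    post_doc(doc)
    return doc


if __name__ == "__main__":
    # In-memory stand-in for the suggestions core, for demonstration only.
    core = {}
    save = lambda d: core.__setitem__(d["id"], d)
    record_query("Solr  Dedup", core.get, save)
    print(record_query("solr dedup", core.get, save))
```

Note that this read-then-write sequence is not atomic: two requests recording the same query at the same moment could each read the old count and lose one increment, so some form of locking or batching in the app may be needed.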

Upayavira

On Sun, Dec 16, 2012, at 12:55 AM, Jorge Luis Betancourt Gonzalez wrote:
> Is there any similar approach that I could use in Solr 3.6.1, or should I add
> this logic to my application?

Re: Dedup component

Posted by Jorge Luis Betancourt Gonzalez <jl...@uci.cu>.
Is there any similar approach that I could use in Solr 3.6.1, or should I add this logic to my application?

----- Original message -----
From: "Upayavira" <uv...@odoko.co.uk>
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 12:37:11
Subject: Re: Dedup component

Nope, it is a Solr 4.0 thing. In order for it to work, you need to store
every field, as what it does behind the scenes is retrieve the stored
fields, rebuild the document, and then post the whole document back.

Upayavira


Re: Dedup component

Posted by Upayavira <uv...@odoko.co.uk>.
Nope, it is a Solr 4.0 thing. In order for it to work, you need to store
every field, as what it does behind the scenes is retrieve the stored
fields, rebuild the document, and then post the whole document back.
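
The retrieve, rebuild, and re-post cycle described above can be imitated in a few lines of Python. This is a toy model over an in-memory dict, not Solr's actual implementation; `apply_atomic_update` and `stored_docs` are made-up names, and only the `inc`, `add`, and `set` modifiers from the `book1` update quoted elsewhere in this thread are handled.

```python
def apply_atomic_update(stored_docs, update):
    """Simulate an atomic update against a dict of *stored* documents."""
    doc_id = update["id"]

    # 1. Retrieve the stored fields of the existing document (if any).
    doc = dict(stored_docs.get(doc_id, {"id": doc_id}))

    # 2. Rebuild the document by applying each field modifier.
    for field, op in update.items():
        if field == "id":
            continue
        if "inc" in op:
            doc[field] = doc.get(field, 0) + op["inc"]
        elif "add" in op:
            doc[field] = list(doc.get(field, [])) + [op["add"]]
        elif "set" in op:
            if op["set"] is None:
                doc.pop(field, None)  # setting to null removes the field
            else:
                doc[field] = op["set"]

    # 3. Post the whole rebuilt document back, replacing the old one.
    stored_docs[doc_id] = doc
    return doc


if __name__ == "__main__":
    stored = {"book1": {"id": "book1", "copies_i": 2, "cat": ["scifi"]}}
    update = {
        "id": "book1",
        "copies_i": {"inc": 1},
        "cat": {"add": "fantasy"},
        "ISBN_s": {"set": "0-380-97365-0"},
        "remove_s": {"set": None},
    }
    print(apply_atomic_update(stored, update))
```

The model also shows why every field has to be stored: a field that was indexed but not stored would not be present in `stored_docs`, so it would be missing from the rebuilt document that replaces the original.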

Upayavira

On Sat, Dec 15, 2012, at 04:52 PM, Jorge Luis Betancourt Gonzalez wrote:
> Is this updatable fields feature available in Solr 3.6.1? That is the version
> I'm using right now.

Re: Dedup component

Posted by Jorge Luis Betancourt Gonzalez <jl...@uci.cu>.
Is this updatable fields feature available in Solr 3.6.1? That is the version I'm using right now.

----- Original message -----
From: "Upayavira" <uv...@odoko.co.uk>
To: solr-user@lucene.apache.org
Sent: Saturday, December 15, 2012 7:56:45
Subject: Re: Dedup component

Make the ID field out of the query text so you don't have to use the
dedup component, then use the updatable fields functionality in Solr 4.0.


Re: Dedup component

Posted by Upayavira <uv...@odoko.co.uk>.
Make the ID field out of the query text so you don't have to use the
dedup component, then use the updatable fields functionality in Solr
4.0:

$ curl http://localhost:8983/solr/update \
    -H 'Content-type:application/json' -d '
[
 {"id"        : "book1",
  "copies_i"  : { "inc" : 1},
  "cat"       : { "add" : "fantasy"},
  "ISBN_s"    : { "set" : "0-380-97365-0"},
  "remove_s"  : { "set" : null } }
]'

/* example stolen from Yonik's ApacheCon talk */
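
Applied to the query-suggestion case, the same idea might look like the following Python sketch. It only builds the JSON body for the update request and does not talk to Solr. The field name `popularity`, the use of `textsuggest` as the text field, and the normalization rule (lowercase, collapsed whitespace) are assumptions made for illustration; how Solr treats an `inc` against an ID that does not exist yet should be checked against the version in use.

```python
import json


def query_id(query_text):
    """Derive the document ID from the query text itself.

    Assumption: lowercasing and collapsing whitespace is an acceptable
    notion of "the same query" for this application.
    """
    return " ".join(query_text.lower().split())


def increment_doc(query_text, count_field="popularity"):
    """Build one atomic-update document that bumps the counter by one."""
    return {
        "id": query_id(query_text),
        "textsuggest": {"set": query_text.strip()},
        count_field: {"inc": 1},
    }


def increment_payload(query_text, count_field="popularity"):
    """Serialize the update as the JSON array expected in the request body."""
    return json.dumps([increment_doc(query_text, count_field)])


if __name__ == "__main__":
    print(increment_payload("  Solr   Dedup component "))
```

Because the ID is computed from the query text, two submissions of the same query address the same document, which is what makes the dedup processor unnecessary in this approach.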

Upayavira



Re: Dedup component

Posted by Fergus McDowall <fe...@gmail.com>.
unsubscribe

