Posted to solr-user@lucene.apache.org by Costi Muraru <co...@gmail.com> on 2014/04/28 19:20:07 UTC

Delete fields from document using a wildcard

Hi guys,

Would it be possible, using Atomic Updates in SOLR4, to remove all fields
matching a pattern? For instance, something like:

<add><doc>
  <field name="id">100</field>
  <field name="*_name_i" update="set" null="true"></field>
</doc></add>

Or something similar to remove certain fields in all documents.

Thanks,
Costi

Re: Delete fields from document using a wildcard

Posted by Costi Muraru <co...@gmail.com>.
I've opened an issue: https://issues.apache.org/jira/browse/SOLR-6034
Feedback in Jira is appreciated.


On Tue, Apr 29, 2014 at 8:34 PM, Shalin Shekhar Mangar <shalinmangar@gmail.com> wrote:

> I think this is useful as well. Can you open an issue?
>
>
> On Tue, Apr 29, 2014 at 7:53 PM, Shawn Heisey <so...@elyograg.org> wrote:
>
> > On 4/29/2014 5:25 AM, Costi Muraru wrote:
> > > The problem is, I don't know the exact names of the fields I want to
> > > remove. All I know is that they end in *_1600_i.
> > >
> > > When removing fields from a document, I want to avoid querying SOLR to
> > > see what fields are actually present for the specific document. In this
> > > way, hopefully I can speed up the process. Querying to see the schema.xml
> > > is not going to help me much, since the field is defined as a dynamic
> > > field *_i. This makes me think that expanding the documents client-side
> > > is not the best way to do it.
> >
> > Unfortunately at this time, you'll have to query the document and
> > go through the list of fields to determine which need to be deleted,
> > then build a request that deletes them.
> >
> > I don't know how hard it is to accomplish this in Solr.  Getting it
> > implemented might require a bunch of people standing up and saying "we
> > want this!"
> >
> > Thanks,
> > Shawn
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Delete fields from document using a wildcard

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
I think this is useful as well. Can you open an issue?


On Tue, Apr 29, 2014 at 7:53 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/29/2014 5:25 AM, Costi Muraru wrote:
> > The problem is, I don't know the exact names of the fields I want to
> > remove. All I know is that they end in *_1600_i.
> >
> > When removing fields from a document, I want to avoid querying SOLR to
> > see what fields are actually present for the specific document. In this
> > way, hopefully I can speed up the process. Querying to see the schema.xml
> > is not going to help me much, since the field is defined as a dynamic
> > field *_i. This makes me think that expanding the documents client-side
> > is not the best way to do it.
>
> Unfortunately at this time, you'll have to query the document and
> go through the list of fields to determine which need to be deleted,
> then build a request that deletes them.
>
> I don't know how hard it is to accomplish this in Solr.  Getting it
> implemented might require a bunch of people standing up and saying "we
> want this!"
>
> Thanks,
> Shawn
>



-- 
Regards,
Shalin Shekhar Mangar.

Re: Delete fields from document using a wildcard

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/29/2014 5:25 AM, Costi Muraru wrote:
> The problem is, I don't know the exact names of the fields I want to
> remove. All I know is that they end in *_1600_i.
> 
> When removing fields from a document, I want to avoid querying SOLR to see
> what fields are actually present for the specific document. In this way,
> hopefully I can speed up the process. Querying to see the schema.xml is not
> going to help me much, since the field is defined as a dynamic field *_i. This
> makes me think that expanding the documents client-side is not the best way
> to do it.

Unfortunately at this time, you'll have to query the document and
go through the list of fields to determine which need to be deleted,
then build a request that deletes them.
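
The query-then-update approach described above could be sketched roughly as follows. This is only an illustration in Python, not anything shipped with Solr: the function name build_removal_update and the sample document are hypothetical, and the document is assumed to have already been fetched by id. It relies on the JSON form of atomic updates, where setting a field to null removes it.

```python
from fnmatch import fnmatchcase
import json


def build_removal_update(doc, pattern, id_field="id"):
    """Return an atomic-update document that removes every field of `doc`
    whose name matches the shell-style `pattern` (e.g. "*_1600_i").

    `doc` is assumed to be a dict of the stored fields of one document.
    """
    update = {id_field: doc[id_field]}
    for name in doc:
        if name != id_field and fnmatchcase(name, pattern):
            # In JSON atomic-update syntax, {"set": null} removes the field.
            update[name] = {"set": None}
    return update


# Hypothetical stored document, as it might come back from a query by id.
stored = {"id": "100", "2_1600_i": 1, "5_1601_i": 5, "112_1602_i": 7}

# Body that could be posted to the update handler as JSON.
payload = json.dumps([build_removal_update(stored, "*_1600_i")])
print(payload)
```

Only the matching fields end up in the request, so the other fields of the document are left untouched by the atomic update.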

I don't know how hard it is to accomplish this in Solr.  Getting it
implemented might require a bunch of people standing up and saying "we
want this!"

Thanks,
Shawn

Re: Delete fields from document using a wildcard

Posted by Costi Muraru <co...@gmail.com>.
Thanks, Alex, for the input.

Let me provide a better example of what I'm trying to achieve. I have
documents like this:

<doc>
<field name="id">100</field>
<field name="2_1600_i">1</field>
<field name="5_1601_i">5</field>
<field name="112_1602_i">7</field>
</doc>

The schema looks the usual way:
<dynamicField name="*_i"  type="int"    indexed="true"  stored="true"/>
The dynamic field pattern I'm using is this: id_day_i.

Each day I want to add new fields for the current day and remove the fields
for the oldest one.

<add><doc>
  <field name="id">100</field>

  <!-- add fields for current day -->
  <field name="251_1603_i" update="set">25</field>

  <!-- remove fields for oldest day -->
  <field name="2_1600_i" update="set" null="true"></field>
</doc></add>

The problem is, I don't know the exact names of the fields I want to
remove. All I know is that they end in *_1600_i.

When removing fields from a document, I want to avoid querying SOLR to see
what fields are actually present for the specific document. In this way,
hopefully I can speed up the process. Querying to see the schema.xml is not
going to help me much, since the field is defined as a dynamic field *_i. This
makes me think that expanding the documents client-side is not the best way
to do it.

Regarding the second approach, expanding the documents server-side: I took
a look at the SOLR code and came upon the UpdateRequestProcessor.java class,
which has this interesting javadoc:

 * This is a good place for subclassed update handlers to process the
 * document before it is indexed.  You may wish to add/remove fields or
 * check if the requested user is allowed to update the given document...

As you can imagine, I have no expertise in SOLR code. How would it be
possible to retrieve the document and its fields for the given id and
rewrite the update/delete command to include the fields that match the
pattern I'm giving (e.g. *_1600_i)?

Thanks,
Costi


On Tue, Apr 29, 2014 at 6:41 AM, Alexandre Rafalovitch <ar...@gmail.com> wrote:

> Not out of the box, as far as I know.
>
> A custom UpdateRequestProcessor could possibly do some sort of expansion
> of the field name by verifying the actual schema. Not sure if the API
> supports that level of flexibility. Or, for the latest Solr, you can
> request the list of known field names via REST and do client-side
> expansion instead.
>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Apr 29, 2014 at 12:20 AM, Costi Muraru <co...@gmail.com> wrote:
> > Hi guys,
> >
> > Would it be possible, using Atomic Updates in SOLR4, to remove all fields
> > matching a pattern? For instance, something like:
> >
> > <add><doc>
> >   <field name="id">100</field>
> >   <field name="*_name_i" update="set" null="true"></field>
> > </doc></add>
> >
> > Or something similar to remove certain fields in all documents.
> >
> > Thanks,
> > Costi
>

Re: Delete fields from document using a wildcard

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Not out of the box, as far as I know.

A custom UpdateRequestProcessor could possibly do some sort of expansion
of the field name by verifying the actual schema. Not sure if the API
supports that level of flexibility. Or, for the latest Solr, you can
request the list of known field names via REST and do client-side
expansion instead.
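
A minimal sketch of what client-side expansion might look like, in Python. It assumes the client already holds a list of concrete field names from some source; how that list is obtained is left open here, and the helper names expand_pattern and removal_xml are made up for illustration. The wildcard is expanded locally and an ordinary XML atomic update is produced, one explicit field per match.

```python
from fnmatch import fnmatchcase
import xml.etree.ElementTree as ET


def expand_pattern(known_fields, pattern):
    """Return the known field names matching a shell-style wildcard."""
    return sorted(name for name in known_fields if fnmatchcase(name, pattern))


def removal_xml(doc_id, field_names):
    """Build an <add><doc> atomic update that nulls out each named field."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = str(doc_id)
    for name in field_names:
        # update="set" with null="true" asks for the field to be removed.
        ET.SubElement(doc, "field",
                      {"name": name, "update": "set", "null": "true"})
    return ET.tostring(add, encoding="unicode")


# Hypothetical list of concrete field names known to the client.
known = ["id", "2_1600_i", "5_1601_i", "112_1602_i"]

xml_body = removal_xml(100, expand_pattern(known, "*_1600_i"))
print(xml_body)
```

The wildcard never reaches the server in this approach; the request only ever contains concrete field names.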

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Tue, Apr 29, 2014 at 12:20 AM, Costi Muraru <co...@gmail.com> wrote:
> Hi guys,
>
> Would it be possible, using Atomic Updates in SOLR4, to remove all fields
> matching a pattern? For instance, something like:
>
> <add><doc>
>   <field name="id">100</field>
>   <field name="*_name_i" update="set" null="true"></field>
> </doc></add>
>
> Or something similar to remove certain fields in all documents.
>
> Thanks,
> Costi