Posted to dev@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2014/10/07 18:28:57 UTC

Proposal: Not index blank fields in Solr connector

Hi folks,

We're considering, for MCF 2.0, having the Solr connector not index empty
fields -- just skip them entirely.  This is not backwards-compatible
behavior, for a number of reasons:

(1) If the field whose value is blank is marked as "required" in Solr's
schema.xml, Solr will reject the document, where before it would have
allowed it to be indexed;
(2) Queries which make a distinction between no field and an empty field
will no longer work the same way.

Can I request a show of hands (figuratively) of how many people this change
might affect?

Thanks!
Karl
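
The two compatibility concerns in the proposal above can be illustrated with a small sketch. This is not ManifoldCF or Solr code: `drop_blank_fields` and `missing_required` are hypothetical helpers, documents are modelled as plain dictionaries, and "blank" is assumed to mean None or a zero-length string.

```python
def drop_blank_fields(doc):
    """Return a copy of doc without None or empty-string values.

    Multi-valued fields (lists) keep their non-blank values; a field is
    dropped entirely when no values remain. Hypothetical helper.
    """
    cleaned = {}
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        kept = [v for v in values if v is not None and v != ""]
        if kept:
            cleaned[name] = kept if isinstance(value, list) else kept[0]
    return cleaned


def missing_required(doc, required_fields):
    """List required fields absent from doc (a stand-in for schema validation)."""
    return [name for name in required_fields if name not in doc]


doc = {"id": "doc-1", "title": "", "author": "Karl", "tags": ["", "solr"]}
cleaned = drop_blank_fields(doc)

print(cleaned)
# With the blank title still present, the toy "required" check passes:
print(missing_required(doc, ["id", "title"]))
# Once blank fields are skipped, the same document fails the check:
print(missing_required(cleaned, ["id", "title"]))
```

In this toy model the document that used to pass a "required" check on `title` no longer does once the blank value is skipped, which mirrors point (1).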

Re: Proposal: Not index blank fields in Solr connector

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi,

It would be good to have such an option/switch. In the past I used Solr's RemoveBlankFieldUpdateProcessorFactory to strip empty fields.

Thanks,
Ahmet


On Tuesday, October 7, 2014 7:44 PM, Karl Wright <da...@gmail.com> wrote:



Sorry, I should be more clear.  I'm talking about the "Metadata adjuster" transformation connector, as described here: http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster

The source code is found under connectors/forcedmetadata.

Karl




On Tue, Oct 7, 2014 at 12:38 PM, Karl Wright <da...@gmail.com> wrote:

It just occurred to me that maybe we can put this functionality in the Field Mapping transformation connector, controlled by a checkbox.  That would make everyone happy I think.
>
>Karl
>
>
>
>On Tue, Oct 7, 2014 at 12:35 PM, Alessandro Benedetti <be...@gmail.com> wrote:
>
>I will copy here my consideration :
>>
>>
>>"So you have 2 scenarios for a field : you have content or you don't.
>>Introducing empty field you can have strange behaviours solr sides because from the human point of view there is no difference between an empty or a null field, but for solr yes, and the proper way to model a field that doesn't have content is to not add the field at all to the document.
>>Introducing empty field can lead to inconsistent behaviour mainly sorting side, or when you want to retrieve documents that have value for that field ( in this case indexing empty fields you are introducing false positive for that search)"
>>Curious to see others opinions and discuss !
>>
>>
>>Cheers
>>
>>
>>2014-10-07 17:28 GMT+01:00 Karl Wright <da...@gmail.com>:
>>
>>Hi folks,
>>>
>>>We're considering, for MCF 2.0, having the Solr connector not index empty fields -- just skip them entirely.  This is not backwards-compatible behavior, for a number of reasons:
>>>
>>>(1) If the field whose value is blank is marked as "required" in solrschema.xml, Solr will reject the document, where before it would have allowed it to be indexed;
>>>(2) Queries which make a distinction between no field and an empty field will no longer work the same way.
>>>
>>>Can I request a show of hands (figuratively) of how many people this change might affect?
>>>
>>>Thanks!
>>>Karl
>>>
>>>
>>>
>>>
>>
>>
>>
>>-- 
>>
>>--------------------------
>>
>>Benedetti Alessandro 
>>Visiting card : http://about.me/alessandro_benedetti
>>
>>"Tyger, tyger burning bright     
>>In the forests of the night,     
>>What immortal hand or eye     
>>Could frame thy fearful symmetry?"
>>
>>William Blake - Songs of Experience -1794 England
>

Re: Proposal: Not index blank fields in Solr connector

Posted by Karl Wright <da...@gmail.com>.
Sorry, I should be more clear.  I'm talking about the "Metadata adjuster"
transformation connector, as described here:
http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster

The source code is found under connectors/forcedmetadata.

Karl


On Tue, Oct 7, 2014 at 12:38 PM, Karl Wright <da...@gmail.com> wrote:

> It just occurred to me that maybe we can put this functionality in the
> Field Mapping transformation connector, controlled by a checkbox.  That
> would make everyone happy I think.
>
> Karl
>
> On Tue, Oct 7, 2014 at 12:35 PM, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
>
>> I will copy here my consideration :
>>
>> "So you have 2 scenarios for a field : you have content or you don't.
>>
>> Introducing empty field you can have strange behaviours solr sides
>> because from the human point of view there is no difference between an
>> empty or a null field, but for solr yes, and the proper way to model a
>> field that doesn't have content is to not add the field at all to the
>> document.
>>
>> Introducing empty field can lead to inconsistent behaviour mainly sorting
>> side, or when you want to retrieve documents that have value for that field
>> ( in this case indexing empty fields you are introducing false positive for
>> that search)"
>>
>> Curious to see others opinions and discuss !
>>
>>
>> Cheers
>>
>> 2014-10-07 17:28 GMT+01:00 Karl Wright <da...@gmail.com>:
>>
>>> Hi folks,
>>>
>>> We're considering, for MCF 2.0, having the Solr connector not index
>>> empty fields -- just skip them entirely.  This is not backwards-compatible
>>> behavior, for a number of reasons:
>>>
>>> (1) If the field whose value is blank is marked as "required" in
>>> solrschema.xml, Solr will reject the document, where before it would have
>>> allowed it to be indexed;
>>> (2) Queries which make a distinction between no field and an empty field
>>> will no longer work the same way.
>>>
>>> Can I request a show of hands (figuratively) of how many people this
>>> change might affect?
>>>
>>> Thanks!
>>> Karl
>>>
>>>
>>>
>>
>>
>> --
>> --------------------------
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>

Re: Proposal: Not index blank fields in Solr connector

Posted by Karl Wright <da...@gmail.com>.
It just occurred to me that maybe we can put this functionality in the
Field Mapping transformation connector, controlled by a checkbox.  That
would make everyone happy I think.

Karl

On Tue, Oct 7, 2014 at 12:35 PM, Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> I will copy here my consideration :
>
> "So you have 2 scenarios for a field : you have content or you don't.
>
> Introducing empty field you can have strange behaviours solr sides because
> from the human point of view there is no difference between an empty or a
> null field, but for solr yes, and the proper way to model a field that
> doesn't have content is to not add the field at all to the document.
>
> Introducing empty field can lead to inconsistent behaviour mainly sorting
> side, or when you want to retrieve documents that have value for that field
> ( in this case indexing empty fields you are introducing false positive for
> that search)"
>
> Curious to see others opinions and discuss !
>
>
> Cheers
>
> 2014-10-07 17:28 GMT+01:00 Karl Wright <da...@gmail.com>:
>
>> Hi folks,
>>
>> We're considering, for MCF 2.0, having the Solr connector not index empty
>> fields -- just skip them entirely.  This is not backwards-compatible
>> behavior, for a number of reasons:
>>
>> (1) If the field whose value is blank is marked as "required" in
>> solrschema.xml, Solr will reject the document, where before it would have
>> allowed it to be indexed;
>> (2) Queries which make a distinction between no field and an empty field
>> will no longer work the same way.
>>
>> Can I request a show of hands (figuratively) of how many people this
>> change might affect?
>>
>> Thanks!
>> Karl
>>
>>
>>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>

Re: Proposal: Not index blank fields in Solr connector

Posted by Antonio David Perez Morales <ap...@zaizi.com>.
I agree that we should not index empty fields.

Indexing empty fields doesn't make sense in most use cases.

Regards

On Tue, Oct 7, 2014 at 6:35 PM, Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> I will copy here my consideration :
>
> "So you have 2 scenarios for a field : you have content or you don't.
>
> Introducing empty field you can have strange behaviours solr sides because
> from the human point of view there is no difference between an empty or a
> null field, but for solr yes, and the proper way to model a field that
> doesn't have content is to not add the field at all to the document.
>
> Introducing empty field can lead to inconsistent behaviour mainly sorting
> side, or when you want to retrieve documents that have value for that field
> ( in this case indexing empty fields you are introducing false positive for
> that search)"
>
> Curious to see others opinions and discuss !
>
>
> Cheers
>
> 2014-10-07 17:28 GMT+01:00 Karl Wright <da...@gmail.com>:
>
> > Hi folks,
> >
> > We're considering, for MCF 2.0, having the Solr connector not index empty
> > fields -- just skip them entirely.  This is not backwards-compatible
> > behavior, for a number of reasons:
> >
> > (1) If the field whose value is blank is marked as "required" in
> > solrschema.xml, Solr will reject the document, where before it would have
> > allowed it to be indexed;
> > (2) Queries which make a distinction between no field and an empty field
> > will no longer work the same way.
> >
> > Can I request a show of hands (figuratively) of how many people this
> > change might affect?
> >
> > Thanks!
> > Karl
> >
> >
> >
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Proposal: Not index blank fields in Solr connector

Posted by Alessandro Benedetti <be...@gmail.com>.
I will copy my thoughts here:

"So you have two scenarios for a field: it has content or it doesn't.

Introducing empty fields can cause strange behaviour on the Solr side, because
from a human point of view there is no difference between an empty field and a
null field, but for Solr there is. The proper way to model a field that
has no content is to not add the field to the document at all.

Introducing empty fields can lead to inconsistent behaviour, mainly when sorting
or when you want to retrieve documents that have a value for that field
(in this case, by indexing empty fields you introduce false positives for
that search)."

Curious to see others' opinions and discuss!


Cheers
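
The sorting and false-positive concerns described above can be shown with a toy model. This is a sketch only, not Solr: documents are plain dictionaries, `has_field` and `sort_missing_last` are hypothetical helpers, and real Solr behaviour depends on the field type and its analysis chain.

```python
def has_field(docs, field):
    """Toy 'field exists' filter: matches any document that carries the field."""
    return [d["id"] for d in docs if field in d]


def sort_missing_last(docs, field):
    """Sort ascending by field, placing documents without the field at the end."""
    present = sorted((d for d in docs if field in d), key=lambda d: d[field])
    absent = [d for d in docs if field not in d]
    return [d["id"] for d in present + absent]


docs = [
    {"id": "a", "author": "Blake"},
    {"id": "b", "author": ""},   # empty field was indexed
    {"id": "c"},                 # field was omitted
]

# "b" matches the existence filter although it has no real author value.
print(has_field(docs, "author"))

# "b" sorts first (empty string), while "c" sorts last (missing field),
# even though a human would treat both as "no author".
print(sort_missing_last(docs, "author"))
```

Documents "b" and "c" carry the same information from a user's point of view, yet the toy filter and sort treat them differently, which is the inconsistency the message describes.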

2014-10-07 17:28 GMT+01:00 Karl Wright <da...@gmail.com>:

> Hi folks,
>
> We're considering, for MCF 2.0, having the Solr connector not index empty
> fields -- just skip them entirely.  This is not backwards-compatible
> behavior, for a number of reasons:
>
> (1) If the field whose value is blank is marked as "required" in
> solrschema.xml, Solr will reject the document, where before it would have
> allowed it to be indexed;
> (2) Queries which make a distinction between no field and an empty field
> will no longer work the same way.
>
> Can I request a show of hands (figuratively) of how many people this
> change might affect?
>
> Thanks!
> Karl
>
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England