Posted to solr-user@lucene.apache.org by Em <ma...@yahoo.de> on 2011/01/31 14:20:31 UTC

UpdateHandler-Bug or intended feature?

Hi list,

I am not sure whether this behaviour is intended or not.

I am experimenting with the UpdateRequestProcessor feature of Solr (V: 1.4)
and something occurred that I find strange.

Well, when I send csv-data to the CSV-UpdateHandler with some fields
specified that are not part of the Schema, the input isn't passed up to the
UpdateRequestProcessor-Chain. 

Here is some code from my UpdateRequestProcessor:

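	@Override
	public void processAdd(AddUpdateCommand cmd) throws IOException {
		// Pass the command down the chain first, then fail loudly so that
		// any call to this processor becomes visible to the client.
		super.processAdd(cmd);
		throw new IOException("HelloWorld");
	}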

Well, this processor truly makes no sense, but I wanted to see whether it
is called or not, and it seems that it is never called.

My client gets back messages like "undefined field MyID" - yes, "MyID"
isn't specified in the schema.

For example:
If I want to build a field "hash" from "MyID" and remove "MyID" from the
InputDocument afterwards, I will never get the chance to do so if the
processor is never called.
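
To make the use case concrete, here is a minimal sketch of the transformation described above. It deliberately uses a plain Map as a stand-in for the SolrInputDocument so that it runs without Solr on the classpath; the class name HashFieldSketch, the helper names, and the choice of MD5 are assumptions for illustration only (the message just says "hash"). In a real processor this logic would sit in processAdd, before the call to super.processAdd(cmd).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: a Map plays the role of the input document.
class HashFieldSketch {

    // Removes "MyID" and, if it was present, stores its digest under "hash".
    static void hashAndRemove(Map<String, Object> doc) {
        Object id = doc.remove("MyID");
        if (id == null) {
            return; // nothing to hash
        }
        doc.put("hash", md5Hex(id.toString()));
    }

    // MD5 is an assumption; any digest would do for the sketch.
    static String md5Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("MyID", "abc");
        doc.put("title", "hello");
        hashAndRemove(doc);
        System.out.println(doc); // "MyID" is gone, "hash" has been added
    }
}
```

The point of the sketch is only the ordering: the document still has to contain "MyID" when the processor sees it, which is exactly what the CSV handler prevents.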

Is this intended or am I doing something wrong here?

Regards
-- 
View this message in context: http://lucene.472066.n3.nabble.com/UpdateHandler-Bug-or-intended-feature-tp2389382p2389382.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: UpdateHandler-Bug or intended feature?

Posted by Em <ma...@yahoo.de>.
Hi Hoss,

actually I thought this would be necessary for the SolrInputDocument to map
against a special FieldType, but this isn't true. The mapping happens
some time after the UpdateProcessor has finished its work.
So yes, there is no reason to force the CSVRequestHandler to throw an
Exception if the field does not exist.

I will register at Jira and open an issue for that today.

Regards


-- 
View this message in context: http://lucene.472066.n3.nabble.com/UpdateHandler-Bug-or-intended-feature-tp2389382p2395656.html

Re: UpdateHandler-Bug or intended feature?

Posted by Chris Hostetter <ho...@fucit.org>.
: Well, this does not seem to me like a bug but more like an exotic
: situation where two concepts collide with each other.
: The CSVRequestHandler is intended to sweep all the unnecessary stuff
: out of the input to avoid exceptions for unknown fields
: while my UpdateRequestProcessor needs such fields to work correctly.

Agreed, this is an interesting edge case ... i don't actually see any
reason why CSVRequestHandler needs the SchemaField for each field name --
all it ever seems to use it for is determining the field name, so it would
probably be easy to rip out.

i think even if CSVRequestHandler has some reason for wanting the
SchemaField object, it should gracefully handle the case where it can't be
found (there's a version of the method it calls that returns null instead
of throwing an exception) and just pass the fieldname=val pairs into
the SolrInputDocument for the UpdateProcessor to deal with -- if there
really is a problem (and nothing ever removes/maps that field) the
underlying "add" code will eventually fail with the same exception.
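
The behaviour proposed here can be sketched roughly as follows. This is not Solr code: the schema is modeled as a Set of field names, the document as a Map, and the update processor as a Consumer, and every name in the block (LenientCsvLoadSketch, fieldOrNull, load, add, index) is hypothetical. The sketch only shows the ordering: the loader passes unknown names through, the processor gets its chance, and the final add step still rejects whatever unknown field is left.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of a lenient CSV load followed by a strict add.
class LenientCsvLoadSketch {

    // Stand-in for a schema lookup that returns null instead of throwing.
    static String fieldOrNull(Set<String> schemaFields, String name) {
        return schemaFields.contains(name) ? name : null;
    }

    // Loader step: copy every name=value pair into the document, known or not.
    static Map<String, Object> load(String[] header, String[] row) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            doc.put(header[i], row[i]);
        }
        return doc;
    }

    // Final "add" step: an unknown field that nobody removed fails here.
    static void add(Set<String> schemaFields, Map<String, Object> doc) {
        for (String name : doc.keySet()) {
            if (fieldOrNull(schemaFields, name) == null) {
                throw new IllegalArgumentException("undefined field " + name);
            }
        }
    }

    // Load, let the processor rewrite the document, then add.
    static Map<String, Object> index(Set<String> schemaFields, String[] header,
                                     String[] row, Consumer<Map<String, Object>> processor) {
        Map<String, Object> doc = load(header, row);
        processor.accept(doc);
        add(schemaFields, doc);
        return doc;
    }

    public static void main(String[] args) {
        Set<String> schema = Set.of("id", "hash");
        String[] header = {"id", "MyID"};
        String[] row = {"1", "abc"};
        // The processor maps "MyID" to "hash", so the add step succeeds.
        System.out.println(index(schema, header, row, d -> d.put("hash", d.remove("MyID"))));
    }
}
```

With a processor that leaves "MyID" in place, index fails in add with the same "undefined field MyID" message, so no validation is lost; it just happens later.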

Please feel free to open a Jira issue for this -- it would help in 
particular if you could mention the gist of your use case (why you include
columns that don't map directly to fields and what your UpdateProcessor 
does with them) so people better understand the goal.

-Hoss

Re: UpdateHandler-Bug or intended feature?

Posted by Em <ma...@yahoo.de>.
Here is what I found out:

The CSVRequestHandler reads its field names in line 240 and the following
ones. Those field names come from the file's header
or from the params specified in the request.

The CSVRequestHandler calls prepareFields to create an array of
SchemaFields (see line 269), which is filled with the schema's
fields in line 282.
Here comes the problem: "MyID" does not exist as a SchemaField, so an
Exception is thrown.
Ignoring the field "MyID" would not solve the problem, since it is
needed by my UpdateRequestProcessor.
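
The order of events described above can be reproduced with a small model. It is not the CSVRequestHandler source: the schema is a Set of names, the processor is a Consumer, and the names StrictHeaderCheckSketch, prepareFields and load are chosen here only to mirror the description. The sketch shows why a logging statement in the first line of processAdd never fires: the header is validated once, up front, and the exception leaves the loader before any row reaches the processor.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical model: header validation happens before any row is processed.
class StrictHeaderCheckSketch {

    // Resolves every header name against the schema; unknown names throw.
    static void prepareFields(Set<String> schemaFields, String[] header) {
        for (String name : header) {
            if (!schemaFields.contains(name)) {
                throw new IllegalArgumentException("undefined field " + name);
            }
        }
    }

    // Validates the header first, then hands each row to the processor.
    static void load(Set<String> schemaFields, String[] header,
                     String[][] rows, Consumer<String[]> processor) {
        prepareFields(schemaFields, header); // may throw before the loop starts
        for (String[] row : rows) {
            processor.accept(row);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        try {
            load(Set.of("id"), new String[] {"id", "MyID"},
                 new String[][] {{"1", "abc"}}, row -> calls.incrementAndGet());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage() + ", processor calls: " + calls.get());
        }
    }
}
```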

Well, this does not seem to me like a bug but more like an exotic
situation where two concepts collide with each other.
The CSVRequestHandler is intended to sweep all the unnecessary stuff
out of the input to avoid exceptions for unknown fields
while my UpdateRequestProcessor needs such fields to work correctly.

I could imagine adding all expected fields to my schema.xml with
indexed + stored = false, but this is dirty.
However, the more I think about a rewrite for my situation, the less sense
it makes, since the validation is definitely necessary.

It seems like I will end up with my first idea of adding all expected
fields to my schema.xml, if there are no other suggestions.

Thank you for your help!

Regards



Re: UpdateHandler-Bug or intended feature?

Posted by Em <ma...@yahoo.de>.
Okay, I added some logging to both the processor and its factory.
It turned out that there IS an updateProcessor returned and it is NOT null.
However, my logging call inside the processAdd method (1st line, so it
HAS to be executed if the method is called) never gets called - so the
exception is definitely thrown before my processor does anything.

Looking into the CSVRequestHandler shows that its
prepareFields() method seems to be based on the header of the CSV file,
not on the document itself. However, I am still reading more of the
code to understand what really happens, because everything works fine
if the fields of the CSV are specified - no matter whether I add fields
with an UpdateRequestProcessor or not.

If you like, have a look around line 282 (prepareFields) in
CSVRequestHandler.

Regards



Re: UpdateHandler-Bug or intended feature?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/01/31 23:33), Em wrote:
> Hi Koji,
>
> following is the solrconfig:
>
>      <requestHandler name="/update/csv" class="solr.CSVRequestHandler">
>          <lst name="defaults">
>              <str name="update.processor">throwAway</str>
>          </lst>
>      </requestHandler>
>
>    <updateRequestProcessorChain name="throwAway">
>          <processor
> class="solr.experiments.solr.update.processor.ThrowAwayUpdateProcessorFactory"/>
>          <processor class="solr.RunUpdateProcessorFactory" />
>    </updateRequestProcessorChain>
>
> Do you see any mistake here?
>
> Regards
>

Hmm, looks fine. Are you sure you create your update processor instance
in your factory and return it? (If it is null, the processor chain simply
ignores your processor...)
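
To illustrate the pitfall, here is a small model of a chain builder that skips factories returning null. It is a sketch of the behaviour described above, not Solr's implementation; the interfaces Processor and Factory and the helpers named and createChain are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a processor chain that ignores null processors.
class NullSkippingChainSketch {

    interface Processor {
        void processAdd(List<String> log);
    }

    interface Factory {
        Processor getInstance(Processor next);
    }

    // A factory whose processor records its name, then calls the next link.
    static Factory named(String name) {
        return next -> log -> {
            log.add(name);
            if (next != null) {
                next.processAdd(log);
            }
        };
    }

    // Builds the chain back to front; a null result drops that link silently.
    static Processor createChain(List<Factory> factories) {
        Processor last = null;
        for (int i = factories.size() - 1; i >= 0; i--) {
            Processor p = factories.get(i).getInstance(last);
            if (p != null) {
                last = p;
            }
        }
        return last;
    }

    public static void main(String[] args) {
        Factory broken = next -> null; // forgets to return an instance
        List<String> log = new ArrayList<>();
        createChain(List.of(broken, named("run"))).processAdd(log);
        System.out.println(log); // only the processors that actually ran
    }
}
```

No error is raised for the broken factory, which is why a missing processor can look like a handler problem at first.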

Koji
-- 
http://www.rondhuit.com/en/

Re: UpdateHandler-Bug or intended feature?

Posted by Em <ma...@yahoo.de>.
Hi Koji,

the following is the solrconfig:

    <requestHandler name="/update/csv" class="solr.CSVRequestHandler">
        <lst name="defaults">
            <str name="update.processor">throwAway</str>
        </lst>
    </requestHandler>

    <updateRequestProcessorChain name="throwAway">
        <processor class="solr.experiments.solr.update.processor.ThrowAwayUpdateProcessorFactory"/>
        <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

Do you see any mistake here?

Regards

Re: UpdateHandler-Bug or intended feature?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(11/01/31 22:20), Em wrote:
>
> Hi list,
>
> I am not sure whether this behaviour is intended or not.
>
> I am experimenting with the UpdateRequestProcessor feature of Solr (V: 1.4)
> and something occurred that I find strange.
>
> Well, when I send csv-data to the CSV-UpdateHandler with some fields
> specified that are not part of the Schema, the input isn't passed up to the
> UpdateRequestProcessor-Chain.
>
> Here is some code from my UpdateRequestProcessor:
>
> 	@Override
> 	public void processAdd(AddUpdateCommand cmd) throws IOException {
> 		super.processAdd(cmd);
> 		throw new IOException("HelloWorld");
> 	}
>
> Well, this processor truly makes no sense, but I wanted to see whether it
> is called or not, and it seems that it is never called.
>

Are you sure your UpdateRequestProcessor is defined in solrconfig.xml
and that you set the update.processor parameter to the name of your
UpdateRequestProcessorChain when you call CSVLoader?
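
If the parameter is not already set as a default in solrconfig.xml, it has to travel with the request. A minimal sketch of building such a request URL follows. The host, port and path (http://localhost:8983/solr/update/csv) and the chain name throwAway are assumptions for illustration, and CsvUpdateUrlSketch and buildUrl are hypothetical helper names, not part of any Solr client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that appends query parameters to an update URL.
class CsvUpdateUrlSketch {

    // Joins URL-encoded name=value pairs in insertion order.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("update.processor", "throwAway"); // name of the processor chain
        params.put("commit", "true");
        // Assumed local Solr instance; adjust host, port and path as needed.
        System.out.println(buildUrl("http://localhost:8983/solr/update/csv", params));
    }
}
```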

Koji
-- 
http://www.rondhuit.com/en/