Posted to solr-user@lucene.apache.org by Pulkit Singhal <pu...@gmail.com> on 2011/09/21 00:42:07 UTC
How to skip fields when using DIH?
The data I'm running through the DIH looks like:
<products>
  <product>
    <new>false</new>
    <active>false</active>
    <regularPrice>349.99</regularPrice>
    <salesRankShortTerm/>
  </product>
</products>
As you can see, in this particular instance of a product, there is no
value for "salesRankShortTerm" which happens to be defined in my
schema like so:
<field name="salesRankShortTerm" type="slong" indexed="true" stored="true" />
Having an empty value in the incoming DIH data leads to an exception:
Caused by: java.lang.NumberFormatException: For input string: ""
1) How can I skip this field if it's empty?
If I use a script transformer like so:
<script>
<![CDATA[
  function skipRow(row) {
    var salesRankShortTerm = row.get( 'salesRankShortTerm' );
    if ( salesRankShortTerm == null || salesRankShortTerm == '' ) {
      row.put( '$skipRow', 'true' );
    }
    return row;
  }
]]>
</script>
THEN, I will end up skipping the entire document :(
2) So please help me understand how I can configure it to only skip a
field and not the document?
Thanks,
- Pulkit
Re: How to skip fields when using DIH?
Posted by Pulkit Singhal <pu...@gmail.com>.
OMG, I'm so sorry, please ignore.
It's so simple, just had to use:
row.remove( 'salesRankShortTerm' );
because the script runs once per row, after the entity's fields have
been extracted (I suppose), rather than once per field.
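
For anyone finding this later, here is a rough sketch of what the whole
transformer function could look like with that change. The function name
removeEmptySalesRank is just a placeholder of mine, and FakeRow is a
hypothetical stand-in for the java.util.Map that DIH passes in, added
only so the snippet can be run outside Solr. Inside the <script> CDATA
block only the first function would be needed, referenced from the
entity as transformer="script:removeEmptySalesRank".

```javascript
// Sketch: drop an empty field from the row instead of skipping the row.
// In DIH, `row` is a java.util.Map, so get() and remove() are Map methods.
function removeEmptySalesRank(row) {
  var salesRankShortTerm = row.get('salesRankShortTerm');
  // String() makes the comparison behave the same for Java and JS strings.
  if (salesRankShortTerm == null || String(salesRankShortTerm) === '') {
    row.remove('salesRankShortTerm');
  }
  return row;
}

// Hypothetical stand-in for java.util.Map (NOT part of Solr), so the
// function above can be exercised without a running DIH.
function FakeRow(values) {
  this.values = values;
}
FakeRow.prototype.get = function (key) {
  return Object.prototype.hasOwnProperty.call(this.values, key)
    ? this.values[key]
    : null;
};
FakeRow.prototype.remove = function (key) {
  delete this.values[key];
};

// Usage: the empty field is removed, the rest of the row is untouched.
var row = removeEmptySalesRank(
  new FakeRow({ regularPrice: '349.99', salesRankShortTerm: '' })
);
console.log(row.get('salesRankShortTerm')); // null
console.log(row.get('regularPrice'));       // 349.99
```

The difference from the $skipRow version is that the row is still
returned and indexed, just without the one field that could not be
parsed as a number.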
Thanks!