Posted to solr-user@lucene.apache.org by Pulkit Singhal <pu...@gmail.com> on 2011/09/21 00:42:07 UTC

How to skip fields when using DIH?

The data I'm running through the DIH looks like:

<products>
  <product>
    <new>false</new>
    <active>false</active>
    <regularPrice>349.99</regularPrice>
    <salesRankShortTerm/>
  </product>
</products>

As you can see, this particular product has no value for
"salesRankShortTerm", which is defined in my schema like so:
<field name="salesRankShortTerm" type="slong"  indexed="true"  stored="true" />

Having an empty value in the incoming DIH data leads to an exception:
Caused by: java.lang.NumberFormatException: For input string: ""

1) How can I skip this field if it's empty?

If I use a script transformer like so:
  <script>
        <![CDATA[
        function skipRow(row) {
            var salesRankShortTerm = row.get( 'salesRankShortTerm' );
            if ( salesRankShortTerm == null || salesRankShortTerm == '' ) {
                row.put( '$skipRow', 'true' );
            }
            return row;
        }
        ]]>
  </script>
THEN, I will end up skipping the entire document :(

2) So please help me understand how I can configure it to only skip a
field and not the document?

Thanks,
- Pulkit

Re: How to skip fields when using DIH?

Posted by Pulkit Singhal <pu...@gmail.com>.
OMG, I'm so sorry, please ignore.

It's so simple, I just had to use:
row.remove( 'salesRankShortTerm' );
because the script runs once per row, after the entire entity has been
processed (I suppose), rather than per field.
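
For anyone finding this later, here is a minimal sketch of what the full
function might look like with that one-line change. The function name
skipEmptySalesRank and the makeRow helper are made up for illustration;
makeRow only stands in for the java.util.Map that DIH hands to the script,
so the sketch can be run outside Solr.

```javascript
// Hypothetical stand-in for the java.util.Map row that DIH passes in.
// Only the methods used below are provided; inside Solr this is not needed.
function makeRow(fields) {
    var data = {};
    for (var key in fields) {
        data[key] = fields[key];
    }
    return {
        get: function (k) { return data.hasOwnProperty(k) ? data[k] : null; },
        put: function (k, v) { data[k] = v; },
        remove: function (k) { delete data[k]; },
        containsKey: function (k) { return data.hasOwnProperty(k); }
    };
}

// Remove salesRankShortTerm when it is missing or empty, and leave every
// other field in the row alone so the document is still indexed.
function skipEmptySalesRank(row) {
    var salesRankShortTerm = row.get('salesRankShortTerm');
    if (salesRankShortTerm == null || salesRankShortTerm == '') {
        row.remove('salesRankShortTerm');
    }
    return row;
}
```

Inside data-config.xml only the second function would go in the <script>
block, and the entity would presumably reference it with
transformer="script:skipEmptySalesRank".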

Thanks!
