Posted to solr-user@lucene.apache.org by Charlie Hubbard <ch...@gmail.com> on 2015/06/14 19:08:37 UTC

Solrj Tika/Cell not using defaultField

I'm having trouble getting Solr to honor the defaultField parameter when I
send a document to Solr Cell (Tika).  Here is the request I'm sending using
SolrJ:

POST /solr/collection1/update/extract?extractOnly=true&defaultField=text&wt=javabin&version=2 HTTP/1.1

When I get the response back, the NamedList contains the extracted content,
but it's under the names null and null_metadata respectively.  I've seen it
return the content under the defaultField I give it before, but for some
reason it's not doing so now.  I've even tried configuring the
ExtractingRequestHandler like so:

    <requestHandler name="/update/extract"
                    startup="lazy"
                    class="solr.extraction.ExtractingRequestHandler">
        <lst name="defaults">
            <str name="defaultField">text</str>
            <!--<str name="lowernames">true</str>-->
            <!--<str name="uprefix">ignored_</str>-->

            <!-- capture link hrefs but ignore div attributes -->
            <str name="captureAttr">true</str>
            <str name="fmap.content">text</str>
            <str name="fmap.a">links</str>
            <str name="fmap.div">ignored_</str>
        </lst>
        <!--<str name="tika.config">tika.config</str>-->
    </requestHandler>

But even that doesn't get picked up.  Here is the SolrJ code I use to set
the parameters:

    public SolrRequest toSolrExtractRequest() throws IOException {
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(getLocation(), null);

        req.setParam(EXTRACT_ONLY, "true");
        req.setParam(DEFAULT_FIELD, "text");

        return req;
    }
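
For illustration only, here is a minimal sketch of reading the extracted
text without depending on the key name.  It assumes the response holds one
String entry for the content next to a "<name>_metadata" entry, as described
above.  A plain java.util.Map stands in for the SolrJ NamedList, and the
class and method names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractOnlyResponse {

    /**
     * Returns the first String value whose key does not end in "_metadata",
     * or null if there is none. The key itself may be null.
     */
    public static String findExtractedText(Map<String, Object> response) {
        for (Map.Entry<String, Object> e : response.entrySet()) {
            // String.valueOf turns a null key into the text "null"
            String key = String.valueOf(e.getKey());
            if (!key.endsWith("_metadata") && e.getValue() instanceof String) {
                return (String) e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the response described above (assumed shape)
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("responseHeader", new LinkedHashMap<String, Object>());
        response.put(null, "extracted body");
        response.put("null_metadata", new LinkedHashMap<String, Object>());

        System.out.println(findExtractedText(response));
    }
}
```

This only works around the key name on the client side; it does not explain
why defaultField is not applied.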

So why is this not working?

Charlie