Posted to solr-user@lucene.apache.org by Hans-Peter Stricker <st...@epublius.de> on 2013/05/27 18:01:18 UTC

Problems with DIH in Solrj

I start the SOLR example with 

java -Dsolr.solr.home=example-DIH/solr -jar start.jar

and run

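import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CommonParams;

public class DihClient { // class name is arbitrary

    public static void main(String[] args) {
        String url = "http://localhost:8983/solr/rss";
        SolrServer server;
        SolrQuery query;
        try {
            server = new HttpSolrServer(url);
            query = new SolrQuery();
            // route the request to the /dataimport handler
            query.setParam(CommonParams.QT, "/dataimport");
            QueryRequest request = new QueryRequest(query);
            QueryResponse response = request.process(server);
            server.commit();
            System.out.println(response.toString());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}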

without exception and the response string as

{responseHeader={status=0,QTime=0},initArgs={defaults={config=rss-data-config.xml}},status=idle,importResponse=,statusMessages={},WARNING=This response format is experimental.  It is likely to change in the future.}

The Lucene index is "touched" but not really updated: there are only segments.gen and segments_a files of size 1 KB. If I execute /dataimport (full-import with option "commit" checked) from http://localhost:8983/solr/#/rss/dataimport//dataimport, I get:

{
  "responseHeader": { "status": 0, "QTime": 1 },
  "initArgs": [ "defaults", [ "config", "rss-data-config.xml" ] ],
  "command": "status",
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "1",
    "Total Rows Fetched": "10",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2013-05-27 17:57:07",
    "": "Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.",
    "Committed": "2013-05-27 17:57:07",
    "Total Documents Processed": "10",
    "Time taken": "0:0:0.603"
  },
  "WARNING": "This response format is experimental. It is likely to change in the future."
}

What am I doing wrong?

Re: Problems with DIH in Solrj

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Details about the DataImportHandler are on the wiki:

http://wiki.apache.org/solr/DataImportHandler

In general, the SolrJ client just makes HTTP requests to the corresponding
Solr APIs, so you need to learn the HTTP parameters of the corresponding
Solr component. The Solr wiki is your best bet.

http://wiki.apache.org/solr/FrontPage
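To make the "SolrJ just makes HTTP requests" point concrete, here is a minimal sketch that builds, with plain JDK classes only, the HTTP URL for a DataImportHandler request. The core URL comes from the question; the class name `DihUrls` is hypothetical, and the `commit=true` parameter is included only to show how extra parameters get appended. The important part is `command=full-import`, the parameter the original program never sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Sketch: builds the plain HTTP URL for a request to a core's /dataimport
// handler. Class and method names are illustrative, not part of SolrJ.
class DihUrls {
    static String dataImportUrl(String coreUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(coreUrl).append("/dataimport");
        char sep = '?';
        // Append each parameter in the map's iteration order, URL-encoded.
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always supported by the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }
}
```

For example, passing `command=full-import` and then `commit=true` in a `LinkedHashMap` yields `http://localhost:8983/solr/rss/dataimport?command=full-import&commit=true`, which you can also paste into a browser to compare with what SolrJ does.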


On Mon, May 27, 2013 at 9:50 PM, Hans-Peter Stricker
<st...@epublius.de> wrote:

> Marvelous!!
>
> Once again: where could/should I have read this? What kinds of
> concepts/keywords are "command" and "full-import"? (Couldn't find them in
> any config file. Where are they explained?)
>
> Anyway: Now it works like a charm!
>
> Thanks
>
> Hans


-- 
Regards,
Shalin Shekhar Mangar.

Re: Problems with DIH in Solrj

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/27/2013 10:20 AM, Hans-Peter Stricker wrote:
> Marvelous!!
> 
> Once again: where could/should I have read this? What kinds of
> concepts/keywords are "command" and "full-import"? (Couldn't find them
> in any config file. Where are they explained?)
> 
> Anyway: Now it works like a charm!

http://wiki.apache.org/solr/DataImportHandler#Commands
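The commands on that wiki page are simply string values of the `command` request parameter. A small, hypothetical enum like the one below keeps them in one place in a SolrJ program. The value list reflects the DIH commands as documented at the time (full-import, delta-import, status, reload-config, abort); check the wiki for your Solr version before relying on it.

```java
// Hypothetical helper: the string values accepted by DIH's "command" parameter.
// Solr only ever sees the strings; the enum just guards against typos.
enum DihCommand {
    FULL_IMPORT("full-import"),
    DELTA_IMPORT("delta-import"),
    STATUS("status"),
    RELOAD_CONFIG("reload-config"),
    ABORT("abort");

    private final String value;

    DihCommand(String value) {
        this.value = value;
    }

    /** The string to send as the "command" request parameter. */
    String value() {
        return value;
    }

    /** True for the commands that start an (asynchronous) import. */
    boolean startsImport() {
        return this == FULL_IMPORT || this == DELTA_IMPORT;
    }
}
```

In a SolrJ program this would be used as `query.setParam("command", DihCommand.FULL_IMPORT.value())`.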

The CommonParams.QT syntax that you used only works with SolrJ 4.0 and
newer, and those versions have a shortcut that's slightly easier to read:

query.setRequestHandler("/dataimport");

There are no real examples of using DIH with SolrJ because, if you are
using SolrJ, your application is expected to do the indexing itself,
with the add method on the server object.  I'll point you once again to
the database example:

http://wiki.apache.org/solr/Solrj#Reading_data_from_a_database

I do use DIH on occasion - whenever I do a full rebuild of my index, DIH
does the job a lot faster than my own code.  I handle it from SolrJ.

Sending a full-import or delta-import command to Solr returns to SolrJ
immediately.  You will only see a failure on that request if something
major fails with the request itself; it won't tell you anything about
whether the import succeeded or failed.  You must periodically check the
status.
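A minimal polling sketch, assuming the status is requested with `command=status&wt=json` and that DIH reports `"status":"busy"` while an import runs and `"idle"` otherwise (the responses quoted in the question show `idle`). The `fetchStatus` supplier is a hypothetical hook for whatever HTTP or SolrJ call you use, and the string check is deliberately crude rather than a real JSON parse.

```java
import java.util.function.Supplier;

// Sketch: poll DIH's status until it stops reporting "busy".
// fetchStatus is a hypothetical hook returning a wt=json status response.
class DihPoller {
    /** Crude check; assumes the top-level "status" is "busy" during an import. */
    static boolean isBusy(String json) {
        return json.replace(" ", "").contains("\"status\":\"busy\"");
    }

    /** Fetches at most maxPolls times, sleeping between fetches; returns the last response. */
    static String waitUntilIdle(Supplier<String> fetchStatus, long intervalMillis, int maxPolls) {
        String last = fetchStatus.get();
        for (int i = 1; i < maxPolls && isBusy(last); i++) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                break;
            }
            last = fetchStatus.get();
        }
        return last;
    }
}
```

Because `waitUntilIdle` returns the last response even when `maxPolls` runs out, the caller should check `isBusy` on the result, and "idle" only means the import stopped, not that it succeeded; the status messages still have to be inspected.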

Interpreting the status in a program is complicated, because the status
is human-readable rather than machine-readable, and important information
is added to or removed from the response at various success and error
stages.  There have been a number of issues filed on this; I filed most
of them:

https://issues.apache.org/jira/browse/SOLR-2728
https://issues.apache.org/jira/browse/SOLR-2729
https://issues.apache.org/jira/browse/SOLR-3319
https://issues.apache.org/jira/browse/SOLR-3689
https://issues.apache.org/jira/browse/SOLR-4241

I do have SolrJ code that interprets DIH status, but it's tied up in a
larger work and will require some cleanup before I can share it.
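For illustration only (this is not the code mentioned above), here is a sketch of pulling individual entries out of the `statusMessages` section of a `wt=json` status response such as the one quoted in the question. The class and method names are hypothetical, and the regular expression only handles flat string values. Given the issues listed above, a missing key returns `null` instead of failing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract string-valued entries (e.g. "Total Documents Processed")
// from a DIH status response. Keys are human-readable and vary by stage.
class DihStatus {
    /** Returns the first string value stored under key, or null if absent. */
    static String field(String json, String key) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** True if the response carries the "Indexing completed" summary message. */
    static boolean completed(String json) {
        return json.contains("Indexing completed");
    }
}
```

Applied to the response in the question, `field(json, "Total Documents Processed")` would return `"10"` and `completed(json)` would return `true`.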

Thanks,
Shawn


Re: Problems with DIH in Solrj

Posted by Hans-Peter Stricker <st...@epublius.de>.
Marvelous!!

Once again: where could/should I have read this? What kinds of 
concepts/keywords are "command" and "full-import"? (Couldn't find them in 
any config file. Where are they explained?)

Anyway: Now it works like a charm!

Thanks

Hans



--------------------------------------------------
From: "Shalin Shekhar Mangar" <sh...@gmail.com>
Sent: Monday, May 27, 2013 6:09 PM
To: <so...@lucene.apache.org>
Subject: Re: Problems with DIH in Solrj

> Your program is not specifying a command. You need to add:
>
> query.setParam("command", "full-import");


Re: Problems with DIH in Solrj

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Your program is not specifying a command. You need to add:

query.setParam("command", "full-import");

-- 
Regards,
Shalin Shekhar Mangar.