Posted to solr-user@lucene.apache.org by Mahmoud Almokadem <pr...@gmail.com> on 2017/12/03 16:27:37 UTC

Dataimporter status

We're facing an issue with the dataimporter status in the new Admin UI
(7.0.1).

Calling the API
http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json

returns different statuses even though the importer is running.
The response alternates between the following two when refreshing the page:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config-online-live-pervoice.xml"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{}}

===============================
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config-online-live-pervoice.xml"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
    "Total Requests made to DataSource":"2",
    "Total Rows Fetched":"715",
    "Total Documents Processed":"679",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2017-12-03 18:22:31",
    "":"Indexing completed. Added/Updated: 679 documents. Deleted 0 documents.",
    "Committed":"2017-12-03 18:22:32",
    "Total Documents Failed":"36",
    "Time taken":"0:0:54.638",
    "Full Import failed":"2017-12-03 18:22:32"}}

================================
The old Admin UI was working well.

Is this a bug in the new Admin UI?

Thanks,
Mahmoud

Re: Dataimporter status

Posted by devashrid <de...@gmail.com>.
Hi Shawn,

I am new to Solr and I have set up a cloud cluster with 1 shard and 3
collections on 2 servers. I am facing the same issue. I create my client
with

CloudSolrClient client = new CloudSolrClient.Builder(zkUrls, Optional.empty()).build();

and then I fire the import command using

client.request(queryRequest, collectionName);

However, I am not sure how to send it to a particular core name
(collection_shard_replica). Could you please help me with that?

Thanks!
Devashri




Re: Dataimporter status

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/6/2017 1:38 AM, Mahmoud Almokadem wrote:
> I'm already using the admin UI. I took the URL for fetching the dataimporter
> status from the browser's network console and tried it outside the admin UI.
> The admin UI has the same behavior: when I press Execute, the status messages
> alternate between "not started", "started and indexing", "completed in 3
> seconds", "completed in 10 seconds", something like that.
>
> I understand what you mean, that dataimporter requests are load balanced
> between shards. That is why I used the old admin UI for the dataimporter, to
> get an accurate status of what is running now, because it is tied to the
> core, not the collection.
>
> I think the dataimporter feature should be moved to the core level instead
> of the collection level.

For production usage, you should be using the API directly, not the
admin UI.

In version 7, the old UI is no longer available.  Moving dataimport back
to the core level in the admin UI is an interesting idea that would make
the problem less likely, though a good fix for SOLR-3666 would be
better.  Any committers want to comment?

Whether it's the admin UI or the API, if you access the DIH handler
through the collection instead of a core, you're going to see this behavior.

Thanks,
Shawn


Re: Dataimporter status

Posted by Mahmoud Almokadem <pr...@gmail.com>.
Thanks Shawn,

I'm already using the admin UI. I took the URL for fetching the dataimporter
status from the browser's network console and tried it outside the admin UI.
The admin UI has the same behavior: when I press Execute, the status messages
alternate between "not started", "started and indexing", "completed in 3
seconds", "completed in 10 seconds", something like that.

I understand what you mean, that dataimporter requests are load balanced
between shards. That is why I used the old admin UI for the dataimporter, to
get an accurate status of what is running now, because it is tied to the
core, not the collection.

I think the dataimporter feature should be moved to the core level instead
of the collection level.

Thanks,
Mahmoud


On Tue, Dec 5, 2017 at 6:57 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 12/3/2017 9:27 AM, Mahmoud Almokadem wrote:
>
>> We're facing an issue with the dataimporter status in the new Admin UI
>> (7.0.1).
>>
>> Calling the API
>> http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json
>>
>> returns different statuses even though the importer is running.
>> The response alternates between the following two when refreshing the page:
>>
>
> <snip>
>
>> The old Admin UI was working well.
>>
>> Is this a bug in the new Admin UI?
>>
>
> What I'm going to say below is based on the idea that you're running
> SolrCloud.  If you're not, then this seems extremely odd and should not be
> happening.
>
> The first part of your message has a URL that accesses the API directly,
> *not* the admin UI, so I'm going to concentrate on that, and not discuss
> the admin UI, because the admin UI is not involved when using that kind of
> URL.
>
> When requests are sent to a collection name rather than directly to a
> core, SolrCloud load balances those requests across the cloud, picking
> different replicas and shards so each individual request ends up on a
> different core, and possibly on a different server.
>
> This load balancing is a general feature of SolrCloud, and happens even
> with the dataimport handler.  You never know which shard/replica is going
> to actually get a /dataimport request.  So what is happening here is that
> one of the cores in your collection is actually doing a dataimport, but all
> the others aren't.  When the status command is load balanced to the core
> that did the import, then you see the status with actual data, and when
> load balancing sends the request to one of the other cores, you see the
> empty status.
>
> If you want to reliably see the status of an import on SolrCloud, you're
> going to have to choose one of the cores (collection_shardN_replicaM) on
> one of the servers in your cloud, and send both the import command and the
> status command to that one core, instead of the collection.  You might even
> need to add a distrib=false parameter to the request to keep it from being
> load balanced, but I am not sure whether that's needed for /dataimport.
>
> Thanks,
> Shawn
>

Re: Dataimporter status

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/3/2017 9:27 AM, Mahmoud Almokadem wrote:
> We're facing an issue with the dataimporter status in the new Admin UI
> (7.0.1).
> 
> Calling the API
> http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json
> 
> returns different statuses even though the importer is running.
> The response alternates between the following two when refreshing the page:

<snip>

> The old Admin UI was working well.
> 
> Is this a bug in the new Admin UI?

What I'm going to say below is based on the idea that you're running 
SolrCloud.  If you're not, then this seems extremely odd and should not 
be happening.

The first part of your message has a URL that accesses the API directly, 
*not* the admin UI, so I'm going to concentrate on that, and not discuss 
the admin UI, because the admin UI is not involved when using that kind 
of URL.

When requests are sent to a collection name rather than directly to a 
core, SolrCloud load balances those requests across the cloud, picking 
different replicas and shards so each individual request ends up on a 
different core, and possibly on a different server.

This load balancing is a general feature of SolrCloud, and happens even 
with the dataimport handler.  You never know which shard/replica is 
going to actually get a /dataimport request.  So what is happening here 
is that one of the cores in your collection is actually doing a 
dataimport, but all the others aren't.  When the status command is load 
balanced to the core that did the import, then you see the status with 
actual data, and when load balancing sends the request to one of the 
other cores, you see the empty status.
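
To make the alternating status concrete, here is a toy model in Java. It 
is not Solr's routing code: the class and method names are made up for 
illustration, and plain rotation over the cores is an assumption that 
stands in for whatever replica selection SolrCloud really performs. It 
only shows that a status request sent to the collection may or may not 
land on the core that holds the import state.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: only the core that ran the import holds any status messages. */
final class StatusFlipDemo {

    /** Returns what each of `requests` status calls would see under simple rotation. */
    static List<String> pollCollection(List<String> cores, String importingCore, int requests) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            // Assumption: rotation is a stand-in for SolrCloud's real replica selection.
            String target = cores.get(i % cores.size());
            seen.add(target.equals(importingCore)
                    ? target + ": statusMessages populated"
                    : target + ": statusMessages empty");
        }
        return seen;
    }
}
```

With three cores and the import running on the second one, only every 
third poll in this model reports a populated status, which matches the 
"sometimes empty, sometimes full" pattern described in the question.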

If you want to reliably see the status of an import on SolrCloud, you're 
going to have to choose one of the cores (collection_shardN_replicaM) on 
one of the servers in your cloud, and send both the import command and 
the status command to that one core, instead of the collection.  You 
might even need to add a distrib=false parameter to the request to keep 
it from being load balanced, but I am not sure whether that's needed for 
/dataimport.
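
As a rough sketch of the above, the following Java builds /dataimport 
URLs that address a single core and fetches them with the JDK's own HTTP 
client (Java 11 or later). The class name, the method names and the core 
name used in the note below are hypothetical, so substitute the real 
core name from your cluster. The code assumes the core name and command 
need no URL encoding. The noDistrib flag simply appends distrib=false; 
whether /dataimport needs it is still the open question mentioned above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper for talking to the dataimport handler of one specific core. */
final class DihCoreRequest {

    /** Builds a /dataimport URL that targets one core rather than the collection. */
    static String dataImportUrl(String solrBase, String coreName, String command, boolean noDistrib) {
        // Drop a trailing slash so the path is joined with exactly one separator.
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        StringBuilder url = new StringBuilder(base)
                .append('/').append(coreName)
                .append("/dataimport?command=").append(command)
                .append("&wt=json&indent=on");
        if (noDistrib) {
            // Optional; it is not confirmed that /dataimport needs this parameter.
            url.append("&distrib=false");
        }
        return url.toString();
    }

    /** Sends a GET request and returns the raw response body. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient http = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

For example, dataImportUrl("http://solrip/solr", 
"collection_shard1_replica_n1", "status", true) gives 
http://solrip/solr/collection_shard1_replica_n1/dataimport?command=status&wt=json&indent=on&distrib=false. 
Sending both the full-import command and the later status command 
through the same core URL is what keeps the two on one core. From SolrJ, 
the same idea would presumably be an HttpSolrClient built against the 
core URL rather than a CloudSolrClient request with a collection name, 
though that is untested here.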

Thanks,
Shawn