
Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@wunderwood.org> on 2019/05/08 19:59:38 UTC

Load suggest dictionary from non-Zookeeper file?

Our suggest dictionary is too big for Zookeeper. I’m trying to load it from an absolute path, but Solr 6.6.1 insists on interpreting that as a Zookeeper path. Is there any way to disable that?

java.lang.IllegalArgumentException: Invalid path string "/configs/questions-suggest//solr/suggest-data/questions-suggest/ngram_counts.tsv"
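
The doubled slash in that path hints at what happened. Below is a minimal Python sketch that rests on two assumptions drawn only from the shape of the message, not from Solr's source. First, the configured location is appended to the configset's znode path. Second, ZooKeeper rejects any path containing an empty node name, which is what `//` produces. Both helper names are hypothetical, and the validator is a simplified stand-in for ZooKeeper's own check.

```python
def naive_config_path(configset_znode: str, resource: str) -> str:
    # Hypothetical: join the configset znode and the configured resource
    # with "/" and no normalization, as the error text suggests.
    return configset_znode + "/" + resource


def validate_znode_path(path: str) -> None:
    # Simplified stand-in for ZooKeeper's path check: the path must be
    # absolute, must not end in "/" (except the root itself), and must not
    # contain an empty node name such as the one created by "//".
    if not path.startswith("/"):
        raise ValueError(f'Invalid path string "{path}": must start with /')
    if path == "/":
        return
    if path.endswith("/"):
        raise ValueError(f'Invalid path string "{path}": must not end with /')
    if "//" in path:
        raise ValueError(f'Invalid path string "{path}": empty node name')


path = naive_config_path(
    "/configs/questions-suggest",
    "/solr/suggest-data/questions-suggest/ngram_counts.tsv",
)
print(path)
# /configs/questions-suggest//solr/suggest-data/questions-suggest/ngram_counts.tsv
try:
    validate_znode_path(path)
except ValueError as err:
    print("rejected:", err)
```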

I could bring up a non-cloud cluster just for this suggester, but that seems like an ugly hack.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: Load suggest dictionary from non-Zookeeper file?

Posted by Mikhail Khludnev <mk...@apache.org>.

Right.

On Wed, May 8, 2019 at 11:49 PM Shawn Heisey <ap...@elyograg.org> wrote:

> With the blob store, the blobs are in the Lucene index, right?


-- 
Sincerely yours
Mikhail Khludnev

Re: Load suggest dictionary from non-Zookeeper file?

Posted by Shawn Heisey <ap...@elyograg.org>.

On 5/8/2019 2:34 PM, Mikhail Khludnev wrote:
> This reminds me of https://lucene.apache.org/solr/guide/7_6/blob-store-api.html, but
> I don't think it's integrated with the suggester yet.

I'm having one of those days where I can't seem to recall things easily.

With the blob store, the blobs are in the Lucene index, right?

Thanks,
Shawn

Re: Load suggest dictionary from non-Zookeeper file?

Posted by Mikhail Khludnev <mk...@apache.org>.

This reminds me of https://lucene.apache.org/solr/guide/7_6/blob-store-api.html, but
I don't think it's integrated with the suggester yet.


-- 
Sincerely yours
Mikhail Khludnev

Re: Load suggest dictionary from non-Zookeeper file?

Posted by Walter Underwood <wu...@wunderwood.org>.

The file is 33 megabytes, so I don’t think increasing jute.maxbuffer is a wise idea.

The current documentation is not at all clear about how the dictionary file name is interpreted. I could see an absolute path being local and a relative path being relative to the ZK config folder. I wouldn’t mind using a “file:” URL for local stuff.
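
Combining both of Walter's suggestions, the resolution rule might look like the Python sketch below. This is a hypothetical policy, not how Solr 6.6.1 behaves. A `file:` URL or an absolute path would be read from the node's local disk, and anything else would stay relative to the configset znode. The `/configs/<name>` layout is taken from the error message above, and the function name is made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_dictionary(location: str, configset_znode: str) -> tuple[str, str]:
    """Return ("local", path) or ("zk", znode) for a dictionary location.

    Hypothetical policy from the thread, not Solr's actual behavior:
    - "file:" URLs and absolute paths are read from the local disk;
    - anything else is relative to the configset znode in ZooKeeper.
    """
    if location.startswith("file:"):
        # urlparse("file:///a/b").path == "/a/b"; no percent-decoding here.
        return ("local", urlparse(location).path)
    if PurePosixPath(location).is_absolute():
        return ("local", location)
    # Relative names stay inside the configset, joined with exactly one "/".
    # This sketch does not normalize or reject ".." components.
    return ("zk", f"{configset_znode.rstrip('/')}/{location}")


cfg = "/configs/questions-suggest"
print(resolve_dictionary("/solr/suggest-data/questions-suggest/ngram_counts.tsv", cfg))
# ('local', '/solr/suggest-data/questions-suggest/ngram_counts.tsv')
print(resolve_dictionary("ngram_counts.tsv", cfg))
# ('zk', '/configs/questions-suggest/ngram_counts.tsv')
```

With the local branch, every node hosting a replica of the collection would need its own copy of the file at that path.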

None of that is going to get this prototype working today, so I’m back to a non-cloud cluster. That is a real pain in the ass to set up with 6.x and 7.x. I got it working before vacation and now I can’t remember the steps.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

Re: Load suggest dictionary from non-Zookeeper file?

Posted by Shawn Heisey <ap...@elyograg.org>.

On 5/8/2019 1:59 PM, Walter Underwood wrote:
> Our suggest dictionary is too big for Zookeeper. I’m trying to load it from an absolute path, but the Solr 6.6.1 insists on interpreting that as a Zookeeper path. Any way to disable that?

I wouldn't be surprised to learn that it's not possible to make it read
config files from outside ZooKeeper.  I don't know for sure, though.

For right now, your only option will probably be to increase the 
jute.maxbuffer system property on all relevant ZK servers and Solr 
servers.  Then you will be able to store data larger than 1MB in ZK. 
Somebody from the ZK project would probably frown on that solution, and 
if I'm honest, I don't like it much myself.
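
For scale, ZooKeeper documents the default for `jute.maxbuffer` as 0xfffff bytes (1,048,575, just under 1 MiB), and that value caps the data stored in a single znode. The rough Python sketch below compares Walter's dictionary against that default. The helper name is hypothetical, reading "33 MB" as 33 MiB is an assumption, and per-request overhead is ignored.

```python
# ZooKeeper's documented default for jute.maxbuffer: 0xfffff bytes,
# i.e. 1,048,575 bytes, just under 1 MiB.
DEFAULT_JUTE_MAXBUFFER = 0xFFFFF


def fits_in_znode(size_bytes: int, max_buffer: int = DEFAULT_JUTE_MAXBUFFER) -> bool:
    # Simplification: ignores per-request serialization overhead, so data
    # right at the limit could still be rejected in practice.
    return size_bytes <= max_buffer


# Assumption: the ~33 MB dictionary is taken as 33 * 1024 * 1024 bytes.
dictionary_size = 33 * 1024 * 1024
print(fits_in_znode(dictionary_size))                      # False
print(round(dictionary_size / DEFAULT_JUTE_MAXBUFFER, 1))  # 33.0
```

So the limit would have to be raised roughly 33-fold on every ZooKeeper server and Solr node, which is the part Walter objects to in his reply.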

There are use cases like this where a SolrCloud replica (core) needs to
access some large data that would be better kept on the local disk
instead of in ZK.  I think it's probably a good idea to open an issue
to allow SolrCloud to access config data on the filesystem.  I'd like
some of the other people here to sanity check that idea, though.

Thanks,
Shawn