Posted to solr-user@lucene.apache.org by Jihwan Kim <ji...@gmail.com> on 2016/10/05 02:09:38 UTC

Enhancement on a getFloats method of FileFloatSource.java ?

I would like to submit an enhancement request to the Solr project.

FileFloatSource builds its cached values from an external file whenever a
core is reloaded and/or a new searcher is opened.  The external files,
however, may change much less frequently than that.

With more documents and larger external files, this consumes a lot of CPU
and memory.  It also makes core reloading take longer.

Until the getFloats method of FileFloatSource completes and replaces the
cached value in floatCache --> readerCache, two large float arrays are held
in memory.  When the external file has not changed, these two arrays contain
the same values.

The current format of the external field file is:
  Id=float_value
  Id=float_value
  Id=float_value
  :      :

If we supported a format like the following, we would only need to create
the large array object(s) when the version of the file changes:
  Version (for example, the current system time)
  Id=float_value
  Id=float_value
  Id=float_value
  :      :

When the version has not changed, we can keep using the cached array without
creating a new one.
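
A minimal Java sketch of reading the proposed format, offered only as an illustration. The class name VersionedExternalFile and its parse method are hypothetical and not part of Solr; the '=' delimiter and the "version on the first line" layout are taken from the format described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: parses a file whose first line is a
// version string and whose remaining lines are "id=float_value".
final class VersionedExternalFile {
  final String version;
  final Map<String, Float> values;

  private VersionedExternalFile(String version, Map<String, Float> values) {
    this.version = version;
    this.values = values;
  }

  static VersionedExternalFile parse(Reader in) {
    try (BufferedReader r = new BufferedReader(in)) {
      String version = r.readLine(); // first line carries the version
      Map<String, Float> values = new LinkedHashMap<>();
      for (String line; (line = r.readLine()) != null; ) {
        int delimIndex = line.lastIndexOf('=');
        if (delimIndex < 0) continue; // skip lines without a delimiter
        try {
          values.put(line.substring(0, delimIndex),
                     Float.parseFloat(line.substring(delimIndex + 1)));
        } catch (NumberFormatException e) {
          // skip lines whose value is not a float
        }
      }
      return new VersionedExternalFile(version, values);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String file = "1475633378000\ndoc1=0.5\ndoc2=1.25\n";
    VersionedExternalFile parsed = parse(new StringReader(file));
    System.out.println(parsed.version + " " + parsed.values);
  }
}
```

A caller could compare the parsed version with the one it saw last time before deciding whether to rebuild anything.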

Brief pseudocode, which would be added to the getFloats method, looks
something like this:

  String currentVersion = r.readLine();

  // Reuse the cached array when the file version has not changed.
  if (latestVersion != null && latestVersion.equals(currentVersion)) {
    Object cacheVal = latestCache.get(ffs.field.getName());
    if (cacheVal != null) {
      return (float[]) cacheVal;
    }
  }

  // Create the float array after the version check.
  vals = new float[reader.maxDoc()];
  if (ffs.defVal != 0) {
    Arrays.fill(vals, ffs.defVal);
  }

  latestVersion = currentVersion;
  latestCache.put(ffs.field.getName(), vals);

  // This is the existing code.
  for (String line; (line = r.readLine()) != null; ) {
    int delimIndex = line.lastIndexOf(delimiter);
    if (delimIndex < 0) continue;

    int endIndex = line.length();
    // ... (remainder of the existing loop)
  }


 *How can I file the enhancement request?*

Thanks.

Re: Enhancement on a getFloats method of FileFloatSource.java ?

Posted by Jihwan Kim <ji...@gmail.com>.
Got it!  Thanks a lot!
On Oct 4, 2016 9:29 PM, "Yonik Seeley" <ys...@gmail.com> wrote:

> [...]

Re: Enhancement on a getFloats method of FileFloatSource.java ?

Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Oct 4, 2016 at 11:23 PM, Jihwan Kim <ji...@gmail.com> wrote:
> Hi Yonik,
> I thought about your comment, and I think I understand what you were saying.
> The for loop in the getFloats method may assign a value to a different index
> of the array whenever a new segment is created/updated. You are saying that is
> why we cannot cache the float array as I suggested.  Am I understanding this
> correctly?

Yes.

For example, id="mydocument1" may map to lucene_docid 7
and then after a new searcher is opened that same id may map to lucene_docid 22

-Yonik
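
The docid example above can be sketched in Java as follows. This is an illustration only: the two maps stand in for the id -> internal docid lookup that each searcher view performs, the value 3.5 and the size 30 are made up, and none of the names are Lucene or Solr APIs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustration only: shows why an array indexed by internal docid cannot be
// reused across searchers even when the external file is unchanged.
final class DocidRemapDemo {
  /** Builds a docid-indexed array from id -> value pairs for one view of the index. */
  static float[] buildVals(Map<String, Float> fileValues,
                           Map<String, Integer> idToDocid, int maxDoc) {
    float[] vals = new float[maxDoc];
    for (Map.Entry<String, Float> e : fileValues.entrySet()) {
      Integer docid = idToDocid.get(e.getKey());
      if (docid != null) vals[docid] = e.getValue();
    }
    return vals;
  }

  public static void main(String[] args) {
    Map<String, Float> fileValues = new HashMap<>();
    fileValues.put("mydocument1", 3.5f); // the external file itself never changes

    Map<String, Integer> oldView = new HashMap<>();
    oldView.put("mydocument1", 7);       // docid before the new searcher
    Map<String, Integer> newView = new HashMap<>();
    newView.put("mydocument1", 22);      // docid after the new searcher

    float[] cached = buildVals(fileValues, oldView, 30);
    float[] fresh = buildVals(fileValues, newView, 30);

    System.out.println(Arrays.equals(cached, fresh));    // false
    System.out.println(cached[22] + " vs " + fresh[22]); // 0.0 vs 3.5
  }
}
```

A version check on the file alone would return the old array here, so a lookup at docid 22 would read the default value instead of the value for "mydocument1".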

Re: Enhancement on a getFloats method of FileFloatSource.java ?

Posted by Jihwan Kim <ji...@gmail.com>.
Hi Yonik,
I thought about your comment, and I think I understand what you were saying.
The for loop in the getFloats method may assign a value to a different index
of the array whenever a new segment is created/updated. You are saying that is
why we cannot cache the float array as I suggested.  Am I understanding this
correctly?

Thanks.


On Tue, Oct 4, 2016 at 8:59 PM, Jihwan Kim <ji...@gmail.com> wrote:

> [...]

Re: Enhancement on a getFloats method of FileFloatSource.java ?

Posted by Jihwan Kim <ji...@gmail.com>.
"The array is indexed by internal lucene docid," --> If I understood
correctly, it is done inside the for loop that I briefly showed.

In the code I proposed, 'vals' points to an array object, and latestCache
stores a reference to that same array object.  The index decision and the
assignment of values into the array then happen inside the for loop in
getFloats.  So latestCache still holds the same array object after the
indexes have been decided and the values from the external file have been
assigned.
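
A minimal Java sketch of the reference behaviour described above, for illustration only. The class ArrayAliasDemo, the field name "rank", and the index 3 are hypothetical; latestCache here is a plain HashMap standing in for the cache in the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a map entry holds a reference to the array, so values
// written after the put are visible through the cache as well.
final class ArrayAliasDemo {
  static float[] fillAfterCaching(Map<String, Object> latestCache, String field, int maxDoc) {
    float[] vals = new float[maxDoc];
    latestCache.put(field, vals); // stores a reference, not a copy
    vals[3] = 1.5f;               // written after the put, as the for loop would do
    return vals;
  }

  public static void main(String[] args) {
    Map<String, Object> latestCache = new HashMap<>();
    float[] vals = fillAfterCaching(latestCache, "rank", 5);
    float[] cached = (float[]) latestCache.get("rank");
    System.out.println(cached == vals); // true
    System.out.println(cached[3]);      // 1.5
  }
}
```

The cached array does see the later writes, but each position still reflects the index chosen at the time the loop ran.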

While a query is being handled (if I understood it correctly), I noticed that
another array is created per segment (with a size of 20 when I looked), and
this array is used for the query results and the values in the float array by
calling the public Object get(IndexReader reader, Object key) method.  This
get method uses the same float array created by getFloats.

vals = new float[reader.maxDoc()];

latestCache.put(ffs.field.getName(), vals);

Am I missing something? Any feedback will help me understand Solr better.

Thank you,

Jihwan

On Tue, Oct 4, 2016 at 8:35 PM, Yonik Seeley <ys...@gmail.com> wrote:

> [...]

Re: Enhancement on a getFloats method of FileFloatSource.java ?

Posted by Yonik Seeley <ys...@gmail.com>.
On Tue, Oct 4, 2016 at 10:09 PM, Jihwan Kim <ji...@gmail.com> wrote:
> I would like to submit an enhancement request to the Solr project.
>
> FileFloatSource builds its cached values from an external file whenever a
> core is reloaded and/or a new searcher is opened.  The external files,
> however, may change much less frequently than that.
>
[...]
> When the version has not changed, we can keep using the cached array
> without creating a new one.

The array is indexed by internal lucene docid, which can change across
different views of the index (a commit).

What we can do is cache per segment though.  When this class was
originally written, we didn't have access to individual segments.

-Yonik
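
One way the per-segment idea could look, as a hedged Java sketch rather than actual Solr code. PerSegmentFloatCache, segmentKey, and the loader are hypothetical names: segmentKey stands in for whatever stable per-segment identity the index exposes, and the sketch assumes docids within a segment stay fixed for the life of that segment. The version string is the one from the file format proposed earlier in the thread.

```java
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical sketch, not Solr code: caches one float[] per segment and
// rebuilds it only when the external file's version changes.
final class PerSegmentFloatCache {
  private static final class Entry {
    final String version;
    final float[] vals;
    Entry(String version, float[] vals) { this.version = version; this.vals = vals; }
  }

  // Weak keys let an entry be collected once its segment key is unreachable.
  private final Map<Object, Entry> cache = new WeakHashMap<>();

  /**
   * @param segmentKey stand-in for a stable per-segment identity
   * @param version    version string read from the first line of the file
   * @param loader     builds the segment-local array; called only on a miss
   */
  synchronized float[] get(Object segmentKey, String version, Supplier<float[]> loader) {
    Entry e = cache.get(segmentKey);
    if (e != null && Objects.equals(e.version, version)) {
      return e.vals; // same segment, same file version: reuse the array
    }
    float[] vals = loader.get();
    cache.put(segmentKey, new Entry(version, vals));
    return vals;
  }

  public static void main(String[] args) {
    PerSegmentFloatCache cache = new PerSegmentFloatCache();
    Object seg0 = new Object(); // a segment that survives the commit
    float[] a = cache.get(seg0, "v1", () -> new float[] {1f, 2f});
    float[] b = cache.get(seg0, "v1", () -> new float[] {9f, 9f}); // loader not called
    System.out.println(a == b); // true
  }
}
```

With this shape, a commit that only adds a new segment would build an array for that segment alone, while unchanged segments keep their existing arrays.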