Posted to dev@ignite.apache.org by Zhenya Stanilovsky <ar...@mail.ru.INVALID> on 2019/11/26 10:59:10 UTC

Re[2]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?

Thanks!

>Tuesday, 26 November 2019, 13:50 +03:00, Ilya Kasnacheev <il...@gmail.com> wrote:
> 
>Hello!
>
>I have a hunch that we are trying to build Apache Solr (or SolrCloud) into
>Apache Ignite. I think that is a lot of effort that is not well justified.
>
>I don't think we should try to implement sorting in Apache Ignite, because
>it is a lot of work and a lot of code in our code base, which we don't
>really want.
>
>Regards,
>--
>Ilya Kasnacheev
>
>
>Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
> 
>> Dear Igniters,
>>
>> The first part of the TextQuery improvement, a result limit, has been
>> developed and merged.
>> Now we have to develop the most important functionality here: proper
>> sorting of Lucene index responses and their correct reduction for
>> distributed queries.
>>
>> *There are two Lucene-based aspects*
>>
>> 1. When no sort fields are used, the documents in the response are still
>> ordered by relevance, i.e. by the ScoreDoc.score value.
>> In order to reduce the distributed results correctly, the score should be
>> passed along with the response.
>>
>> 2. When sorting by conventional fields, Lucene must have these fields
>> properly indexed, and the corresponding Sort object must be applied to
>> Lucene's search call.
>> To mark those fields, a new annotation such as '@SortField' may be
>> introduced.
>>
>> *Reducing on Ignite*
>>
>> The obvious place for distributed response reduction is the class
>> GridCacheDistributedQueryFuture.
>> However, @Ivan Pavlukhin mentioned a class with similar functionality:
>> ReduceIndexSorted.
>> As far as I can see, it is tangled with H2-related classes
>> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>>
>> Support is still needed here.
>>
>> Overall, the goal of this letter is to initiate a discussion on the
>> TextQuery sorting implementation and come closer to ticket creation.
>>
>> BR,
>> Yuriy Shuliha
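Point 2 above can be sketched minimally, assuming a hypothetical field-level annotation named SortField with hypothetical order() and descending() attributes (nothing like this exists in Ignite yet; the thread only proposes "something like '@SortField'"). The sketch only shows how marked fields could be discovered by reflection and turned into an ordered sort key; indexing them untokenized in Lucene and building a Lucene Sort from them is not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical marker for fields that should be indexed untokenized so they can be sorted on. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SortField {
    /** Position of the field in the sort key; lower values are compared first. */
    int order();

    /** Whether this field sorts in descending order. */
    boolean descending() default false;
}

/** Example entity: 'name' is a plain full-text field, the other two are sortable. */
class Instrument {
    String name;

    @SortField(order = 1, descending = true)
    double marketCap;

    @SortField(order = 2)
    String ticker;
}

final class SortFieldScanner {
    /**
     * Returns "field ASC|DESC" entries for every @SortField-annotated field,
     * ordered by order(). Reflection does not guarantee declaration order,
     * hence the explicit sort.
     */
    static List<String> sortKey(Class<?> cls) {
        List<Field> marked = new ArrayList<>();
        for (Field f : cls.getDeclaredFields())
            if (f.isAnnotationPresent(SortField.class))
                marked.add(f);

        marked.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(SortField.class).order()));

        List<String> key = new ArrayList<>();
        for (Field f : marked)
            key.add(f.getName() + (f.getAnnotation(SortField.class).descending() ? " DESC" : " ASC"));
        return key;
    }

    public static void main(String[] args) {
        System.out.println(sortKey(Instrument.class)); // [marketCap DESC, ticker ASC]
    }
}
```

An explicit order attribute is used because the priority of sort fields is a property of the query design and should not depend on how the JVM happens to list fields.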
>>
>> Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>>
>> > Hi Dmitry, Yuriy.
>> >
>> > I've found that GridCacheQueryFutureAdapter has a newly added AtomicInteger
>> > 'total' field and a 'limit' field as a primitive int.
>> >
>> > Both fields are used inside a synchronized block only.
>> > So, we can make both private and downgrade the AtomicInteger to a
>> > primitive int.
>> >
>> > Most likely, these fields can be replaced with one field.
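A minimal sketch of the single-field variant, using a hypothetical class (LimitedResultQueue is not Ignite's GridCacheQueryFutureAdapter). Because every read and write happens under one monitor, a plain private int counting the rows that may still be accepted is enough, and it takes the place of both the 'total' counter and the 'limit' field.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Queue;

/** Hypothetical result queue that enforces a row limit with one private counter. */
final class LimitedResultQueue<T> {
    private final Object mux = new Object();
    private final Queue<T> queue = new ArrayDeque<>();

    /** Rows that may still be accepted; guarded by mux, so no AtomicInteger is needed. */
    private int remaining;

    /** @param limit maximum number of rows; 0 or less means "unlimited" (capped at Integer.MAX_VALUE). */
    LimitedResultQueue(int limit) {
        remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /** Enqueues a page, dropping rows beyond the limit; returns how many rows were accepted. */
    int addPage(Collection<? extends T> page) {
        synchronized (mux) {
            int accepted = 0;
            for (T row : page) {
                if (remaining == 0)
                    break;
                queue.add(row);
                remaining--;
                accepted++;
            }
            return accepted;
        }
    }

    /** True once the limit is reached, i.e. no more pages need to be requested. */
    boolean limitReached() {
        synchronized (mux) {
            return remaining == 0;
        }
    }

    int size() {
        synchronized (mux) {
            return queue.size();
        }
    }

    public static void main(String[] args) {
        LimitedResultQueue<Integer> q = new LimitedResultQueue<>(3);
        System.out.println(q.addPage(Arrays.asList(1, 2)));    // 2
        System.out.println(q.addPage(Arrays.asList(3, 4, 5))); // 1
        System.out.println(q.limitReached());                  // true
    }
}
```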
>> >
>> >
>> >
>> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
>> >
>> > > Hi Andrey,
>> > >
>> > > I've checked the ticket comments, and there is a TC Bot visa (with no
>> > > blockers).
>> > >
>> > > Do you have any concerns related to this patch?
>> > >
>> > > Sincerely,
>> > > Dmitriy Pavlov
>> > >
>> > > Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >
>> > >> Andrey,
>> > >>
>> > >> Per your request, I created the ticket
>> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
>> > >>  https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>> > >>
>> > >> Could you please proceed with the PR merge?
>> > >>
>> > >> BR,
>> > >> Yuriy Shuliha
>> > >>
>> > >> Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>> > >>
>> > >> > Hi Yuri,
>> > >> >
>> > >> > To get access to the TC Bot, you should register as a TeamCity user [1],
>> > >> > if you haven't done so already.
>> > >> > Then you will be able to log in on the Ignite TC Bot page with the same
>> > >> > credentials.
>> > >> >
>> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
>> > >> >
>> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >> >
>> > >> >> Andrew,
>> > >> >>
>> > >> >> I have corrected the PR according to your notes. Please review.
>> > >> >> What are the next steps to get it merged?
>> > >> >>
>> > >> >> Y.
>> > >> >>
>> > >> >> Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>> > >> >>
>> > >> >> > Yuri,
>> > >> >> >
>> > >> >> > I've finished the review.
>> > >> >> > No crime found, only a trivial compatibility bug.
>> > >> >> >
>> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >> >> >
>> > >> >> > > Denis,
>> > >> >> > >
>> > >> >> > > Thank you for your attention to this.
>> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
>> > >> >> > > ticket is still pending review.
>> > >> >> > > Do we have a chance to move it forward somehow?
>> > >> >> > >
>> > >> >> > > BR,
>> > >> >> > > Yuriy Shuliha
>> > >> >> > >
>> > >> >> > > Mon, 30 Sep 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
>> > >> >> > >
>> > >> >> > > > Yuriy,
>> > >> >> > > >
>> > >> >> > > > I've seen you open a pull request with the first changes:
>> > >> >> > > >  https://issues.apache.org/jira/browse/IGNITE-12189
>> > >> >> > > >
>> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the review?
>> > >> >> > > >
>> > >> >> > > > -
>> > >> >> > > > Denis
>> > >> >> > > >
>> > >> >> > > >
>> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
>> > >> >> > > >
>> > >> >> > > > > Yuriy,
>> > >> >> > > > >
>> > >> >> > > > > Thank you for providing details! Quite interesting.
>> > >> >> > > > >
>> > >> >> > > > > Yes, we already support a distributed limit and merging of sorted
>> > >> >> > > > > sub-results for SQL queries. E.g. ReduceIndexSorted and
>> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
>> > >> >> > > > >
>> > >> >> > > > > Could you please also clarify score/relevance? Is it provided by
>> > >> >> > > > > the Lucene engine for each query result? I am thinking about how to
>> > >> >> > > > > do a sorted merge properly in this case.
>> > >> >> > > > >
>> > >> >> > > > > Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >> >> > > > > >
>> > >> >> > > > > > Ivan,
>> > >> >> > > > > >
>> > >> >> > > > > > Thank you for the interesting question!
>> > >> >> > > > > >
>> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented, and
>> > >> >> > > > > > the user's point of interest is the topmost part of the response.
>> > >> >> > > > > > The user can then read it, evaluate it and use the returned records
>> > >> >> > > > > > for further purposes.
>> > >> >> > > > > >
>> > >> >> > > > > > In our case in particular, we use Ignite for operations with financial
>> > >> >> > > > > > data, and there is a lot of text there, such as asset names, financial
>> > >> >> > > > > > instruments, companies, etc.
>> > >> >> > > > > > To work with this quickly and reliably, users are accustomed to text
>> > >> >> > > > > > search, type-ahead completions and suggestions.
>> > >> >> > > > > >
>> > >> >> > > > > > For these purposes, we index particular string data in separate caches.
>> > >> >> > > > > >
>> > >> >> > > > > > Sorting capabilities and response size limits are very important
>> > >> >> > > > > > there, as our API has to provide the most relevant information within
>> > >> >> > > > > > a limited size.
>> > >> >> > > > > >
>> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
>> > >> >> > > > > > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs* already
>> > >> >> > > > > > sorted by *score* (relevance), so the most relevant documents are on top.
>> > >> >> > > > > > However, distributed query responses from different nodes are currently
>> > >> >> > > > > > merged into the final query cursor queue in arbitrary order,
>> > >> >> > > > > > so in fact the score order is already lost at this point. Ignite also
>> > >> >> > > > > > requests all matching documents from Lucene, which is redundant and
>> > >> >> > > > > > bad for performance.
>> > >> >> > > > > >
>> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery* and have to
>> > >> >> > > > > > point out that we still need to add sorting to text query processing
>> > >> >> > > > > > in order to get usable results.
>> > >> >> > > > > >
>> > >> >> > > > > > The *limit* parameter by itself should fix some of the issues above,
>> > >> >> > > > > > but sorting, at least by document score, should definitely be
>> > >> >> > > > > > implemented along with the limit.
>> > >> >> > > > > >
>> > >> >> > > > > > This is a fairly short commentary; if you still have any questions,
>> > >> >> > > > > > please ask, don't hesitate :)
>> > >> >> > > > > >
>> > >> >> > > > > > BR,
>> > >> >> > > > > > Yuriy Shuliha
>> > >> >> > > > > >
>> > >> >> > > > > > Thu, 19 Sep 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
>> > >> >> > > > > >
>> > >> >> > > > > > > Yuriy,
>> > >> >> > > > > > >
>> > >> >> > > > > > > Greatly appreciate your interest.
>> > >> >> > > > > > >
>> > >> >> > > > > > > Could you please elaborate a little on sorting? What tasks
>> > >> >> > > > > > > does it help to solve, and how? It would be great to see an
>> > >> >> > > > > > > example.
>> > >> >> > > > > > >
>> > >> >> > > > > > > Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > Denis,
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > I like the idea of throwing an exception when text queries
>> > >> >> > > > > > > > are enabled on persistent caches.
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > I'm also fine with the proposed limit for unsorted searches.
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > Yury, please proceed with ticket creation.
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > Tue, 17 Sep 2019, 22:06, Denis Magda <dmagda@apache.org> wrote:
>> > >> >> > > > > > > >
>> > >> >> > > > > > > > > Igniters,
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding full-text search
>> > >> >> > > > > > > > > API evolution, as long as Yury is ready to push it forward.
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for in-memory
>> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
>> > >> >> > > > > > > > > like Postgres.
>> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by default)
>> > >> >> > > > > > > > > if one attempts to use text indices with native persistence enabled.
>> > >> >> > > > > > > > > If a person is ready to live with that limitation, an explicit
>> > >> >> > > > > > > > > configuration change should be required to get around the exception.
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > Thoughts?
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > -
>> > >> >> > > > > > > > > Denis
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >> >> > > > > > > > >
>> > >> >> > > > > > > > > > Hello to all again,
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > Let me answer and continue the discussion.
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > (I) Overall need for Lucene indexing
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > Alexei has referred to
>> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the absence
>> > >> >> > > > > > > > > > of index persistence was declared an obstacle to further development.
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
>> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely
>> > >> >> > > > > > > > > > in-memory indexing of selected data.
>> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited number of
>> > >> >> > > > > > > > > > records to be used in type-ahead search / suggestions.
>> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the
>> > >> >> > > > > > > > > > Lucene index to be persistent. I hope this is a widespread pattern of
>> > >> >> > > > > > > > > > text-search usage.
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
>> > >> >> > > > > > > > > > required in text-search tasks for now)
>> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries, using
>> > >> >> > > > > > > > > > a simple prefix query test, like 'name'='ene*'.
>> > >> >> > > > > > > > > > For now, each server node returns all response records to the client
>> > >> >> > > > > > > > > > node, and there may be thousands or hundreds of thousands of records,
>> > >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the results are
>> > >> >> > > > > > > > > > added to the queue in GridCacheQueryFutureAdapter page by page, in
>> > >> >> > > > > > > > > > arbitrary order.
>> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
>> > >> >> > > > > > > > > > So implementing the limit as part of the query (and
>> > >> >> > > > > > > > > > GridCacheQueryRequest) will not change the nature of the response,
>> > >> >> > > > > > > > > > but it will limit the load on nodes and networking.
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > a) Sorting
>> > >> >> > > > > > > > > > The solution for this could be:
>> > >> >> > > > > > > > > > - Make entities comparable
>> > >> >> > > > > > > > > > - Add a custom comparator to the entity
>> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
>> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the desired
>> > >> >> > > > > > > > > > limit on the client node.
>> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
>> > >> >> > > > > > > > > > though it can be used for relatively small limits.
>> > >> >> > > > > > > > > > BR,
>> > >> >> > > > > > > > > > Yuriy Shuliha
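The client-side option from (III) a) can be sketched minimally, assuming a hypothetical Hit type with one sortable field (price) plus Lucene's relevance score; none of these names come from Ignite. It builds a comparator chain (price ascending, then higher score first) and reduces the rows gathered from all nodes to the desired limit. As the message says, the whole result set sits in memory on the client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

final class ClientSideReducer {
    /** Hypothetical text-query hit: one sortable field plus Lucene's relevance score. */
    static final class Hit {
        final String name;
        final double price;
        final float score;

        Hit(String name, double price, float score) {
            this.name = name;
            this.price = price;
            this.score = score;
        }

        @Override public String toString() {
            return name;
        }
    }

    /** Example ordering: cheapest first; ties broken by higher relevance. */
    static final Comparator<Hit> BY_PRICE_THEN_SCORE =
        Comparator.comparingDouble((Hit h) -> h.price)
            .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    /**
     * Collects the rows received from all nodes, sorts them with the given
     * comparator and keeps the first limit rows. Everything is held in memory,
     * so this only suits modest result sizes.
     */
    static <T> List<T> reduce(Collection<? extends List<T>> nodeResults, Comparator<? super T> cmp, int limit) {
        List<T> all = new ArrayList<>();
        for (List<T> part : nodeResults)
            all.addAll(part);
        all.sort(cmp);
        return new ArrayList<>(all.subList(0, Math.max(0, Math.min(limit, all.size()))));
    }

    public static void main(String[] args) {
        List<Hit> node1 = Arrays.asList(new Hit("a", 10.0, 0.5f), new Hit("b", 5.0, 0.1f));
        List<Hit> node2 = Arrays.asList(new Hit("c", 5.0, 0.9f), new Hit("d", 20.0, 1.0f));
        System.out.println(reduce(Arrays.asList(node1, node2), BY_PRICE_THEN_SCORE, 3)); // [c, b, a]
    }
}
```

Putting the relevance score last in the chain keeps results with equal field values in a meaningful order instead of whatever order the pages happened to arrive in.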
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > > > > Yuriy,
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1], which
>> > >> >> > > > > > > > > > > makes Lucene indexes unusable with persistence and is the main
>> > >> >> > > > > > > > > > > reason for discontinuation.
>> > >> >> > > > > > > > > > > It should probably be addressed first to make text queries a valid
>> > >> >> > > > > > > > > > > product feature.
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a trivial
>> > >> >> > > > > > > > > > > task. Some kind of merging must be implemented on the
>> > >> >> > > > > > > > > > > query-originating node.
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > Thu, 29 Aug 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > > Yuriy,
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes, then
>> > >> >> > > > > > > > > > > > please go ahead. The primary reasons why the community wants to
>> > >> >> > > > > > > > > > > > discontinue them first (and probably resurrect them later) are the
>> > >> >> > > > > > > > > > > > limitations listed by Andrey and minimal support from the
>> > >> >> > > > > > > > > > > > community's end.
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > -
>> > >> >> > > > > > > > > > > > Denis
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > Hi Yuriy,
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in
>> > >> >> > > > > > > > > > > > > Ignite [1].
>> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not
>> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL or inside SQL,
>> > >> >> > > > > > > > > > > > > and there is a lack of interest from the community side.
>> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
>> > >> >> > > > > > > > > > > > > Query results are returned from a data node to the client-side
>> > >> >> > > > > > > > > > > > > cursor page by page, and this parameter is designed to control the
>> > >> >> > > > > > > > > > > > > page size. The query is supposed to execute lazily on the server
>> > >> >> > > > > > > > > > > > > side, and the full result set is not expected to be loaded into
>> > >> >> > > > > > > > > > > > > memory on the server side at once, but page by page.
>> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into
>> > >> >> > > > > > > > > > > > > memory before the first page is sent to the client?
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The
>> > >> >> > > > > > > > > > > > > best solution is to use query language commands for this, e.g.
>> > >> >> > > > > > > > > > > > > "LIMIT/OFFSET" in SQL.
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation:
>> > >> >> > > > > > > > > > > > > the same user query will be executed on the data nodes, and then the
>> > >> >> > > > > > > > > > > > > results from all nodes should be correctly merged before being
>> > >> >> > > > > > > > > > > > > returned via the client cursor.
>> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again at the
>> > >> >> > > > > > > > > > > > > merge phase.
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense
>> > >> >> > > > > > > > > > > > > without sorting, as there is no guarantee that each subsequent
>> > >> >> > > > > > > > > > > > > query run will return the same data, because of page reordering.
>> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes
>> > >> >> > > > > > > > > > > > > asynchronously, and messages from different nodes can't be ordered.
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > 2.
>> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks more
>> > >> >> > > > > > > > > > > > > descriptive, doesn't it?
>> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from
>> > >> >> > > > > > > > > > > > > nodes be merged?
>> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
>> > >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort results at the merge
>> > >> >> > > > > > > > > > > > > phase?
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all. E.g. it is
>> > >> >> > > > > > > > > > > > > impossible to configure the Tokenizer.
>> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and
>> > >> >> > > > > > > > > > > > > only then go further to discuss/implement complex features that may
>> > >> >> > > > > > > > > > > > > depend on the engine config.
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > Dear community,
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > By starting this thread, I'd like to open a discussion that would
>> > >> >> > > > > > > > > > > > > > lead to contributions in the subject area.
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities backed by different mechanisms,
>> > >> >> > > > > > > > > > > > > > including Lucene.
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
>> > >> >> > > > > > > > > > > > > > It is a widespread and mature technology that covers the text
>> > >> >> > > > > > > > > > > > > > search area and beyond (e.g. spatial data indexing).
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing
>> > >> >> > > > > > > > > > > > > > and query mechanisms for text data*.
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our
>> > >> >> > > > > > > > > > > > > > project's needs, but I believe it will be useful for many more
>> > >> >> > > > > > > > > > > > > > people. Let's walk through the items and vote on or discuss Jira
>> > >> >> > > > > > > > > > > > > > tickets for them.
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > 1.[trivial] Use dataQuery.getPageSize() to limit search response
>> > >> >> > > > > > > > > > > > > > items inside GridLuceneIndex.query(). Currently it calls
>> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all
>> > >> >> > > > > > > > > > > > > > scored matches are returned, which we do not need in most cases.
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > 2.[simple] Add sorting. Then a more capable search call can be
>> > >> >> > > > > > > > > > > > > > executed: *IndexSearcher.search(query, count, sort)*
>> > >> >> > > > > > > > > > > > > > Implementation steps:
>> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the
>> > >> >> > > > > > > > > > > > > > *@QueryTextField* annotation. If *true*, the field will be indexed
>> > >> >> > > > > > > > > > > > > > but not tokenized. Number types are preferred here.
>> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should
>> > >> >> > > > > > > > > > > > > > define the desired sort fields used for querying.
>> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > 3.[moderate] Build complex queries with *TextQuery*, including
>> > >> >> > > > > > > > > > > > > > terms/queries boosting.
>> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed
>> > >> >> > > > > > > > > > > > > > work. It should be extended if the community is interested in it.*
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > > BR,
>> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
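Item 1 of the proposal rests on the idea that asking for a bounded number of hits, instead of Integer.MAX_VALUE, lets a collector keep only the best n documents. The following is a minimal plain-Java sketch of that bounded top-k idea; it assumes nothing about Lucene internals, and TopKCollector and ScoredDoc are hypothetical names. Score ties are not broken deterministically here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical bounded collector: keeps only the k best-scoring hits in a single pass. */
final class TopKCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        @Override public String toString() {
            return doc + ":" + score;
        }
    }

    private final int k;

    // Min-heap on score: the root is the weakest hit currently kept, so memory stays O(k).
    private final PriorityQueue<ScoredDoc> heap;

    TopKCollector(int k) {
        if (k <= 0)
            throw new IllegalArgumentException("k must be positive: " + k);
        this.k = k;
        this.heap = new PriorityQueue<>(k, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Offers one matching document; it is kept only if it beats the current weakest hit. */
    void collect(int doc, float score) {
        if (heap.size() < k)
            heap.add(new ScoredDoc(doc, score));
        else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new ScoredDoc(doc, score));
        }
    }

    /** Returns the kept hits, best score first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> res = new ArrayList<>(heap);
        res.sort((a, b) -> Float.compare(b.score, a.score));
        return res;
    }

    public static void main(String[] args) {
        TopKCollector c = new TopKCollector(2);
        float[] scores = {0.3f, 0.9f, 0.1f, 0.7f};
        for (int doc = 0; doc < scores.length; doc++)
            c.collect(doc, scores[doc]);
        System.out.println(c.topDocs()); // [1:0.9, 3:0.7]
    }
}
```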
>> > >> >> > > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > > > --
>> > >> >> > > > > > > > > > > > > Best regards,
>> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
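Point 1 of Andrey's reply (apply LIMIT on every node, then again at the merge phase) can be illustrated with a minimal sketch that uses only relevance scores. TwoPhaseLimit is a hypothetical name, and nodes are simulated as plain lists. When every node sorts its hits and keeps its top n, the merged top n equals the global top n, because a hit from the global top n can never rank below n-th on its own node. Without a sort order neither cut is well defined, which is why limiting without sorting makes little sense.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TwoPhaseLimit {
    /** Phase 1, on a data node: sort the local scores in descending order and keep the first n (n >= 0). */
    static List<Double> nodeTopN(List<Double> localScores, int n) {
        List<Double> sorted = new ArrayList<>(localScores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** Phase 2, on the reducing node: merge the per-node top-n lists and apply the same limit again. */
    static List<Double> query(List<List<Double>> scoresPerNode, int n) {
        List<Double> merged = new ArrayList<>();
        for (List<Double> node : scoresPerNode)
            merged.addAll(nodeTopN(node, n)); // each node ships at most n rows
        return nodeTopN(merged, n);
    }

    public static void main(String[] args) {
        List<List<Double>> nodes = Arrays.asList(
            Arrays.asList(0.2, 0.95, 0.5),
            Arrays.asList(0.9, 0.85, 0.1),
            Arrays.asList(0.3));
        System.out.println(query(nodes, 3)); // [0.95, 0.9, 0.85]
    }
}
```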
>> > >> >> > > > > > > > > > > > >
>> > >> >> > > > > > > > > > > >
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > --
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > > > Best regards,
>> > >> >> > > > > > > > > > > Alexei Scherbakov
>> > >> >> > > > > > > > > > >
>> > >> >> > > > > > > > > >
>> > >> >> > > > > > > > >
>> > >> >> > > > > > >
>> > >> >> > > > > > >
>> > >> >> > > > > > >
>> > >> >> > > > > > > --
>> > >> >> > > > > > > Best regards,
>> > >> >> > > > > > > Ivan Pavlukhin
>> > >> >> > > > > > >
>> > >> >> > > > >
>> > >> >> > > > >
>> > >> >> > > > >
>> > >> >> > > > > --
>> > >> >> > > > > Best regards,
>> > >> >> > > > > Ivan Pavlukhin
>> > >> >> > > > >
>> > >> >> > > >
>> > >> >> > >
>> > >> >> >
>> > >> >> >
>> > >> >> > --
>> > >> >> > Best regards,
>> > >> >> > Andrey V. Mashenkov
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >> > --
>> > >> > Best regards,
>> > >> > Andrey V. Mashenkov
>> > >> >
>> > >>
>> > >
>> >
>> > --
>> > Best regards,
>> > Andrey V. Mashenkov
>> >
>>

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Yuriy Shuliga <sh...@gmail.com>.

Nice to hear, Ivan.

It's good practice to make extensions of existing functionality work
properly, and that is what we expect from Text Queries.
Let's make it work correctly first.

I'm OK with preparing a ticket for adding reduction of sorted responses to
GridCacheDistributedQueryFuture or somewhere nearby.
Also, the TextQuery response entity will be extended to carry Lucene's
'docScore' per record.
Then no open questions are left.

BR,
Yuriy Shuliha
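A minimal sketch of that reduction follows. It assumes each node's response arrives as a stream already sorted by descending Lucene score. The ScoredRow and ScoreMergeIterator names, and the docScore field on the row, are hypothetical; this is not Ignite's GridCacheDistributedQueryFuture or MergeStreamIterator. A priority queue holding the current head row of every node yields rows lazily in global score order and stops at the limit, so the merge itself holds only one pending row per node.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical response row: a cache key plus Lucene's per-document score. */
final class ScoredRow {
    final String key;
    final float docScore;

    ScoredRow(String key, float docScore) {
        this.key = key;
        this.docScore = docScore;
    }

    @Override public String toString() {
        return key;
    }
}

/** Lazily merges per-node streams, each sorted by descending docScore, up to a limit. */
final class ScoreMergeIterator implements Iterator<ScoredRow> {
    /** Current head row of one node's stream. */
    private static final class Head {
        final ScoredRow row;
        final Iterator<ScoredRow> src;

        Head(ScoredRow row, Iterator<ScoredRow> src) {
            this.row = row;
            this.src = src;
        }
    }

    // Max-heap on docScore: the best remaining row across all nodes is on top.
    private final PriorityQueue<Head> heap =
        new PriorityQueue<>((a, b) -> Float.compare(b.row.docScore, a.row.docScore));

    private int remaining;

    ScoreMergeIterator(List<Iterator<ScoredRow>> perNode, int limit) {
        remaining = limit;
        for (Iterator<ScoredRow> it : perNode)
            if (it.hasNext())
                heap.add(new Head(it.next(), it));
    }

    @Override public boolean hasNext() {
        return remaining > 0 && !heap.isEmpty();
    }

    @Override public ScoredRow next() {
        if (!hasNext())
            throw new NoSuchElementException();
        Head head = heap.poll();
        // Refill from the same node so its next-best row competes with the others.
        if (head.src.hasNext())
            heap.add(new Head(head.src.next(), head.src));
        remaining--;
        return head.row;
    }

    public static void main(String[] args) {
        List<ScoredRow> node1 = Arrays.asList(new ScoredRow("a", 0.9f), new ScoredRow("b", 0.4f));
        List<ScoredRow> node2 = Arrays.asList(new ScoredRow("c", 0.8f), new ScoredRow("d", 0.7f), new ScoredRow("e", 0.1f));
        ScoreMergeIterator it = new ScoreMergeIterator(Arrays.asList(node1.iterator(), node2.iterator()), 4);
        StringBuilder out = new StringBuilder();
        while (it.hasNext())
            out.append(it.next());
        System.out.println(out); // acdb
    }
}
```

Rows with equal docScore come out in heap order, so a deterministic tie-breaker (for example node id, then key) would be needed if repeated runs must return identical pages.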

Thu, 28 Nov 2019 at 15:23, Ivan Pavlukhin <vo...@gmail.com> wrote:

> Folks, Yuriy,
>
> I suppose that we are going to proceed with
>
> >>>
> Reducing on Ignite
>
> The obvious point of distributed response reduction is class
> GridCacheDistributedQueryFuture.
> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> ReduceIndexSorted
> What I see here is that it is tangled with H2-related classes
> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> >>
>
> I have no strict opinion that we should unify
> reduction. Having a separate reduction implementation for text queries
> sounds like a fine option to me as well.
>
> Are there still any open questions?
>
> On Wed, Nov 27, 2019 at 02:27, Denis Magda <dm...@apache.org> wrote:
> >
> > I don't see anything wrong if Yuriy is willing to carry on and keep
> > enhancing our full-text search support, which lacks basic capabilities.
> >
> > The basics should be available. If anybody needs an advanced feature, they
> > can introduce Solr or Elasticsearch into the final architecture of the app.
> >
> > Folks, who of us can help Yuriy with the questions asked? Most likely the
> > SQL experts are the best candidates here.
> >
> >
> > -
> > Denis
> >
> >
> > On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > > Folks,
> > >
> > > IEP is an Ignite-specific thing. In fact, I suppose that we are
> > > already doing it the ASF way by having this dev-list discussion =)
> > >
> > > As for me, the "limit" feature for text queries is not big enough
> > > to warrant an IEP. But we might need to create one for the next features.
> > >
> > > On Tue, Nov 26, 2019 at 15:06, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > > >
> > > > Hello!
> > > >
> > > > The ASF way should probably start with an IEP :)
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > >
> > > > On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> > > >
> > > > >
> > > > > OK, let's forget Solr and go the ASF way: if Yuriy proves this
> > > > > functionality is helpful and opens a PR for it, why not?
> > > > >
> > > > > Isn't it?
> > > > >
> > > > > >Tuesday, 26 November 2019, 14:06 +03:00 from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > > > >
> > > > > >Hello!
> > > > > >
> > > > > >The problem here is that Solr is a multi-year effort by a lot of
> > > > > >people. We can't match that.
> > > > > >
> > > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > > > > >cache information into their storage for indexing and relying on their
> > > > > >own mechanisms for distributed IR sorting?
> > > > > >
> > > > > >Regards,
> > > > > >--
> > > > > >Ilya Kasnacheev
> > > > > >
> > > > > >
> > > > > >On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> > > > > >
> > > > > >>
> > > > > >> Ilya Kasnacheev, what is the problem with having Solr functionality in
> > > > > >> Ignite?
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > > > >> >
> > > > > >> >Hello!
> > > > > >> >
> > > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> > > > > >> >into Apache Ignite. I think that's a lot of effort that is not very
> > > > > >> >justified.
> > > > > >> >
> > > > > >> >I don't think we should try to implement sorting in Apache Ignite,
> > > > > >> >because it is a lot of work, and a lot of code in our code base which
> > > > > >> >we don't really want.
> > > > > >> >
> > > > > >> >Regards,
> > > > > >> >--
> > > > > >> >Ilya Kasnacheev
> > > > > >> >
> > > > > >> >
> > > > > >> >On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >
> > > > > >> >> Dear Igniters,
> > > > > >> >>
> > > > > >> >> The first part of the TextQuery improvement, a result limit, was
> > > > > >> >> developed and merged.
> > > > > >> >> Now we have to develop the most important functionality here: proper
> > > > > >> >> sorting of the Lucene index response and correct reduction of the
> > > > > >> >> responses for distributed queries.
> > > > > >> >>
> > > > > >> >> *There are two Lucene-based aspects*
> > > > > >> >>
> > > > > >> >> 1. When no sorting fields are used, the documents in the response are
> > > > > >> >> still ordered by relevance.
> > > > > >> >> Actually, this is the ScoreDoc.score value.
> > > > > >> >> In order to reduce the distributed results correctly, the score should
> > > > > >> >> be passed with the response.
> > > > > >> >>
> > > > > >> >> 2. When sorting by conventional fields, Lucene should have these
> > > > > >> >> fields properly indexed, and the
> > > > > >> >> corresponding Sort object should be applied to Lucene's search call.
> > > > > >> >> In order to mark those fields, a new annotation like '@SortField' may
> > > > > >> >> be introduced.
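Point 2 could be prototyped along the lines of this Java sketch. Everything in it is hypothetical. The proposal only mentions "a new annotation like '@SortField'", and this sketch names it TextSortField to avoid clashing with Lucene's own org.apache.lucene.search.SortField class. The descending attribute and the Instrument entity are invented for illustration. The sketch only shows how an indexer might discover which fields to index for sorting; building the Lucene Sort object and passing it to IndexSearcher.search(query, n, sort) is not shown.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker for fields that should also be indexed for sorting. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface TextSortField {
    /** Whether results should be ordered by this field in descending order. */
    boolean descending() default false;
}

/** Hypothetical cache value type: name and price are sortable, ticker is not. */
class Instrument {
    @TextSortField
    String name;

    @TextSortField(descending = true)
    double price;

    String ticker;
}

class SortFieldScanner {
    /** Returns "field" or "field DESC" per annotated field; reflection guarantees no order. */
    static List<String> sortSpecs(Class<?> type) {
        List<String> specs = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            TextSortField ann = f.getAnnotation(TextSortField.class);
            if (ann != null)
                specs.add(f.getName() + (ann.descending() ? " DESC" : ""));
        }
        return specs;
    }
}
```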
> > > > > >> >>
> > > > > >> >> *Reducing on Ignite*
> > > > > >> >>
> > > > > >> >> The obvious point of distributed response reduction is the class
> > > > > >> >> GridCacheDistributedQueryFuture.
> > > > > >> >> Though, @Ivan Pavlukhin mentioned a class with similar functionality:
> > > > > >> >> ReduceIndexSorted.
> > > > > >> >> What I see here is that it is tangled with H2-related classes
> > > > > >> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> > > > > >> >>
> > > > > >> >> I still need support here.
> > > > > >> >>
> > > > > >> >> Overall, the goal of this letter is to initiate a discussion on the
> > > > > >> >> TextQuery sorting implementation and come closer to ticket creation.
> > > > > >> >>
> > > > > >> >> BR,
> > > > > >> >> Yuriy Shuliha
> > > > > >> >>
> > > > > >> >> On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > > >> >>
> > > > > >> >> > Hi Dmitry, Yuriy.
> > > > > >> >> >
> > > > > >> >> > I've found that GridCacheQueryFutureAdapter has a newly added
> > > > > >> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
> > > > > >> >> >
> > > > > >> >> > Both fields are used only inside a synchronized block.
> > > > > >> >> > So we can make both private and downgrade the AtomicInteger to a
> > > > > >> >> > primitive int.
> > > > > >> >> >
> > > > > >> >> > Most likely, these two fields can be replaced with a single field.
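Andrey's suggestion can be illustrated with this minimal Java sketch. It assumes the future only needs to track how many more rows it may accept. One plain int, guarded by the same lock as the result queue, replaces the AtomicInteger/int pair. LimitedCollector, onPage, rows, and the convention that a non-positive limit means "no limit" are hypothetical and do not reproduce GridCacheQueryFutureAdapter's actual code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for the row-collecting part of a query future. */
class LimitedCollector<T> {
    private final Object mux = new Object();
    private final List<T> queue = new ArrayList<>();

    /** Rows still allowed; negative means "no limit". Guarded by mux, so no AtomicInteger. */
    private int remaining;

    LimitedCollector(int limit) {
        remaining = limit > 0 ? limit : -1;
    }

    /** Enqueues as many rows of the page as the limit allows; returns how many were accepted. */
    int onPage(Collection<? extends T> page) {
        synchronized (mux) {
            int accepted = 0;
            for (T row : page) {
                if (remaining == 0)
                    break;
                queue.add(row);
                accepted++;
                if (remaining > 0)
                    remaining--;
            }
            return accepted;
        }
    }

    List<T> rows() {
        synchronized (mux) {
            return new ArrayList<>(queue);
        }
    }
}
```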
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
> > > > > >> >> >
> > > > > >> >> > > Hi Andrey,
> > > > > >> >> > >
> > > > > >> >> > > I've checked this ticket's comments, and there is a TC Bot visa
> > > > > >> >> > > (with no blockers).
> > > > > >> >> > >
> > > > > >> >> > > Do you have any concerns related to this patch?
> > > > > >> >> > >
> > > > > >> >> > > Sincerely,
> > > > > >> >> > > Dmitriy Pavlov
> > > > > >> >> > >
> > > > > >> >> > > On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >
> > > > > >> >> > >> Andrey,
> > > > > >> >> > >>
> > > > > >> >> > >> Per your request, I created ticket
> > > > > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> > > > > >> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> > > > > >> >> > >>
> > > > > >> >> > >> Could you please proceed with the PR merge?
> > > > > >> >> > >>
> > > > > >> >> > >> BR,
> > > > > >> >> > >> Yuriy Shuliha
> > > > > >> >> > >>
> > > > > >> >> > >> On Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > > >> >> > >>
> > > > > >> >> > >> > Hi Yuri,
> > > > > >> >> > >> >
> > > > > >> >> > >> > To get access to the TC Bot, you should register as a TeamCity user
> > > > > >> >> > >> > [1], if you haven't done so already.
> > > > > >> >> > >> > Then you will be able to log in on the Ignite TC Bot page with the
> > > > > >> >> > >> > same credentials.
> > > > > >> >> > >> >
> > > > > >> >> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
> > > > > >> >> > >> >
> > > > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >> >
> > > > > >> >> > >> >> Andrew,
> > > > > >> >> > >> >>
> > > > > >> >> > >> >> I have corrected the PR according to your notes. Please review.
> > > > > >> >> > >> >> What are the next steps to get it merged?
> > > > > >> >> > >> >>
> > > > > >> >> > >> >> Y.
> > > > > >> >> > >> >>
> > > > > >> >> > >> >> On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > > >> >> > >> >>
> > > > > >> >> > >> >> > Yuri,
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >> > I'm done with the review.
> > > > > >> >> > >> >> > No crime found, just a trivial compatibility bug.
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >> > > Denis,
> > > > > >> >> > >> >> > >
> > > > > >> >> > >> >> > > Thank you for your attention to this.
> > > > > >> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
> > > > > >> >> > >> >> > > ticket is still pending review.
> > > > > >> >> > >> >> > > Do we have a chance to move it forward somehow?
> > > > > >> >> > >> >> > >
> > > > > >> >> > >> >> > > BR,
> > > > > >> >> > >> >> > > Yuriy Shuliha
> > > > > >> >> > >> >> > >
> > > > > >> >> > >> >> > > On Mon, Sep 30, 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
> > > > > >> >> > >> >> > >
> > > > > >> >> > >> >> > > > Yuriy,
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > > I've seen you open a pull request with the first changes:
> > > > > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the review?
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > > -
> > > > > >> >> > >> >> > > > Denis
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > > > > Yuriy,
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > > Yes, we already have support for distributed limit and merging
> > > > > >> >> > >> >> > > > > sorted subresults for SQL queries. E.g., ReduceIndexSorted and
> > > > > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > > Could you please also clarify score/relevance? Is it provided by
> > > > > >> >> > >> >> > > > > the Lucene engine for each query result? I am thinking about how
> > > > > >> >> > >> >> > > > > to do a sorted merge properly in this case.
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > > On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Ivan,
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Thank you for the interesting question!
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented,
> > > > > >> >> > >> >> > > > > > and the point of the user's interest is the topmost part of the
> > > > > >> >> > >> >> > > > > > response. The user can then read it, evaluate it, and use the
> > > > > >> >> > >> >> > > > > > given records for further purposes.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Particularly in our case, we use Ignite for operations with
> > > > > >> >> > >> >> > > > > > financial data, and there is a lot of text there, like asset
> > > > > >> >> > >> >> > > > > > names, financial instruments, companies, etc.
> > > > > >> >> > >> >> > > > > > In order to work with this quickly and reliably, users are used
> > > > > >> >> > >> >> > > > > > to working with text search, type-ahead completions, and
> > > > > >> >> > >> >> > > > > > suggestions.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > For these purposes, we index particular string data in separate
> > > > > >> >> > >> >> > > > > > caches.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Sorting capabilities and response size limitations are very
> > > > > >> >> > >> >> > > > > > important there, as our API has to provide the most relevant
> > > > > >> >> > >> >> > > > > > information within a limited response size.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
> > > > > >> >> > >> >> > > > > > Actually, Ignite queries Lucene, and Lucene returns
> > > > > >> >> > >> >> > > > > > *TopDocs.scoreDocs* already sorted by *score* (relevance), so the
> > > > > >> >> > >> >> > > > > > most relevant documents are on top.
> > > > > >> >> > >> >> > > > > > But currently, distributed query responses from different nodes
> > > > > >> >> > >> >> > > > > > are merged into the final query cursor queue in an arbitrary way,
> > > > > >> >> > >> >> > > > > > so in fact the score order is already ruined here. Also, Ignite
> > > > > >> >> > >> >> > > > > > requests all possible documents from Lucene, which is redundant
> > > > > >> >> > >> >> > > > > > and not good for performance.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery* and
> > > > > >> >> > >> >> > > > > > have to note that we still have to add sorting to text query
> > > > > >> >> > >> >> > > > > > processing in order to get applicable results.
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > The *limit* parameter itself should fix part of the issues above,
> > > > > >> >> > >> >> > > > > > but definitely, sorting by document score at least should be
> > > > > >> >> > >> >> > > > > > implemented along with the limit.
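Limiting on each node does not break the final ordering, provided each node keeps its own best hits by score: any document in the global top N must also be in the top N of the node that holds it. Below is a minimal Java sketch of that per-node selection with a bounded min-heap. NodeHit and NodeTopN are hypothetical names. In practice Lucene already performs this selection when asked for the top n documents, so the sketch only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical (document id, score) pair. */
class NodeHit {
    final int docId;
    final float score;

    NodeHit(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

class NodeTopN {
    /** Picks the `limit` best-scoring documents (docId = array index), best first. */
    static List<NodeHit> topN(float[] scores, int limit) {
        // Min-heap: its head is the weakest hit among the current best `limit`.
        PriorityQueue<NodeHit> heap =
            new PriorityQueue<NodeHit>((a, b) -> Float.compare(a.score, b.score));
        for (int doc = 0; doc < scores.length; doc++) {
            if (heap.size() < limit) {
                heap.add(new NodeHit(doc, scores[doc]));
            } else if (limit > 0 && scores[doc] > heap.peek().score) {
                heap.poll(); // drop the weakest candidate to make room
                heap.add(new NodeHit(doc, scores[doc]));
            }
        }
        List<NodeHit> best = new ArrayList<>(heap);
        best.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        return best;
    }
}
```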
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > This is a pretty short commentary; if you still have any
> > > > > >> >> > >> >> > > > > > questions, please ask, do not hesitate :)
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > BR,
> > > > > >> >> > >> >> > > > > > Yuriy Shuliha
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > On Thu, Sep 19, 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > >
> > > > > >> >> > >> >> > > > > > > Yuriy,
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > > I greatly appreciate your interest.
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > > Could you please elaborate a little on sorting? What tasks
> > > > > >> >> > >> >> > > > > > > does it help to solve, and how? It would be great if you
> > > > > >> >> > >> >> > > > > > > could provide an example.
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > > On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > Denis,
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text
> > > > > >> >> > >> >> > > > > > > > queries on persistent caches.
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > Also, I'm fine with the proposed limit for unsorted searches.
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > On Tue, Sep 17, 2019 at 22:06, Denis Magda <dmagda@apache.org> wrote:
> > > > > >> >> > >> >> > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > Igniters,
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding full-text
> > > > > >> >> > >> >> > > > > > > > > search API evolution, as long as Yury is ready to push it
> > > > > >> >> > >> >> > > > > > > > > forward.
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for
> > > > > >> >> > >> >> > > > > > > > > in-memory data grid deployments where Ignite caches data of an
> > > > > >> >> > >> >> > > > > > > > > underlying DB like Postgres.
> > > > > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by
> > > > > >> >> > >> >> > > > > > > > > default) if one attempts to use text indices with native
> > > > > >> >> > >> >> > > > > > > > > persistence enabled.
> > > > > >> >> > >> >> > > > > > > > > If the person is ready to live with that limitation, an
> > > > > >> >> > >> >> > > > > > > > > explicit configuration change would be needed to get around the
> > > > > >> >> > >> >> > > > > > > > > exception.
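Denis's proposed default could be as simple as a startup check like this Java sketch. The class, the method, and the allowNonPersistentTextIndexes opt-in flag are hypothetical. Any real Ignite configuration property for this would likely be named and wired differently.

```java
/** Hypothetical startup validation for full-text indexes on persistent caches. */
class TextIndexConfigCheck {
    /**
     * Fails by default when text indexes are combined with native persistence,
     * because the Lucene indexes are not persisted; an explicit opt-in suppresses it.
     */
    static void validate(boolean persistenceEnabled, boolean hasTextIndexes,
                         boolean allowNonPersistentTextIndexes) {
        if (persistenceEnabled && hasTextIndexes && !allowNonPersistentTextIndexes)
            throw new IllegalStateException("Full-text indexes are not persisted; "
                + "set allowNonPersistentTextIndexes=true to accept this limitation.");
    }
}
```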
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > Thoughts?
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > -
> > > > > >> >> > >> >> > > > > > > > > Denis
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > Hello to all again,
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > Alexei has referenced
> > > > > >> >> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the
> > > > > >> >> > >> >> > > > > > > > > > absence of index persistence was declared an obstacle to
> > > > > >> >> > >> >> > > > > > > > > > further development.
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> > > > > >> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for just
> > > > > >> >> > >> >> > > > > > > > > > in-memory indexing of selected data.
> > > > > >> >> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited
> > > > > >> >> > >> >> > > > > > > > > > number of records that should be used in type-ahead search /
> > > > > >> >> > >> >> > > > > > > > > > suggestions.
> > > > > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for
> > > > > >> >> > >> >> > > > > > > > > > the Lucene index to be persistent. Hopefully this is a
> > > > > >> >> > >> >> > > > > > > > > > widespread pattern of text-search usage.
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation.
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
> > > > > >> >> > >> >> > > > > > > > > > required in text-search tasks for now).
> > > > > >> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries.
> > > > > >> >> > >> >> > > > > > > > > > It was a simple test prefix query, like 'name'='ene*'.
> > > > > >> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the
> > > > > >> >> > >> >> > > > > > > > > > client node, and there may be ~thousands or ~hundreds of
> > > > > >> >> > >> >> > > > > > > > > > thousands of records, even if we need only the first 10-100.
> > > > > >> >> > >> >> > > > > > > > > > Again, all the results are added to the queue in
> > > > > >> >> > >> >> > > > > > > > > > GridCacheQueryFutureAdapter in arbitrary order, page by page.
> > > > > >> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> > > > > >> >> > >> >> > > > > > > > > > So implementing limit as part of the query and
> > > > > >> >> > >> >> > > > > > > > > > (GridCacheQueryRequest) will not change the nature of the
> > > > > >> >> > >> >> > > > > > > > > > response, but will limit the load on nodes and networking.
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > a) Sorting
> > > > > >> >> > >> >> > > > > > > > > > The solution for this could be:
> > > > > >> >> > >> >> > > > > > > > > > - Make entities comparable
> > > > > >> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> > > > > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> > > > > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the
> > > > > >> >> > >> >> > > > > > > > > > desired limit on the client node.
> > > > > >> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
> > > > > >> >> > >> >> > > > > > > > > > though it can be used for relatively small limits.
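The comparator-based reduction in the last bullet, including its cost of holding everything in memory, might look like the following Java sketch. FullLoadReducer is a hypothetical name. The comparator stands for whatever ordering the entity defines, for example Comparator.comparing over a name field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FullLoadReducer {
    /**
     * Collects every node's rows, sorts them with the entity comparator, and keeps the
     * first `limit`. Memory use grows with the total result size, not with `limit`.
     */
    static <T> List<T> reduce(List<? extends List<? extends T>> perNode,
                              Comparator<? super T> cmp, int limit) {
        List<T> all = new ArrayList<>();
        for (List<? extends T> rows : perNode)
            all.addAll(rows); // the full result set is held on the client node
        all.sort(cmp);
        return new ArrayList<>(all.subList(0, Math.min(Math.max(limit, 0), all.size())));
    }
}
```

A streaming k-way merge over per-node sorted results avoids holding everything at once, at the cost of requiring each node to sort its own results first.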
> > > > > >> >> > >> >> > > > > > > > > > BR,
> > > > > >> >> > >> >> > > > > > > > > > Yuriy Shuliha
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > On Fri, Aug 30, 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > Yuriy,
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1],
> > > > > >> >> > >> >> > > > > > > > > > > which makes Lucene indexes unusable with persistence and is the
> > > > > >> >> > >> >> > > > > > > > > > > main reason for discontinuation.
> > > > > >> >> > >> >> > > > > > > > > > > Probably it should be addressed first to make text queries a
> > > > > >> >> > >> >> > > > > > > > > > > valid product feature.
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a
> > > > > >> >> > >> >> > > > > > > > > > > trivial task. Some kind of merging must be implemented on the
> > > > > >> >> > >> >> > > > > > > > > > > query-originating node.
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > On Thu, Aug 29, 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > Yuriy,
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes,
> > > > > >> >> > >> >> > > > > > > > > > > > then please go ahead. The primary reason why the community
> > > > > >> >> > >> >> > > > > > > > > > > > wants to discontinue them first (and probably resurrect them
> > > > > >> >> > >> >> > > > > > > > > > > > later) is the limitations listed by Andrey and minimal support
> > > > > >> >> > >> >> > > > > > > > > > > > from the community's end.
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > -
> > > > > >> >> > >> >> > > > > > > > > > > > Denis
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in
> > > > > >> >> > >> >> > > > > > > > > > > > > Ignite [1].
> > > > > >> >> > >> >> > > > > > > > > > > > > The motivation here is that text indexes are not persistent,
> > > > > >> >> > >> >> > > > > > > > > > > > > not transactional, and can't be used together with SQL or
> > > > > >> >> > >> >> > > > > > > > > > > > > inside SQL, and there is a lack of interest from the community
> > > > > >> >> > >> >> > > > > > > > > > > > > side.
> > > > > >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries
> > > > > >> >> > >> >> > > > > > > > > > > > > great.
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> > > > > >> >> > >> >> > > > > > > > > > > > > Query results are returned from the data node to the
> > > > > >> >> > >> >> > > > > > > > > > > > > client-side cursor page by page, and this parameter is
> > > > > >> >> > >> >> > > > > > > > > > > > > designed to control the page size. It is supposed that the
> > > > > >> >> > >> >> > > > > > > > > > > > > query executes lazily on the server side, and it is not
> > > > > >> >> > >> >> > > > > > > > > > > > > expected that the full result set is loaded into memory on the
> > > > > >> >> > >> >> > > > > > > > > > > > > server side at once, but page by page.
> > > > > >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set
> > > > > >> >> > >> >> > > > > > > > > > > > > into memory before the first page is sent to the client?
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The best
> > > > > >> >> > >> >> > > > > > > > > > > > > solution is to use query language commands for this, e.g. "LIMIT/OFFSET" in
> > > > > >> >> > >> >> > > > > > > > > > > > > SQL.
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation: the same
> > > > > >> >> > >> >> > > > > > > > > > > > > user query is executed on the data nodes, and then the results from all nodes
> > > > > >> >> > >> >> > > > > > > > > > > > > should be correctly merged before being returned via the client cursor.
> > > > > >> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again in the merge phase.
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense without
> > > > > >> >> > >> >> > > > > > > > > > > > > sorting, as there is no guarantee that every subsequent query run will return
> > > > > >> >> > >> >> > > > > > > > > > > > > the same data, because of page reordering.
> > > > > >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes asynchronously, and
> > > > > >> >> > >> >> > > > > > > > > > > > > messages from different nodes can't be ordered.
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > 2.
> > > > > >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" parameter name (for @QueryTextField) looks more
> > > > > >> >> > >> >> > > > > > > > > > > > > self-explanatory, doesn't it?
> > > > > >> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from nodes be
> > > > > >> >> > >> >> > > > > > > > > > > > > merged?
> > > > > >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> > > > > >> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort results in the merge phase?
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable at all, e.g. it is impossible
> > > > > >> >> > >> >> > > > > > > > > > > > > to configure the Tokenizer.
> > > > > >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and only then go
> > > > > >> >> > >> >> > > > > > > > > > > > > further to discuss/implement complex features that may depend on the engine
> > > > > >> >> > >> >> > > > > > > > > > > > > config.
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > Dear community,
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > By starting this thread I'd like to open a discussion that would lead to
> > > > > >> >> > >> >> > > > > > > > > > > > > > contributions in the subject area.
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed by different mechanisms, including
> > > > > >> >> > >> >> > > > > > > > > > > > > > Lucene.
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 (a release from last year) is used.
> > > > > >> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search area
> > > > > >> >> > >> >> > > > > > > > > > > > > > and beyond (e.g. spatial data indexing).
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and query
> > > > > >> >> > >> >> > > > > > > > > > > > > > mechanisms for text data*.
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our project's
> > > > > >> >> > >> >> > > > > > > > > > > > > > needs, but I believe it will be useful for many more people.
> > > > > >> >> > >> >> > > > > > > > > > > > > > Let's walk through the items and vote on or discuss Jira tickets for them.
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > 1.[trivial] Use dataQuery.getPageSize() to limit the number of search response
> > > > > >> >> > >> >> > > > > > > > > > > > > > items inside GridLuceneIndex.query(). Currently it calls
> > > > > >> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
> > > > > >> >> > >> >> > > > > > > > > > > > > > matches will be returned, which we do not need in most cases.
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > 2.[simple] Add sorting. Then a more capable search call can be executed:
> > > > > >> >> > >> >> > > > > > > > > > > > > > *IndexSearcher.search(query, count, sort)*
> > > > > >> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> > > > > >> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> > > > > >> >> > >> >> > > > > > > > > > > > > > annotation. If *true*, the field will be indexed but not tokenized. Number
> > > > > >> >> > >> >> > > > > > > > > > > > > > types are preferred here.
> > > > > >> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should define the
> > > > > >> >> > >> >> > > > > > > > > > > > > > desired sort fields used for querying.
> > > > > >> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > 3.[moderate] Build complex queries with *TextQuery*, including terms/queries
> > > > > >> >> > >> >> > > > > > > > > > > > > > boosting.
> > > > > >> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work. It should
> > > > > >> >> > >> >> > > > > > > > > > > > > > be extended if the community is interested in it.*
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > > BR,
> > > > > >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> > > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > > > --
> > > > > >> >> > >> >> > > > > > > > > > > > > Best regards,
> > > > > >> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> > > > > >> >> > >> >> > > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > --
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > > > Best regards,
> > > > > >> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> > > > > >> >> > >> >> > > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > > >
> > > > > >> >> > >> >> > > > > > > > >
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > > > > --
> > > > > >> >> > >> >> > > > > > > Best regards,
> > > > > >> >> > >> >> > > > > > > Ivan Pavlukhin
> > > > > >> >> > >> >> > > > > > >
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > > > --
> > > > > >> >> > >> >> > > > > Best regards,
> > > > > >> >> > >> >> > > > > Ivan Pavlukhin
> > > > > >> >> > >> >> > > > >
> > > > > >> >> > >> >> > > >
> > > > > >> >> > >> >> > >
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >> > --
> > > > > >> >> > >> >> > Best regards,
> > > > > >> >> > >> >> > Andrey V. Mashenkov
> > > > > >> >> > >> >> >
> > > > > >> >> > >> >>
> > > > > >> >> > >> >
> > > > > >> >> > >> >
> > > > > >> >> > >> > --
> > > > > >> >> > >> > Best regards,
> > > > > >> >> > >> > Andrey V. Mashenkov
> > > > > >> >> > >> >
> > > > > >> >> > >>
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >> > --
> > > > > >> >> > Best regards,
> > > > > >> >> > Andrey V. Mashenkov
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>
>

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Yuriy,

> Let me summarize the approaches:
I agree with your reasoning; p.2 sounds like the best one to me as well.

I will look into the merge-sort strategy a bit later.

Best regards,
Ivan Pavlukhin

On Fri, Mar 13, 2020 at 19:23, Yuriy Shuliga <sh...@gmail.com> wrote:
>
> Ivan,
>
> I have made changes in the fork that reflect the merge-sort strategy: the
> query future iterator now unblocks as soon as all first pages are delivered
> from the nodes; then it waits for the next portions of pages, and so on.
> https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd
>
> Please validate the design if you wish.
>
> Regarding the ranking field in the entity:
>
> Entities for text queries in the search domain are usually treated as
> documents with some metadata.
> This can be an id, an issued/expired date, and the document score returned
> for the given query.
> It is common to include such fields in the entity design.
>
> Answering your question about omitting QueryRankField:
> - Then the response records will simply come in arbitrary order. This
> should not fail TextQuery execution.
>
> Another point, about rank values across different indices:
> - ranks are meant for comparing entities within a particular query
> response; they are not intended to be absolute across the system.
>
> Let me summarize the approaches:
> 1. Subclassing from Ranked.class.
> pros: the simplest and Ignite-natural approach
> cons: implicit nature, limits entity inheritance
>
> 2. Explicitly introducing a dedicated field annotated with @QueryRankField
> pros: Ignite-natural approach, easy to introduce, explicitly controlled by
> the developer
> cons: adds extra metadata to the entity
>
> 3. Wrapping the entity response with rank data, used for merge sort, without
> exposing it to the client.
> pros: leaves the entity design clean
> cons: rank is not available to the client; development will require complex
> changes in the query execution / entity marshaling mechanisms
>
> I'd go with p.2 as the most balanced of these solutions.
> What do you think?
>
> BR,
> Yuriy Shuliha
>
>
>
>
> On Wed, Mar 11, 2020 at 01:14, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> > Igniters,
> >
> > The discussion unintentionally continued outside of the dev list. I am
> > bringing it back; you can find it below. Do not hesitate to join if you
> > have thoughts on the questions raised. Maybe you have ideas on how to enrich
> > text query results with score/rank information.
> >
> > On Tue, Mar 10, 2020 at 09:11, Yuriy Shuliga <sh...@gmail.com> wrote:
> >
> > > Yes, please do.
> > >
> > > On Tue, Mar 10, 2020 at 02:26, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >
> > >> Yuriy,
> > >>
> > >> I noticed that at some point our discussion moved off the Ignite dev
> > >> list. Would you mind if I bring it back to the dev list?
> > >>
> > >> Best regards,
> > >> Ivan Pavlukhin
> > >>
> > >> On Tue, Mar 10, 2020 at 03:25, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >> >
> > >> > > PS As far as I can see, there is no chance to get on the 2.8 release train.
> > >> > > What will be the next version/date we can aim for with this update?
> > >> >
> > >> > Yes, 2.8 is already available and the community is working on
> > >> > finalizing activities (e.g. publishing documentation). I do not have any
> > >> > reliable expectations about the next releases. I suppose that there could be
> > >> > a couple of maintenance releases like 2.8.1, as several problems were already
> > >> > discovered. I do not know whether the next more significant release is going
> > >> > to be 2.9 or even the major release 3.0. It sounds realistic to facilitate 2.9
> > >> > because there are already several "almost ready" features in master. In my
> > >> > mind it is a good idea to start a discussion about the next releases on the
> > >> > dev list.
> > >> >
> > >> > Best regards,
> > >> > Ivan Pavlukhin
> > >> >
> > >> > On Tue, Mar 10, 2020 at 00:58, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >> > >
> > >> > > Hi Yuriy,
> > >> > >
> > >> > > Sorry for a late response.
> > >> > >
> > >> > > > A suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance)
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back to the initiating node
> > >> > > > 4. Possibly we still need to proxy the entity, adding the Comparable interface.
> > >> > > > 5. Perform a merge sort on the initiating node
> > >> > >
> > >> > > Possibly I missed it, but one point is not clear to me. What will
> > >> > > happen if an entity class does not have a field annotated with
> > >> > > QueryRankField?
> > >> > >
> > >> > > And I am still not sure that it is a proper (enough) approach. The
> > >> > > thing that bothers me is the transient and dynamic nature of the "rank"
> > >> > > field. It does belong to the entity, yet it can have different values for
> > >> > > the same entity (e.g. when different indices are used).
> > >> > >
> > >> > > I would like to experiment with the code a little bit, but most likely
> > >> > > I will have a chance only at the end of this week.
> > >> > >
> > >> > > Best regards,
> > >> > > Ivan Pavlukhin
> > >> > >
> > >> > > On Mon, Mar 2, 2020 at 20:09, Yuriy Shuliga <sh...@gmail.com> wrote:
> > >> > > >
> > >> > > > Hi Ivan,
> > >> > > >
> > >> > > > I have concerns about the entity annotation variant.
> > >> > > > Wrapping the entity into a dynamic proxy for passing it back would be quite a
> > >> > > > complex thing that requires changes in IgniteCacheObjectProcessor
> > >> > > > and entity marshaling.
> > >> > > >
> > >> > > > A suitable solution without subclassing might be:
> > >> > > > 1. Explicitly add a float field to the entity
> > >> > > > 2. Annotate it with a special @QueryRankField (for instance)
> > >> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back to the initiating node
> > >> > > > 4. Possibly we still need to proxy the entity, adding the Comparable interface.
> > >> > > > 5. Perform a merge sort on the initiating node
> > >> > > >
> > >> > > > Would you consider this approach, or should we return to using the Ranked
> > >> > > > superclass?
> > >> > > >
> > >> > > > Regarding your proposal to implement merge sort - definitely yes.
> > >> > > > I will implement this.
> > >> > > > Sorry, I didn't understand you earlier )
> > >> > > >
> > >> > > > BR,
> > >> > > > Yuriy Shuliha
> > >> > > >
> > >> > > > PS As far as I can see, there is no chance to get on the 2.8 release train.
> > >> > > > What will be the next version/date we can aim for with this update?
> > >> > > >
> > >> > > >
> > >> > > > On Fri, Feb 28, 2020 at 23:08, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> Hi Yuriy,
> > >> > > >>
> > >> > > >> Sorry for a late response and thank you for your comments.
> > >> > > >>
> > >> > > >> The approach with a @Ranked annotation looks cleaner to me from the API
> > >> > > >> point of view.
> > >> > > >>
> > >> > > >> Regarding merging responses from multiple nodes, I suppose that a good
> > >> > > >> enough solution is possible:
> > >> > > >> 1. Request one page of entries from each node.
> > >> > > >> 2. Return one page to the user (as there is definitely a page of the
> > >> > > >> best results already).
> > >> > > >> 3. Request the next result pages from the nodes corresponding to the pages we
> > >> > > >> exposed to the user (actually, nodes having less than 1 page of
> > >> > > >> pending results). Repeat from step 2.
> > >> > > >>
> > >> > > >> Some kind of sort merge plus backpressure. The backpressure part might be
> > >> > > >> left as an improvement.
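The three steps above can be simulated in a single-threaded Java sketch. `PagedNode` is a hypothetical stand-in for a remote data node that serves its already best-first results page by page; nothing here is Ignite's query future API, and real page requests would be asynchronous. Step 2 is safe because, after steps 1 and 3, every node that still has results holds at least one full page in its buffer, so any entry not yet fetched from a node ranks no higher than at least `pageSize` buffered entries and cannot displace anything in the page being emitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PagedMerge {
    // Hypothetical stand-in for a remote data node holding score-sorted (best-first) results.
    public static final class PagedNode {
        private final List<Float> sortedDesc;
        private int pos;

        public PagedNode(List<Float> sortedDesc) { this.sortedDesc = sortedDesc; }

        // Simulates a page request; returns fewer than pageSize entries only at the end.
        List<Float> nextPage(int pageSize) {
            int end = Math.min(pos + pageSize, sortedDesc.size());
            List<Float> page = sortedDesc.subList(pos, end);
            pos = end;
            return page;
        }

        boolean exhausted() { return pos >= sortedDesc.size(); }
    }

    // Returns the pages handed to the user, each one globally best-first.
    public static List<List<Float>> merge(List<PagedNode> nodes, int pageSize) {
        List<Deque<Float>> buffers = new ArrayList<>();
        for (PagedNode node : nodes)                          // step 1: one page per node
            buffers.add(new ArrayDeque<>(node.nextPage(pageSize)));

        List<List<Float>> userPages = new ArrayList<>();
        while (true) {
            List<Float> page = new ArrayList<>();             // step 2: emit one page
            while (page.size() < pageSize) {
                int best = -1;
                for (int i = 0; i < buffers.size(); i++) {
                    Float head = buffers.get(i).peekFirst();
                    if (head != null && (best < 0 || head > buffers.get(best).peekFirst()))
                        best = i;
                }
                if (best < 0) break;                          // every buffer is empty
                page.add(buffers.get(best).pollFirst());
            }
            if (page.isEmpty()) return userPages;
            userPages.add(page);

            for (int i = 0; i < nodes.size(); i++) {          // step 3: refill short buffers only
                PagedNode node = nodes.get(i);
                if (buffers.get(i).size() < pageSize && !node.exhausted())
                    buffers.get(i).addAll(node.nextPage(pageSize));
            }
        }
    }
}
```

Only nodes whose buffer dropped below one page are asked for more, which is the natural hook for the backpressure improvement mentioned above.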
> > >> > > >>
> > >> > > >> What do you think?
> > >> > > >>
> > >> > > >> Best regards,
> > >> > > >> Ivan Pavlukhin
> > >> > > >>
> > >> > > >> On Tue, Feb 18, 2020 at 18:27, Yuriy Shuliga <sh...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> >
> > >> > > >> > Hi Ivan,
> > >> > > >> >
> > >> > > >> > Thank you for keeping an eye on the topic!
> > >> > > >> >
> > >> > > >> > Here are the answers to your questions:
> > >> > > >> > 1. The TextQuery response is always ordered by documentScore, and this number is also frequently used when processing the results.
> > >> > > >> > We have analyzed the current entity flow under the hood of query processing and found that the cleanest approach to get a response with ordered entities is to extend the entity itself.
> > >> > > >> > The only drawback is the necessity to extend from Ranked in our case. Still, it is very common to use documentScore (rank) when working with TextQuery.
> > >> > > >> > Another approach I see is to use reflection to create a proxy with the Ranked interface. In this case we will still need to mark our intention to have an ordered response, e.g. by adding some @Ranked annotation.
> > >> > > >> > Please advise what would fit Ignite better.
> > >> > > >> >
> > >> > > >> > 2. Yes, you are right. Using a PriorityQueue may lead to unwanted memory consumption.
> > >> > > >> > In order to get a correct response we still need to retrieve data from all of the nodes, as any of them may contain a value that falls into the limited range (this is because of the float ranking score).
> > >> > > >> > This can be fixed by using Guava's MinMaxPriorityQueue, which has a maximum size limitation. Technically it will be equivalent to merging the sorted responses, as each element will require a comparison against the queue.
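Guava's `MinMaxPriorityQueue` does provide a `maximumSize(int)` builder option. The same bounded-memory idea can also be sketched with `java.util.PriorityQueue` alone: keep a min-heap of at most `n` scores and evict its lowest entry whenever a better score arrives. `topN` is a hypothetical helper, not code from the fork; memory stays O(n) however many hits the nodes send, and each insertion costs O(log n) heap work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedTopN {
    // Keeps only the n highest scores seen so far; memory stays O(n).
    public static List<Float> topN(Iterable<Float> scores, int n) {
        PriorityQueue<Float> minHeap = new PriorityQueue<>(); // head = lowest kept score
        for (float s : scores) {
            if (minHeap.size() < n) {
                minHeap.offer(s);
            } else if (n > 0 && s > minHeap.peek()) {
                minHeap.poll();   // evict the current worst kept score
                minHeap.offer(s);
            }
        }
        List<Float> best = new ArrayList<>(minHeap);
        best.sort(Collections.reverseOrder()); // best-first, like Lucene results
        return best;
    }
}
```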
> > >> > > >> >
> > >> > > >> > BR,
> > >> > > >> > Yuriy Shuliha
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > On Thu, Feb 13, 2020 at 13:53, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >> > > >> >>
> > >> > > >> >> Hi Yuriy,
> > >> > > >> >>
> > >> > > >> >> Sorry for the delay. I went through the proposed solution and I have
> > >> > > >> >> some questions. I am currently a little removed from the context of TEXT
> > >> > > >> >> queries, so correct me or point me to a previous discussion if I
> > >> > > >> >> got something wrong:
> > >> > > >> >> 1. What is the justification for using inheritance from Ranked in order
> > >> > > >> >> to keep the order? Why can't we mix the rank/score into the entries
> > >> > > >> >> transferred inside GridCacheQueryResponse?
> > >> > > >> >> 2. Collecting all entries in a PriorityQueue can lead to unnecessary
> > >> > > >> >> heap memory consumption. I think that merging several sorted runs
> > >> > > >> >> (responses from different nodes) would be a better option.
> > >> > > >> >>
> > >> > > >> >> Best regards,
> > >> > > >> >> Ivan Pavlukhin
> > >> > > >> >>
> > >> > > >> >> On Mon, Feb 10, 2020 at 18:32, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > >> > > >> >> >
> > >> > > >> >> > Hi Ivan,
> > >> > > >> >> >
> > >> > > >> >> > Did you have a chance to look through the proposed solution?
> > >> > > >> >> > We definitely need this validation in order to proceed further and provide the changes officially.
> > >> > > >> >> >
> > >> > > >> >> > BR,
> > >> > > >> >> > Yuriy Shluiha
> > >> > > >> >> >
> > >> > > >> >> > On Tue, Jan 28, 2020 at 17:30, Yuriy Shuliga <sh...@gmail.com> wrote:
> > >> > > >> >> >>
> > >> > > >> >> >> Hello,
> > >> > > >> >> >>
> > >> > > >> >> >> Please see the proposed TextQuery ordering solution here:
> > >> > > >> >> >>
> > >> > > >> >> >> https://github.com/apache/ignite/compare/master...shuliga:feature/rank_score
> > >> > > >> >> >>
> > >> > > >> >> >> Y.
> > >> > > >> >> >>
> > >> > > >> >> >> On Fri, Jan 24, 2020 at 09:50, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >> > > >> >> >>>
> > >> > > >> >> >>> Yuriy,
> > >> > > >> >> >>>
> > >> > > >> >> >>> Good to know that the story continues! Yes, it would be really nice to
> > >> > > >> >> >>> see the code of your solution; of course formal requirements can be
> > >> > > >> >> >>> omitted, the solution design is of most interest so far. And it
> > >> > > >> >> >>> definitely would be great to merge it into the Apache Ignite codebase
> > >> > > >> >> >>> eventually.
> > >> > > >> >> >>>
> > >> > > >> >> >>> On Thu, Jan 23, 2020 at 16:47, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Hi Ivan,
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Actually, I have engaged another developer to help bring TextQueries to a correctly working state.
> > >> > > >> >> >>> > For now we have a solution that adds ordering functionality to distributed TextQueries.
> > >> > > >> >> >>> > It is developed and tested locally. I can share the details here, then we can discuss and decide whether to create a corresponding ticket.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > The starting point is that, by nature, Lucene's documents are always ordered by docScore:float.
> > >> > > >> >> >>> > So we created an abstract class Ranked, implementing Comparable<Ranked> and Serializable and containing a float rank value.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > Each entity expected to be ordered on TextQuery merge should be derived from this class.
> > >> > > >> >> >>> > All subsequent actions will be done automatically under the hood by the new CacheQueryFutureRankedDecorator,
> > >> > > >> >> >>> > which contains a special BlockingIterator used for correct merging of distributed responses.
> > >> > > >> >> >>> > Text queries with Ranked entities are automatically wrapped with this new decorator.
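Based only on the description above (a float rank, `Comparable<Ranked>`, `Serializable`), the `Ranked` base class could look roughly like the sketch below. This is a reconstruction for illustration, not the code from the fork; ordering by descending rank is an assumption chosen to match Lucene's best-first order, and `Article` is a hypothetical entity. The sketch also makes the inheritance cost visible: `Article` can no longer extend any other class.

```java
import java.io.Serializable;

public class RankedExample {
    // Base class for entities whose text-query results are merged by score.
    public abstract static class Ranked implements Comparable<Ranked>, Serializable {
        private static final long serialVersionUID = 1L;

        private float rank; // assumed to be filled with the Lucene document score on the data node

        public float getRank() { return rank; }

        public void setRank(float rank) { this.rank = rank; }

        // Higher rank sorts first (note: this ordering is not consistent with equals).
        @Override
        public int compareTo(Ranked other) {
            return Float.compare(other.rank, this.rank);
        }
    }

    // Hypothetical entity: it must extend Ranked and so cannot extend anything else.
    public static class Article extends Ranked {
        private static final long serialVersionUID = 1L;

        public final String title;

        public Article(String title) { this.title = title; }
    }
}
```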
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > This is an outline of the solution. Please ask if you have any questions.
> > >> > > >> >> >>> > Or I can create a ticket and link a PR with the already tested (so far only locally) solution to it for detailed review.
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > BR,
> > >> > > >> >> >>> > Yuriy
> > >> > > >> >> >>> >
> > >> > > >> >> >>> >
> > >> > > >> >> >>> > On Tue, Jan 21, 2020 at 07:29, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> Hi Yuriy,
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> I would just like to understand the current state. Are you still working on
> > >> > > >> >> >>> >> Ignite text queries? If not, are you going to continue with them?
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> On Fri, Dec 13, 2019 at 11:52, Ivan Pavlukhin <vololo100@gmail.com> wrote:
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Yuriy,
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > Sure, I will be glad to help.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > > - incorrect nodes/partition selection during querying?
> > >> > > >> >> >>> >> > Apparently this is the problem. If you find it really complicated to
> > >> > > >> >> >>> >> > understand and debug, then I can dig deeper and share my view of how the
> > >> > > >> >> >>> >> > problem can be fixed.
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > On Wed, Dec 11, 2019 at 18:46, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > I will look into the MOVING partition issue.
> > >> > > >> >> >>> >> > > But I also need guidance there.
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > Ivan, would you mind being that person?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > The question is whether we have an issue with:
> > >> > > >> >> >>> >> > > - wrong storage targets during indexing, OR
> > >> > > >> >> >>> >> > > - incorrect nodes/partition selection during querying?
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > BR,
> > >> > > >> >> >>> >> > > Yuriy Shluiha
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > >
> > >> > > >> >> >>> >> > > --
> > >> > > >> >> >>> >> > > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> >
> > >> > > >> >> >>> >> > --
> > >> > > >> >> >>> >> > Best regards,
> > >> > > >> >> >>> >> > Ivan Pavlukhin
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >>
> > >> > > >> >> >>> >> --
> > >> > > >> >> >>> >> Best regards,
> > >> > > >> >> >>> >> Ivan Pavlukhin
> > >> > > >> >> >>>
> > >> > > >> >> >>>
> > >> > > >> >> >>>
> > >> > > >> >> >>> --
> > >> > > >> >> >>> Best regards,
> > >> > > >> >> >>> Ivan Pavlukhin
> > >>
> > >
> >

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Yuriy Shuliga <sh...@gmail.com>.

Ivan,

I have made changes in the fork that implement the merge-sort strategy: the
query future iterator now unblocks as soon as the first pages from all nodes
have been delivered, then waits for the next portion of pages, and so on.
https://github.com/shuliga/ignite/commit/c84f04c18f67e99ab7bc0a7893b75f1dc83a76bd

Please validate the design if you wish.
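A minimal sketch of the merge-sort strategy described above, assuming each node returns pages already sorted by descending score and that the initiating node can ask one specific node for its next page. Nothing is returned until the first page of every node is buffered, and a node is asked for more only after its current page has been consumed, which is also the page-by-page scheme Ivan proposes further down in the thread. `ScoredHit`, `RankedMergeIterator` and the `fetchPage` callback are hypothetical names, not Ignite classes; the callback stands in for the real network round trip and simply blocks here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

/** One hit from a node: the Lucene score plus the cache value (hypothetical type). */
final class ScoredHit<V> {
    final float score;
    final V value;

    ScoredHit(float score, V value) {
        this.score = score;
        this.value = value;
    }
}

/**
 * Merges per-node pages that are each sorted by descending score. A node is asked
 * for its next page only after its current page has been fully consumed.
 */
final class RankedMergeIterator<V> implements Iterator<ScoredHit<V>> {
    /** Position inside the page currently buffered for one node. */
    private static final class Cursor<V> {
        final int node;
        List<ScoredHit<V>> page;
        int pos;

        Cursor(int node, List<ScoredHit<V>> page) {
            this.node = node;
            this.page = page;
        }

        ScoredHit<V> head() {
            return page.get(pos);
        }
    }

    /** Returns the next page of a node, or an empty list once that node is drained. */
    private final IntFunction<List<ScoredHit<V>>> fetchPage;

    /** Max-heap keyed by the score at the head of each node's buffered page. */
    private final PriorityQueue<Cursor<V>> heap =
        new PriorityQueue<>((a, b) -> Float.compare(b.head().score, a.head().score));

    RankedMergeIterator(int nodeCnt, IntFunction<List<ScoredHit<V>>> fetchPage) {
        this.fetchPage = fetchPage;

        // Nothing can be returned before the first page of every node is buffered.
        for (int n = 0; n < nodeCnt; n++) {
            List<ScoredHit<V>> first = fetchPage.apply(n);
            if (!first.isEmpty())
                heap.add(new Cursor<>(n, first));
        }
    }

    @Override public boolean hasNext() {
        return !heap.isEmpty();
    }

    @Override public ScoredHit<V> next() {
        Cursor<V> c = heap.poll();
        if (c == null)
            throw new NoSuchElementException();

        ScoredHit<V> hit = c.head();

        if (++c.pos == c.page.size()) {
            // Page consumed: ask only this node for more data.
            c.page = fetchPage.apply(c.node);
            c.pos = 0;
        }

        // Re-insert the cursor with its new head unless the node is drained.
        if (!c.page.isEmpty())
            heap.add(c);

        return hit;
    }
}
```

Because only one page per node is buffered at a time, memory stays proportional to the number of nodes times the page size rather than to the full result set.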

Regarding the rank field in the entity:

In the search domain, entities returned by text queries are usually treated as
documents with some metadata.
This can be an id, issue/expiry dates, and the document score returned for a
given query.
It is common to include such fields in the entity design.

Regarding your question about omitting @QueryRankField:
- The response records will then simply come in arbitrary order. This
should not cause TextQuery execution to fail.

Another point, about rank values across different indices:
- ranks are meant to be compared only between entities within a particular
query response; they are not intended to be absolute across the system.

Let me summarize the approaches:
1. Subclassing from Ranked.
   pros: the simplest and most Ignite-natural approach
   cons: implicit nature; limits entity inheritance

2. Explicitly introducing a dedicated field annotated with @QueryRankField.
   pros: Ignite-natural, easy to introduce, explicitly controlled by the
   developer
   cons: adds extra metadata to the entity

3. Wrapping the entity response with rank data that is used for the merge
   sort but not exposed to the client.
   pros: leaves the entity design clean
   cons: the rank is not available to the client; development will require
   complex changes in the query execution and entity marshaling mechanisms
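Approach 2 could look roughly like the sketch below, assuming a hypothetical runtime-retained `@QueryRankField` annotation (it does not exist in Ignite today) and a helper that writes the Lucene score into the annotated `float` field via reflection. When no such field exists, the helper reports `false`, which matches the answer above: results simply come back unordered instead of failing the query. `RankFields` and `Article` are illustrative names only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

/** Hypothetical marker for the float field that receives the Lucene document score. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface QueryRankField {
}

/** Hypothetical helper that copies a score into the annotated field. */
final class RankFields {
    private RankFields() {
    }

    /** Finds a float field annotated with @QueryRankField, searching superclasses too. */
    static Field rankField(Class<?> cls) {
        for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(QueryRankField.class) && f.getType() == float.class) {
                    f.setAccessible(true);
                    return f;
                }
            }
        }
        return null;
    }

    /**
     * Writes the score into the annotated field and returns true. Returns false when the
     * entity has no such field; the caller then cannot rank-merge and keeps arbitrary order.
     */
    static boolean fillRank(Object entity, float score) {
        Field f = rankField(entity.getClass());
        if (f == null)
            return false;
        try {
            f.setFloat(entity, score);
            return true;
        }
        catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Example entity using approach 2; the fields are illustrative. */
class Article {
    String title;

    /** Score of this entity within one particular text query response. */
    @QueryRankField
    float rank;

    Article(String title) {
        this.title = title;
    }
}
```

A real implementation would probably cache the resolved `Field` per class instead of scanning declared fields for every returned entity.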

I'd go with option 2 as the most balanced of these solutions.
What do you think?

BR,
Yuriy Shuliha




On Wed, Mar 11, 2020 at 01:14, Ivan Pavlukhin <vo...@gmail.com> wrote:

> Igniters,
>
> The discussion unintentionally continued outside of the dev list. I am
> bringing it back. You can find it below. Do not hesitate to join if you
> have thoughts on the raised questions. Maybe you have ideas on how to enrich
> text query results with score/rank information.

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Igniters,

The discussion unintentionally continued outside of the dev list. I am
bringing it back. You can find it below. Do not hesitate to join if you
have thoughts on the raised questions. Maybe you have ideas on how to enrich
text query results with score/rank information.

On Tue, Mar 10, 2020 at 09:11, Yuriy Shuliga <sh...@gmail.com> wrote:

> Yes, please do.
>
> On Tue, Mar 10, 2020 at 02:26, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
>> Yuriy,
>>
>> I noticed that at some point our discussion moved off the Ignite dev
>> list. Would you mind if I bring it back to the dev list?
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>> On Tue, Mar 10, 2020 at 03:25, Ivan Pavlukhin <vo...@gmail.com> wrote:
>> >
>> > > PS As far as I can see, there is no chance to get on the 2.8 release train.
>> > > What will be the next version/date we can aim for with this update?
>> >
>> > Yes, 2.8 is already available, and the community is working on
>> > finalizing activities (e.g. publishing documentation). I do not have any
>> > reliable expectations about the next releases. I suppose there could be a
>> > couple of maintenance releases like 2.8.1, as several problems were already
>> > discovered. I do not know whether the next more significant release is going to
>> > be 2.9 or even major release 3.0. It sounds realistic to aim for 2.9
>> > because there are already several "almost ready" features in master. In my
>> > opinion, it is a good idea to start a discussion about the next releases on the
>> > dev list.
>> >
>> > Best regards,
>> > Ivan Pavlukhin
>> >
>> > On Tue, Mar 10, 2020 at 00:58, Ivan Pavlukhin <vo...@gmail.com> wrote:
>> > >
>> > > Hi Yuriy,
>> > >
>> > > Sorry for the late response.
>> > >
>> > > > A suitable solution without subclassing might be:
>> > > > 1. Explicitly add a float field to the entity.
>> > > > 2. Annotate it with a special annotation (@QueryRankField, for instance).
>> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back
>> > > > to the initiating node.
>> > > > 4. Possibly we still need to proxy the entity to add the Comparable
>> > > > interface.
>> > > > 5. Perform a merge sort on the initiating node.
>> > >
>> > > Possibly I missed it, but one point is not clear to me. What will
>> > > happen if an entity class does not have a field annotated with
>> > > @QueryRankField?
>> > >
>> > > And I am still not sure that it is a proper (enough) approach. The
>> > > thing that bothers me is the transient and dynamic nature of the "rank"
>> > > field. It does belong to the entity, yet it can have different values for
>> > > the same entity (e.g. when different indices are used).
>> > >
>> > > I would like to experiment with the code a little bit. But most likely I
>> > > will only have a chance at the end of this week.
>> > >
>> > > Best regards,
>> > > Ivan Pavlukhin
>> > >
>> > > On Mon, Mar 2, 2020 at 20:09, Yuriy Shuliga <sh...@gmail.com> wrote:
>> > > >
>> > > > Hi Ivan,
>> > > >
>> > > > I have concerns about the entity annotation variant.
>> > > > Wrapping the entity into a dynamic proxy for passing it back would be
>> > > > quite a complex thing that requires changes in IgniteCacheObjectProcessor
>> > > > and entity marshaling.
>> > > >
>> > > > A suitable solution without subclassing might be:
>> > > > 1. Explicitly add a float field to the entity.
>> > > > 2. Annotate it with a special annotation (@QueryRankField, for instance).
>> > > > 3. Fill in this field with docScore in GridLuceneIndex and pass it back
>> > > > to the initiating node.
>> > > > 4. Possibly we still need to proxy the entity to add the Comparable
>> > > > interface.
>> > > > 5. Perform a merge sort on the initiating node.
>> > > >
>> > > > Would you consider this approach, or should we go back to using the
>> > > > Ranked superclass?
>> > > >
>> > > > Regarding your proposal to implement a merge sort: definitely yes.
>> > > > I will implement it.
>> > > > Sorry, I didn't understand you earlier :)
>> > > >
>> > > > BR,
>> > > > Yuriy Shuliha
>> > > >
>> > > > PS As far as I can see, there is no chance to get on the 2.8 release train.
>> > > > What will be the next version/date we can aim for with this update?
>> > > >
>> > > >
>> > > > On Fri, Feb 28, 2020 at 23:08, Ivan Pavlukhin <vo...@gmail.com> wrote:
>> > > >>
>> > > >> Hi Yuriy,
>> > > >>
>> > > >> Sorry for the late response, and thank you for your comments.
>> > > >>
>> > > >> The approach with a @Ranked annotation looks cleaner to me from the API
>> > > >> point of view.
>> > > >>
>> > > >> Regarding merging responses from multiple nodes, I suppose a good
>> > > >> enough solution is possible:
>> > > >> 1. Request one page of entries from each node.
>> > > >> 2. Return one page to the user (as there is definitely a page of the
>> > > >> best results already).
>> > > >> 3. Request the next result pages from the nodes whose pages we have
>> > > >> exposed to the user (actually, nodes having less than one page of
>> > > >> pending results). Repeat from step 2.
>> > > >>
>> > > >> Some kind of sort-merge plus backpressure. The backpressure part might
>> > > >> be left as an improvement.
>> > > >>
>> > > >> What do you think?
>> > > >>
>> > > >> Best regards,
>> > > >> Ivan Pavlukhin
>> > > >>
>> > > >> On Tue, Feb 18, 2020 at 18:27, Yuriy Shuliga <sh...@gmail.com> wrote:
>> > > >>
>> > > >> >
>> > > >> > Hi Ivan,
>> > > >> >
>> > > >> > Thank you for keeping an eye on the topic!
>> > > >> >
>> > > >> > Here are the answers to your questions:
>> > > >> > 1. A TextQuery response is always ordered by documentScore, and
>> > > >> > this number is also frequently used when processing the results.
>> > > >> > We have analyzed the current entity flow under the hood of query
>> > > >> > processing and found that the cleanest approach to get a response with
>> > > >> > ordered entities is to extend the entity itself.
>> > > >> > The only drawback is the necessity to extend Ranked in our case.
>> > > >> > And it is very common to use documentScore (rank) when working with
>> > > >> > TextQuery anyway.
>> > > >> > Another approach I see is to use reflection to create a proxy with
>> > > >> > the Ranked interface. In this case we will still need to mark our
>> > > >> > intention to have an ordered response, e.g. by adding some @Ranked annotation.
>> > > >> > Please advise what would fit Ignite better.
>> > > >> >
>> > > >> > 2. Yes, you are right. Using a PriorityQueue may lead to unwanted
>> > > >> > memory consumption.
>> > > >> > In order to get a correct response we still need to retrieve data
>> > > >> > from all of the nodes, as any of them may contain a value that falls into
>> > > >> > the limited range (this is because of the float ranking score).
>> > > >> > This can be fixed by using Guava's MinMaxPriorityQueue, which has a
>> > > >> > maximum size limit. Technically it will be equivalent to merging the sorted
>> > > >> > responses, as each element will require a comparison against the whole queue.
>> > > >> >
>> > > >> > BR,
>> > > >> > Yuriy Shuliha
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Feb 13, 2020 at 13:53, Ivan Pavlukhin <vo...@gmail.com> wrote:
>> > > >> >>
>> > > >> >> Hi Yuriy,
>> > > >> >>
>> > > >> >> Sorry for the delay. I went through the proposed solution and I have
>> > > >> >> some questions. Currently I am a little out of the context of TEXT
>> > > >> >> queries, so correct me or redirect me to some previous discussion if I
>> > > >> >> got something wrong:
>> > > >> >> 1. What is the justification for using inheritance from Ranked in order
>> > > >> >> to keep the order? Why can't we mix the rank/score into the entries
>> > > >> >> transferred inside GridCacheQueryResponse?
>> > > >> >> 2. Collecting all entries in a PriorityQueue can lead to unnecessary
>> > > >> >> heap memory consumption. I think that merging several sorted runs
>> > > >> >> (responses from different nodes) would be a better option.
>> > > >> >>
>> > > >> >> Best regards,
>> > > >> >> Ivan Pavlukhin
>> > > >> >>
>> > > >> >> On Mon, Feb 10, 2020 at 18:32, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > > >> >> >
>> > > >> >> > Hi Ivan,
>> > > >> >> >
>> > > >> >> > Have you had a chance to look through the proposed solution?
>> > > >> >> > We definitely need this validation in order to proceed
>> > > >> >> > further and submit the changes officially.
>> > > >> >> >
>> > > >> >> > BR,
>> > > >> >> > Yuriy Shluiha
>> > > >> >> >
>> > > >> >> > On Tue, Jan 28, 2020 at 17:30, Yuriy Shuliga <sh...@gmail.com> wrote:
>> > > >> >> >>
>> > > >> >> >> Hello,
>> > > >> >> >>
>> > > >> >> >> please see the proposed TextQuery ordering solution here:
>> > > >> >> >>
>> > > >> >> >> https://github.com/apache/ignite/compare/master...shuliga:feature/rank_score
>> > > >> >> >>
>> > > >> >> >> Y.
>> > > >> >> >>
>> > > >> >> >> On Fri, Jan 24, 2020 at 09:50, Ivan Pavlukhin <vo...@gmail.com> wrote:
>> > > >> >> >>>
>> > > >> >> >>> Yuriy,
>> > > >> >> >>>
>> > > >> >> >>> Good to know that the story continues! Yes, it would be really nice to
>> > > >> >> >>> see the code of your solution. Formal requirements can of course be
>> > > >> >> >>> omitted; the solution design is of most interest so far. And it
>> > > >> >> >>> would definitely be great to merge it into the Apache Ignite codebase
>> > > >> >> >>> eventually.
>> > > >> >> >>>
>> > > >> >> >>> On Thu, Jan 23, 2020 at 16:47, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> > > >> >> >>>
>> > > >> >> >>>
>> > > >> >> >>>
>> > > >> >> >>> --
>> > > >> >> >>> Best regards,
>> > > >> >> >>> Ivan Pavlukhin
>>
>
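The bounded-queue idea from the thread above (Guava's MinMaxPriorityQueue with a maximum size) boils down to keeping only the best `limit` candidates while scanning everything the nodes return. The sketch below swaps in a plain `java.util.PriorityQueue` used as a min-heap instead of Guava so that it is self-contained; `TopNByScore` is a hypothetical helper, not part of Ignite. Memory stays bounded by `limit`, but every candidate from every node is still examined, whereas merging already-sorted runs, as Ivan suggests, can stop as soon as `limit` results have been emitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the `limit` best items seen, using O(limit) memory. */
final class TopNByScore {
    private TopNByScore() {
    }

    static <T> List<T> topN(Iterable<T> items, int limit, Comparator<? super T> byScore) {
        if (limit <= 0)
            return Collections.emptyList();

        // Min-heap: the head is the weakest item currently kept.
        PriorityQueue<T> heap = new PriorityQueue<>(limit, byScore);

        for (T item : items) {
            if (heap.size() < limit)
                heap.add(item);
            else if (byScore.compare(item, heap.peek()) > 0) {
                // Better than the weakest kept item: replace it.
                heap.poll();
                heap.add(item);
            }
        }

        List<T> res = new ArrayList<>(heap);
        res.sort(byScore.reversed()); // best first
        return res;
    }
}
```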

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Yuriy Shuliga <sh...@gmail.com>.

Hi Ivan,

Actually, I have engaged another developer to help bring TextQueries to a
correctly working state.
For now we have a solution that adds ordering functionality to distributed
TextQueries.
It is developed and tested locally. I can share the details here; then we can
discuss and decide whether to create a corresponding ticket.

The starting point is that Lucene's documents are by nature always ordered
by docScore (a float).
So we created an abstract class Ranked that implements Comparable<Ranked> and
Serializable and contains a float rank value.
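A minimal sketch of such a base class, assuming the rank is simply the Lucene `ScoreDoc.score` copied into the entity; the accessor names and the example `Doc` entity are illustrative and not taken from the linked branch. `compareTo` orders by descending rank so that the best match sorts first. Note that this ordering is not consistent with `equals`, and that every ranked entity has to spend its single superclass slot on `Ranked`, which is the inheritance limitation raised elsewhere in this thread.

```java
import java.io.Serializable;

/**
 * Sketch of the Ranked base class: carries the Lucene score and orders
 * instances by descending rank. Not consistent with equals().
 */
abstract class Ranked implements Comparable<Ranked>, Serializable {
    private static final long serialVersionUID = 1L;

    /** Score assigned within one particular text query response. */
    private float rank;

    public final float rank() {
        return rank;
    }

    public final void rank(float rank) {
        this.rank = rank;
    }

    @Override public int compareTo(Ranked other) {
        // Reversed arguments give descending order: higher score first.
        return Float.compare(other.rank, rank);
    }
}

/** Illustrative entity; the name and fields are hypothetical. */
class Doc extends Ranked {
    private static final long serialVersionUID = 1L;

    final String id;

    Doc(String id, float rank) {
        this.id = id;
        rank(rank);
    }
}
```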

Each entity that is expected to be ordered during the TextQuery merge should
extend this class.
Everything else is done automatically under the hood by the new
CacheQueryFutureRankedDecorator, which contains a special BlockingIterator
used to correctly merge the distributed responses.
Text queries over Ranked entities are automatically wrapped with this new
decorator.

This is an outline of the solution. Please ask if you have any questions.
Or I can create a ticket and link a PR with the already tested (so far only
locally) solution to it for a detailed review.

BR,
Yuriy


On Tue, Jan 21, 2020 at 07:29, Ivan Pavlukhin <vo...@gmail.com> wrote:

> Hi Yuriy,
>
> I would just like to understand the current state. Are you still working on
> Ignite text queries? If not, are you going to continue with it?
>
> On Fri, Dec 13, 2019 at 11:52, Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > Yuriy,
> >
> > Sure, I will be glad to help.
> >
> > > - incorrect nodes/partition selection during querying?
> > Apparently this is the problem. If you find it too complicated to
> > understand and debug, I can dig deeper and share my view of how the
> > problem can be fixed.
> >
> > On Wed, Dec 11, 2019 at 18:46, Yuriy Shuliga <sh...@gmail.com> wrote:
> > >
> > > I will look into the MOVING partition issue.
> > > But I will also need some guidance there.
> > >
> > > Ivan, would you mind being that person?
> > >
> > > The question is whether we have an issue with:
> > > - wrong storage targets during indexing, OR
> > > - incorrect nodes/partition selection during querying?
> > >
> > > BR,
> > > Yuriy Shluiha
> > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Hi Yuriy,

I would just like to understand the current state. Are you still working on
Ignite text queries? If not, are you going to continue with it?

On Fri, Dec 13, 2019 at 11:52, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> Yuriy,
>
> Sure, I will be glad to help.
>
> > - incorrect nodes/partition selection during querying?
> Apparently this is the problem. If you find it too complicated to
> understand and debug, I can dig deeper and share my view of how the
> problem can be fixed.
>
> On Wed, Dec 11, 2019 at 18:46, Yuriy Shuliga <sh...@gmail.com> wrote:
> >
> > I will look into the MOVING partition issue.
> > But I will also need some guidance there.
> >
> > Ivan, would you mind being that person?
> >
> > The question is whether we have an issue with:
> > - wrong storage targets during indexing, OR
> > - incorrect nodes/partition selection during querying?
> >
> > BR,
> > Yuriy Shluiha
> >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Yuriy,

Sure, I will be glad to help.

> - incorrect node/partition selection during querying?
Apparently this is the problem. If you find it really complicated to
understand and debug, then I can dig deeper and share my vision of how the
problem can be fixed.

On Wed, Dec 11, 2019 at 18:46, Yuriy Shuliga <sh...@gmail.com> wrote:
>
> I will look into the MOVING partition issue,
> but I also need some guidance there.
>
> Ivan, would you mind being that person?
>
> The question is whether we have an issue with:
> - wrong storage targets during indexing, OR
> - incorrect node/partition selection during querying?
>
> BR,
> Yuriy Shuliha
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Yuriy Shuliga <sh...@gmail.com>.

I will look into the MOVING partition issue,
but I also need some guidance there.

Ivan, would you mind being that person?

The question is whether we have an issue with:
- wrong storage targets during indexing, OR
- incorrect node/partition selection during querying?

BR,
Yuriy Shuliha



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

Yes, I guess you are right :(

I can surely fix the range issue; it's just that it was so broken that I
could not figure out the correct behavior for this case.

Regards,
-- 
Ilya Kasnacheev


On Mon, Dec 2, 2019 at 15:01, Ivan Pavlukhin <vo...@gmail.com> wrote:

> Ilya,
>
> I checked your test on a revision before "limit" and it fails there as
> well. Could you please double check?
>
> On Mon, Dec 2, 2019 at 13:21, Ilya Kasnacheev <il...@gmail.com> wrote:
> >
> > Hello!
> >
> > The problem is NOT specific to range queries. Range queries were broken
> > previously and they are broken now, but now even a simple "token in field
> > with limit" returns duplicates.
> >
> > Before limits were introduced, any tested use cases were unaffected by
> > duplicates, but now they are.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > > And is the problem specific to range queries or not?
> > >
> > > On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > > >
> > > > Yuriy,
> > > >
> > > > Thank you for investigating the problem [1]. I still cannot understand how
> > > > the problem relates to the introduced "limit". Is it right that there were
> > > > no duplicates before "limit" support? After that support was introduced,
> > > > do only limited queries contain duplicates, or unlimited ones, or both?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12401
> > > >
> > > > On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > > > >
> > > > > Hello!
> > > > >
> > > > > I have just found what I consider a major regression in Text Queries: it
> > > > > seems to me that text queries with limits will return the same key-value
> > > > > entries multiple times.
> > > > >
> > > > > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > > > > and corresponding build
> > > > > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Ilya Kasnacheev
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

*on topologies

On Tue, Dec 3, 2019 at 17:15, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> Ilya, Yuriy,
>
> It seems that text queries can return incorrect results on tologies
> with MOVING partitions. I left a comment in JIRA [1].
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12401
>
> On Mon, Dec 2, 2019 at 15:00, Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > Ilya,
> >
> > I checked your test on a revision before "limit" and it fails there as
> > well. Could you please double check?
> >
> > On Mon, Dec 2, 2019 at 13:21, Ilya Kasnacheev <il...@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > The problem is NOT specific to range queries. Range queries were broken
> > > previously and they are broken now, but now even a simple "token in field
> > > with limit" returns duplicates.
> > >
> > > Before limits were introduced, any tested use cases were unaffected by
> > > duplicates, but now they are.
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >
> > > > And is the problem specific to range queries or not?
> > > >
> > > > On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > > > >
> > > > > Yuriy,
> > > > >
> > > > > Thank you for investigating the problem [1]. I still cannot understand how
> > > > > the problem relates to the introduced "limit". Is it right that there were
> > > > > no duplicates before "limit" support? After that support was introduced,
> > > > > do only limited queries contain duplicates, or unlimited ones, or both?
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-12401
> > > > >
> > > > > On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > > > > >
> > > > > > Hello!
> > > > > >
> > > > > > I have just found what I consider a major regression in Text Queries: it
> > > > > > seems to me that text queries with limits will return the same key-value
> > > > > > entries multiple times.
> > > > > >
> > > > > > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > > > > > and corresponding build
> > > > > > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> > > > > >
> > > > > > Regards,
> > > > > > --
> > > > > > Ilya Kasnacheev
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Ilya, Yuriy,

It seems that text queries can return incorrect results on tologies
with MOVING partitions. I left a comment in JIRA [1].

[1] https://issues.apache.org/jira/browse/IGNITE-12401
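A simplified model of the suspected MOVING-partition problem, offered as a sketch under stated assumptions rather than Ignite's actual routing code. Assume each node keeps one local Lucene index covering every partition copy it holds, including a copy that is still MOVING during rebalancing. If a text query asks every node to search its whole local index, a key in a rebalancing partition can be returned by both the old owner and the new node; answering each partition from exactly one OWNING copy avoids that. `PartState`, `NodePart` and `TextQueryRouting` are hypothetical names, and the two-value enum deliberately simplifies Ignite's real partition states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified partition states (Ignite has more). */
enum PartState { OWNING, MOVING }

/** Hypothetical fact: node `node` holds a copy of partition `part` in `state`. */
final class NodePart {
    final String node;
    final int part;
    final PartState state;

    NodePart(String node, int part, PartState state) {
        this.node = node;
        this.part = part;
        this.state = state;
    }
}

final class TextQueryRouting {
    /** Every node holding any copy of the partition: what a "search every local index" query hits. */
    static List<String> allHolders(List<NodePart> layout, int part) {
        List<String> res = new ArrayList<>();
        for (NodePart np : layout)
            if (np.part == part)
                res.add(np.node);
        return res;
    }

    /** Exactly one node per partition: the first OWNING copy; MOVING copies are ignored. */
    static Optional<String> singleOwner(List<NodePart> layout, int part) {
        for (NodePart np : layout)
            if (np.part == part && np.state == PartState.OWNING)
                return Optional.of(np.node);
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Partition 7 is owned by A and is being rebalanced to B.
        List<NodePart> layout = List.of(
            new NodePart("A", 7, PartState.OWNING),
            new NodePart("B", 7, PartState.MOVING));

        System.out.println(allHolders(layout, 7));  // [A, B]: the same key may come back twice
        System.out.println(singleOwner(layout, 7)); // Optional[A]
    }
}
```

Whether the fix belongs on the querying side or the indexing side is exactly the open question in this thread; the sketch only illustrates the query-side option.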

On Mon, Dec 2, 2019 at 15:00, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> Ilya,
>
> I checked your test on a revision before "limit" and it fails there as
> well. Could you please double check?
>
> On Mon, Dec 2, 2019 at 13:21, Ilya Kasnacheev <il...@gmail.com> wrote:
> >
> > Hello!
> >
> > The problem is NOT specific to range queries. Range queries were broken
> > previously and they are broken now, but now even a simple "token in field
> > with limit" returns duplicates.
> >
> > Before limits were introduced, any tested use cases were unaffected by
> > duplicates, but now they are.
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > > And is the problem specific to range queries or not?
> > >
> > > On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > > >
> > > > Yuriy,
> > > >
> > > > Thank you for investigating the problem [1]. I still cannot understand how
> > > > the problem relates to the introduced "limit". Is it right that there were
> > > > no duplicates before "limit" support? After that support was introduced,
> > > > do only limited queries contain duplicates, or unlimited ones, or both?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12401
> > > >
> > > > On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > > > >
> > > > > Hello!
> > > > >
> > > > > I have just found what I consider a major regression in Text Queries: it
> > > > > seems to me that text queries with limits will return the same key-value
> > > > > entries multiple times.
> > > > >
> > > > > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > > > > and corresponding build
> > > > > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> > > > >
> > > > > Regards,
> > > > > --
> > > > > Ilya Kasnacheev
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Ilya,

I checked your test on a revision before "limit" and it fails there as
well. Could you please double check?

On Mon, Dec 2, 2019 at 13:21, Ilya Kasnacheev <il...@gmail.com> wrote:
>
> Hello!
>
> The problem is NOT specific to range queries. Range queries were broken
> previously and they are broken now, but now even a simple "token in field
> with limit" returns duplicates.
>
> Before limits were introduced, any tested use cases were unaffected by
> duplicates, but now they are.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> > And is the problem specific to range queries or not?
> >
> > On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
> > >
> > > Yuriy,
> > >
> > > Thank you for investigating the problem [1]. I still cannot understand how
> > > the problem relates to the introduced "limit". Is it right that there were
> > > no duplicates before "limit" support? After that support was introduced,
> > > do only limited queries contain duplicates, or unlimited ones, or both?
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-12401
> > >
> > > On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > > >
> > > > Hello!
> > > >
> > > > I have just found what I consider a major regression in Text Queries: it
> > > > seems to me that text queries with limits will return the same key-value
> > > > entries multiple times.
> > > >
> > > > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > > > and corresponding build
> > > > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> > > >
> > > > Regards,
> > > > --
> > > > Ilya Kasnacheev
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

The problem is NOT specific to range queries. Range queries were broken
previously and they are broken now, but now even a simple "token in field
with limit" returns duplicates.

Before limits were introduced, any tested use cases were unaffected by
duplicates, but now they are.

Regards,
-- 
Ilya Kasnacheev


On Mon, Dec 2, 2019 at 12:23, Ivan Pavlukhin <vo...@gmail.com> wrote:

> And is the problem specific to range queries or not?
>
> On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
> >
> > Yuriy,
> >
> > Thank you for investigating the problem [1]. I still cannot understand how
> > the problem relates to the introduced "limit". Is it right that there were
> > no duplicates before "limit" support? After that support was introduced,
> > do only limited queries contain duplicates, or unlimited ones, or both?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-12401
> >
> > On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > I have just found what I consider a major regression in Text Queries: it
> > > seems to me that text queries with limits will return the same key-value
> > > entries multiple times.
> > >
> > > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > > and corresponding build
> > > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

And is the problem specific to range queries or not?

On Mon, Dec 2, 2019 at 11:12, Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> Yuriy,
>
> Thank you for investigating the problem [1]. I still cannot understand how
> the problem relates to the introduced "limit". Is it right that there were
> no duplicates before "limit" support? After that support was introduced,
> do only limited queries contain duplicates, or unlimited ones, or both?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12401
>
> On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <il...@gmail.com> wrote:
> >
> > Hello!
> >
> > I have just found what I consider a major regression in Text Queries: it
> > seems to me that text queries with limits will return the same key-value
> > entries multiple times.
> >
> > Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> > and corresponding build
> > https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
> >
> > Regards,
> > --
> > Ilya Kasnacheev
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Yuriy,

Thank you for investigating the problem [1]. I still cannot understand how
the problem relates to the introduced "limit". Is it right that there were
no duplicates before "limit" support? After that support was introduced,
do only limited queries contain duplicates, or unlimited ones, or both?

[1] https://issues.apache.org/jira/browse/IGNITE-12401

On Thu, Nov 28, 2019 at 18:30, Ilya Kasnacheev <il...@gmail.com> wrote:
>
> Hello!
>
> I have just found what I consider a major regression in Text Queries: it
> seems to me that text queries with limits will return the same key-value
> entries multiple times.
>
> Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
> and corresponding build
> https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
>
> Regards,
> --
> Ilya Kasnacheev



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

I have just found what I consider a major regression in Text Queries: it
seems to me that text queries with limits will return the same key-value
entries multiple times.

Please check the issue https://issues.apache.org/jira/browse/IGNITE-12401
and corresponding build
https://ci.ignite.apache.org/viewQueued.html?itemId=4799634
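For context, a check of the kind a regression test for IGNITE-12401 might perform: collect the keys returned by a text query and report any key seen more than once. This is a hypothetical helper (`DuplicateCheck.duplicateKeys` is not an Ignite API), and the actual test attached to the ticket may be written differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DuplicateCheck {
    /** Returns every key that occurs more than once, in the order its first repeat is seen. */
    static <K> List<K> duplicateKeys(List<K> keys) {
        Set<K> seen = new HashSet<>();
        Set<K> dups = new LinkedHashSet<>();
        for (K k : keys) {
            if (!seen.add(k)) // add() returns false if the key was already seen
                dups.add(k);
        }
        return new ArrayList<>(dups);
    }

    public static void main(String[] args) {
        // Keys as they might come back from a limited text query.
        List<Integer> returned = List.of(1, 2, 2, 3, 1);
        System.out.println(duplicateKeys(returned)); // [2, 1]
    }
}
```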

Regards,
-- 
Ilya Kasnacheev

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Folks, Yuriy,

I suppose that we are going to proceed with

>>>
Reducing on Ignite

The obvious point of distributed response reduction is class
GridCacheDistributedQueryFuture.
Though, @Ivan Pavlukhin mentioned class with similar functionality:
ReduceIndexSorted
What I see here is that it is tangled with H2-related classes
(org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>>

I have no strong opinion that we should unify the
reduction. Having a separate reduction implementation for text queries
sounds like a reasonable option to me as well.
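Since a separate reduction path for text queries is on the table, here is a minimal sketch of one that does not depend on H2's `org.h2.result.Row`. It assumes each node returns its hits already sorted best-first, with the Lucene score (`ScoreDoc.score`) shipped alongside each key as proposed earlier in the thread; the reducer then performs a k-way merge that stops at the query limit instead of concatenating node responses in arbitrary order. `Hit`, `TextQueryReducer` and `merge` are hypothetical names, not Ignite classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit shipped from a node: cache key plus its Lucene score (ScoreDoc.score). */
final class Hit {
    final Object key;
    final float score;

    Hit(Object key, float score) {
        this.key = key;
        this.score = score;
    }

    @Override public String toString() {
        return key + "=" + score;
    }
}

final class TextQueryReducer {
    /** Relevance order: higher score first, the order Lucene uses for TopDocs.scoreDocs. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> Float.compare(b.score, a.score);

    /**
     * K-way merge of per-node lists, each already sorted best-first by cmp,
     * returning at most limit hits in global order.
     */
    static List<Hit> merge(List<List<Hit>> perNode, Comparator<Hit> cmp, int limit) {
        // Heap entries are {node index, position in that node's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> cmp.compare(perNode.get(x[0]).get(x[1]), perNode.get(y[0]).get(y[1])));

        for (int n = 0; n < perNode.size(); n++)
            if (!perNode.get(n).isEmpty())
                heap.add(new int[] {n, 0});

        List<Hit> res = new ArrayList<>();
        while (!heap.isEmpty() && res.size() < limit) {
            int[] head = heap.poll();
            List<Hit> src = perNode.get(head[0]);
            res.add(src.get(head[1]));
            if (head[1] + 1 < src.size())
                heap.add(new int[] {head[0], head[1] + 1}); // advance that node's cursor
        }
        return res;
    }

    public static void main(String[] args) {
        List<List<Hit>> perNode = List.of(
            List.of(new Hit("a1", 0.9f), new Hit("a2", 0.4f)),
            List.of(new Hit("b1", 0.7f), new Hit("b2", 0.6f), new Hit("b3", 0.1f)));

        System.out.println(merge(perNode, BY_SCORE_DESC, 3)); // [a1=0.9, b1=0.7, b2=0.6]
    }
}
```

Sorting by conventional fields (the proposed `@SortField` case) would only need a different `Comparator<Hit>`, provided each node ships the sort-field values along with its hits.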

Are there still any open questions?

On Wed, Nov 27, 2019 at 02:27, Denis Magda <dm...@apache.org> wrote:
>
> I don't see anything wrong if Yuriy is willing to carry on and keep
> enhancing our full-text search support, which lacks basic capabilities.
>
> The basics should be available. If anybody needs an advanced feature, they
> can introduce Solr or Elasticsearch into the final architecture of the app.
>
> Folks, who of us can help Yuriy with the questions asked? Most likely the SQL
> experts are the best candidates here.
>
>
> -
> Denis
>
>
> On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <vo...@gmail.com> wrote:
>
> > Folks,
> >
> > IEP is an Ignite-specific thing. In fact, I suppose that we are
> > already doing it the ASF way by having this dev-list discussion =)
> >
> > As for me, the "limit" feature for text queries is not big enough
> > to warrant an IEP. But we might need to create one for the next features.
> >
> > On Tue, Nov 26, 2019 at 15:06, Ilya Kasnacheev <il...@gmail.com> wrote:
> > >
> > > Hello!
> > >
> > > The ASF way should probably start with an IEP :)
> > >
> > > Regards,
> > > --
> > > Ilya Kasnacheev
> > >
> > >
> > > On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> > >
> > > >
> > > > OK, let's forget Solr and go the ASF way: if Yuriy proves this
> > > > functionality is helpful and opens a PR, why not?
> > > >
> > > > Right?
> > > >
> > > > >Tuesday, 26 November 2019, 14:06 +03:00 from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > > >
> > > > >Hello!
> > > > >
> > > > >The problem here is that Solr is a multi-year effort by a lot of
> > > > >people. We can't match that.
> > > > >
> > > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> > > > >cache information into their storage for indexing and relying on their
> > > > >own mechanisms for distributed IR sorting?
> > > > >
> > > > >Regards,
> > > > >--
> > > > >Ilya Kasnacheev
> > > > >
> > > > >
> > > > >On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> > > > >
> > > > >>
> > > > >> Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> >Tuesday, 26 November 2019, 13:50 +03:00 from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > > >> >
> > > > >> >Hello!
> > > > >> >
> > > > >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> > > > >> >into Apache Ignite. I think that's a lot of effort that is not very
> > > > >> >justified.
> > > > >> >
> > > > >> >I don't think we should try to implement sorting in Apache Ignite,
> > > > >> >because it is a lot of work, and a lot of code in our code base which
> > > > >> >we don't really want.
> > > > >> >
> > > > >> >Regards,
> > > > >> >--
> > > > >> >Ilya Kasnacheev
> > > > >> >
> > > > >> >
> > > > >> >On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >
> > > > >> >> Dear Igniters,
> > > > >> >>
> > > > >> >> The first part of the TextQuery improvement, a result limit, was
> > > > >> >> developed and merged.
> > > > >> >> Now we have to develop the most important functionality here: proper
> > > > >> >> sorting of Lucene index responses and their correct reduction for
> > > > >> >> distributed queries.
> > > > >> >>
> > > > >> >> *There are two Lucene-based aspects*
> > > > >> >>
> > > > >> >> 1. When no sorting fields are used, the documents in the response are
> > > > >> >> still ordered by relevance.
> > > > >> >> Actually, this is the ScoreDoc.score value.
> > > > >> >> In order to reduce the distributed results correctly, the score
> > > > >> >> should be passed with the response.
> > > > >> >>
> > > > >> >> 2. When sorting by conventional fields, Lucene should have these
> > > > >> >> fields properly indexed, and the corresponding Sort object should be
> > > > >> >> applied to Lucene's search call.
> > > > >> >> In order to mark those fields, a new annotation like '@SortField' may
> > > > >> >> be introduced.
> > > > >> >>
> > > > >> >> *Reducing on Ignite*
> > > > >> >>
> > > > >> >> The obvious point of distributed response reduction is the class
> > > > >> >> GridCacheDistributedQueryFuture.
> > > > >> >> Though, @Ivan Pavlukhin mentioned a class with similar functionality:
> > > > >> >> ReduceIndexSorted
> > > > >> >> What I see here is that it is tangled with H2-related classes
> > > > >> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> > > > >> >>
> > > > >> >> I still need support here.
> > > > >> >>
> > > > >> >> Overall, the goal of this letter is to initiate a discussion on
> > > > >> >> TextQuery sorting implementation and come closer to ticket creation.
> > > > >> >>
> > > > >> >> BR,
> > > > >> >> Yuriy Shuliha
> > > > >> >>
> > > > >> >> On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > >> >>
> > > > >> >> > Hi Dmitry, Yuriy.
> > > > >> >> >
> > > > >> >> > I've found that GridCacheQueryFutureAdapter has a newly added
> > > > >> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
> > > > >> >> >
> > > > >> >> > Both fields are used inside a synchronized block only.
> > > > >> >> > So, we can make both private and downgrade AtomicInteger to a
> > > > >> >> > primitive int.
> > > > >> >> >
> > > > >> >> > Most likely, these fields can be replaced with one field.
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >
> > > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
> > > > >> >> >
> > > > >> >> > > Hi Andrey,
> > > > >> >> > >
> > > > >> >> > > I've checked this ticket's comments, and there is a TC Bot visa
> > > > >> >> > > (with no blockers).
> > > > >> >> > >
> > > > >> >> > > Do you have any concerns related to this patch?
> > > > >> >> > >
> > > > >> >> > > Sincerely,
> > > > >> >> > > Dmitriy Pavlov
> > > > >> >> > >
> > > > >> >> > > On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >
> > > > >> >> > >> Andrey,
> > > > >> >> > >>
> > > > >> >> > >> Per your request, I created ticket
> > > > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> > > > >> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> > > > >> >> > >>
> > > > >> >> > >> Could you please proceed with the PR merge?
> > > > >> >> > >>
> > > > >> >> > >> BR,
> > > > >> >> > >> Yuriy Shuliha
> > > > >> >> > >>
> > > > >> >> > >> On Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > >> >> > >>
> > > > >> >> > >> > Hi Yuri,
> > > > >> >> > >> >
> > > > >> >> > >> > To get access to the TC Bot, you should register as a TeamCity
> > > > >> >> > >> > user [1], if you didn't do this already.
> > > > >> >> > >> > Then you will be able to authorize on the Ignite TC Bot page with
> > > > >> >> > >> > the same credentials.
> > > > >> >> > >> >
> > > > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> > > > >> >> > >> >
> > > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >> >
> > > > >> >> > >> >> Andrew,
> > > > >> >> > >> >>
> > > > >> >> > >> >> I have corrected the PR according to your notes. Please review.
> > > > >> >> > >> >> What will be the next steps in order to merge it?
> > > > >> >> > >> >>
> > > > >> >> > >> >> Y.
> > > > >> >> > >> >>
> > > > >> >> > >> >> On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > >> >> > >> >>
> > > > >> >> > >> >> > Yuri,
> > > > >> >> > >> >> >
> > > > >> >> > >> >> > I'm done with the review.
> > > > >> >> > >> >> > Nothing critical found, just a trivial compatibility bug.
> > > > >> >> > >> >> >
> > > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >> >> >
> > > > >> >> > >> >> > > Denis,
> > > > >> >> > >> >> > >
> > > > >> >> > >> >> > > Thank you for your attention to this.
> > > > >> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
> > > > >> >> > >> >> > > ticket is still pending review.
> > > > >> >> > >> >> > > Do we have a chance to move it forward somehow?
> > > > >> >> > >> >> > >
> > > > >> >> > >> >> > > BR,
> > > > >> >> > >> >> > > Yuriy Shuliha
> > > > >> >> > >> >> > >
> > > > >> >> > >> >> > > On Mon, Sep 30, 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
> > > > >> >> > >> >> > >
> > > > >> >> > >> >> > > > Yuriy,
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > > I've seen you open a pull request with the first changes:
> > > > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the
> > > > >> >> > >> >> > > > review?
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > > -
> > > > >> >> > >> >> > > > Denis
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > > > > Yuriy,
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > > Yes, we already have support for distributed limit and merging sorted
> > > > >> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> > > > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > > Could you please also clarify score/relevance? Is it provided by the
> > > > >> >> > >> >> > > > > Lucene engine for each query result? I am thinking about how to do a
> > > > >> >> > >> >> > > > > sorted merge properly in this case.
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > > On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Ivan,
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Thank you for the interesting question!
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented, and the
> > > > >> >> > >> >> > > > > > point of the user's interest is the topmost part of the response.
> > > > >> >> > >> >> > > > > > Then the user can read it, evaluate it, and use the given records for
> > > > >> >> > >> >> > > > > > further purposes.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Particularly in our case, we use Ignite for operations with financial
> > > > >> >> > >> >> > > > > > data, and there is lots of text stuff like asset names, financial
> > > > >> >> > >> >> > > > > > instruments, companies, etc.
> > > > >> >> > >> >> > > > > > In order to operate with this quickly and reliably, users are used to
> > > > >> >> > >> >> > > > > > working with text search, type-ahead completions, and suggestions.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > For these purposes, we are indexing particular string data in separate
> > > > >> >> > >> >> > > > > > caches.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Sorting capabilities and response size limitations are very important
> > > > >> >> > >> >> > > > > > there, as our API has to provide the most relevant information within a
> > > > >> >> > >> >> > > > > > limited size.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > Now let me comment on the Ignite/Lucene perspective.
> > > > >> >> > >> >> > > > > > Actually, Ignite queries Lucene, which returns *TopDocs.scoreDocs*
> > > > >> >> > >> >> > > > > > already sorted by *score* (relevance). So the most relevant documents are
> > > > >> >> > >> >> > > > > > on top.
> > > > >> >> > >> >> > > > > > And currently, distributed query responses from different nodes are
> > > > >> >> > >> >> > > > > > merged into the final query cursor queue in an arbitrary way.
> > > > >> >> > >> >> > > > > > So in fact, we already have the score order ruined here. Also, Ignite
> > > > >> >> > >> >> > > > > > requests all possible documents from Lucene, which is redundant and not
> > > > >> >> > >> >> > > > > > good for performance.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > I'm implementing the *limit* parameter to be part of *TextQuery* and have
> > > > >> >> > >> >> > > > > > to note that we still have to add sorting to text query processing in
> > > > >> >> > >> >> > > > > > order to have applicable results.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > The *limit* parameter itself should improve part of the issues above,
> > > > >> >> > >> >> > > > > > but definitely, sorting by at least document score should be implemented
> > > > >> >> > >> >> > > > > > along with the limit.
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > This is a pretty short commentary; if you still have any questions,
> > > > >> >> > >> >> > > > > > please ask, do not hesitate :)
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > BR,
> > > > >> >> > >> >> > > > > > Yuriy Shuliha
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > On Thu, Sep 19, 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
> > > > >> >> > >> >> > > > > >
> > > > >> >> > >> >> > > > > > > Yuriy,
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > > Greatly appreciate your interest.
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > > Could you please elaborate a little bit on sorting? What tasks does
> > > > >> >> > >> >> > > > > > > it help to solve, and how? It would be great to provide an example.
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > > On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > Denis,
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > I like the idea of throwing an exception
> > for
> > > > >> enabled
> > > > >> >> > text
> > > > >> >> > >> >> > queries
> > > > >> >> > >> >> > > > on
> > > > >> >> > >> >> > > > > > > > persistent caches.
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > Also, I'm fine with the proposed limit for unsorted searches.
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > On Tue, 17 Sep 2019, 22:06, Denis Magda <dmagda@apache.org> wrote:
> > > > >> >> > >> >> > > > > > > >
> > > > >> >> > >> >> > > > > > > > > Igniters,
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal with regard to full-text search API
> > > > >> >> > >> >> > > > > > > > > evolution, as long as Yury is ready to push it forward.
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > As for the in-memory mode only, it makes total sense for in-memory data
> > > > >> >> > >> >> > > > > > > > > grid deployments where Ignite caches data of an underlying DB like Postgres.
> > > > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by default) if
> > > > >> >> > >> >> > > > > > > > > someone attempts to use text indices with native persistence enabled.
> > > > >> >> > >> >> > > > > > > > > If a person is ready to live with that limitation, an explicit
> > > > >> >> > >> >> > > > > > > > > configuration change should be needed to get around the exception.
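Here is a minimal sketch of the default-deny check Denis describes. All names are hypothetical: `TextIndexGuard` and its three boolean inputs are for illustration only and do not correspond to real Ignite configuration properties.

```java
/**
 * Hypothetical guard: rejects full-text (Lucene) indexes on caches with native
 * persistence unless the user explicitly opts in to non-persistent text indexes.
 */
final class TextIndexGuard {
    private TextIndexGuard() {
    }

    /**
     * @param persistenceEnabled whether native persistence is enabled for the cache
     * @param hasTextIndexes whether the cache declares any full-text indexed fields
     * @param allowNonPersistentTextIndexes explicit opt-in: the user accepts that
     *        text indexes are kept only in memory
     * @throws IllegalStateException if text indexes would be combined with
     *         persistence and the user has not opted in
     */
    static void validate(boolean persistenceEnabled,
                         boolean hasTextIndexes,
                         boolean allowNonPersistentTextIndexes) {
        if (persistenceEnabled && hasTextIndexes && !allowNonPersistentTextIndexes) {
            throw new IllegalStateException(
                "Text indexes are not persisted; set allowNonPersistentTextIndexes=true "
                    + "to use them with native persistence enabled");
        }
    }
}
```

The opt-in flag makes the failure loud by default. Users who only need in-memory text search can still proceed.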
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > Thoughts?
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > -
> > > > >> >> > >> >> > > > > > > > > Denis
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > Hello to all again,
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > Alexei referred to https://issues.apache.org/jira/browse/IGNITE-5371,
> > > > >> >> > >> >> > > > > > > > > > where the absence of index persistence was declared an obstacle to
> > > > >> >> > >> >> > > > > > > > > > further development.
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> > > > >> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely in-memory
> > > > >> >> > >> >> > > > > > > > > > indexing of selected data. We intend to use search capabilities to fetch
> > > > >> >> > >> >> > > > > > > > > > a limited number of records for type-ahead search / suggestions.
> > > > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the Lucene
> > > > >> >> > >> >> > > > > > > > > > index to be persistent. I hope this is a common pattern of text-search usage.
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation.
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be required in
> > > > >> >> > >> >> > > > > > > > > > text-search tasks for now)
> > > > >> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries. It was a
> > > > >> >> > >> >> > > > > > > > > > simple test prefix query, like 'name'='ene*'.
> > > > >> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the client node,
> > > > >> >> > >> >> > > > > > > > > > and there may be thousands or even hundreds of thousands of records,
> > > > >> >> > >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the results are added to
> > > > >> >> > >> >> > > > > > > > > > the queue in GridCacheQueryFutureAdapter in arbitrary order, page by page.
> > > > >> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> > > > >> >> > >> >> > > > > > > > > > So implementing limit as part of the query and (GridCacheQueryRequest) will
> > > > >> >> > >> >> > > > > > > > > > not change the nature of the response, but will limit the load on nodes
> > > > >> >> > >> >> > > > > > > > > > and the network.
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API in Ignite
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > a) Sorting
> > > > >> >> > >> >> > > > > > > > > > The solution for this could be:
> > > > >> >> > >> >> > > > > > > > > > - Make entities comparable
> > > > >> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> > > > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> > > > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the desired limit on the client node.
> > > > >> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory, though it
> > > > >> >> > >> >> > > > > > > > > > can be used for relatively small limits.
> > > > >> >> > >> >> > > > > > > > > > BR,
> > > > >> >> > >> >> > > > > > > > > > Yuriy Shuliha
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > Yuriy,
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1], which makes
> > > > >> >> > >> >> > > > > > > > > > > Lucene indexes unusable with persistence and is the main reason for
> > > > >> >> > >> >> > > > > > > > > > > discontinuation. It should probably be addressed first to make text
> > > > >> >> > >> >> > > > > > > > > > > queries a valid product feature.
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying are indeed not trivial tasks.
> > > > >> >> > >> >> > > > > > > > > > > Some kind of merging must be implemented on the query-originating node.
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > On Thu, 29 Aug 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > Yuriy,
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes, then please go
> > > > >> >> > >> >> > > > > > > > > > > > ahead. The primary reasons why the community wants to discontinue them first
> > > > >> >> > >> >> > > > > > > > > > > > (and probably resurrect them later) are the limitations listed by Andrey and
> > > > >> >> > >> >> > > > > > > > > > > > minimal support from the community's end.
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > -
> > > > >> >> > >> >> > > > > > > > > > > > Denis
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
> > > > >> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not
> > > > >> >> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL or inside SQL,
> > > > >> >> > >> >> > > > > > > > > > > > > and there is a lack of interest from the community.
> > > > >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> > > > >> >> > >> >> > > > > > > > > > > > > Query results are returned from a data node to the client-side cursor
> > > > >> >> > >> >> > > > > > > > > > > > > page by page, and this parameter is designed to control the page size.
> > > > >> >> > >> >> > > > > > > > > > > > > The query is supposed to execute lazily on the server side, and the full
> > > > >> >> > >> >> > > > > > > > > > > > > result set is not expected to be loaded into memory on the server side at
> > > > >> >> > >> >> > > > > > > > > > > > > once, but page by page.
> > > > >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into memory
> > > > >> >> > >> >> > > > > > > > > > > > > before the first page is sent to the client?
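Andrey's distinction between page size and a result limit can be modeled with a small stdlib-only cursor. `PagedCursor` is a hypothetical class and is not Ignite's query cursor. In this sketch, `pageSize` only controls how rows are chunked, `limit` caps the total, and rows are pulled from the source lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Illustrative model (not Ignite code): pageSize controls how many rows travel
 * per page, while a separate limit caps the total number of rows ever returned.
 */
final class PagedCursor implements Iterator<List<Integer>> {
    private final Iterator<Integer> source; // lazily produced server-side rows
    private final int pageSize;
    private int remaining;                  // rows still allowed by the limit

    PagedCursor(Iterator<Integer> source, int pageSize, int limit) {
        if (pageSize <= 0 || limit < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and limit >= 0");
        }
        this.source = source;
        this.pageSize = pageSize;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        return remaining > 0 && source.hasNext();
    }

    @Override
    public List<Integer> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<Integer> page = new ArrayList<>();
        // Fill one page without exceeding either the page size or the remaining limit.
        while (page.size() < pageSize && remaining > 0 && source.hasNext()) {
            page.add(source.next());
            remaining--;
        }
        return page;
    }
}
```

With 100 source rows, `pageSize = 10` and `limit = 25`, the cursor yields pages of 10, 10 and 5 rows. It never reads a 26th row from the source.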
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The best
> > > > >> >> > >> >> > > > > > > > > > > > > solution is to use query language commands for this, e.g. "LIMIT/OFFSET"
> > > > >> >> > >> >> > > > > > > > > > > > > in SQL.
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation: the same
> > > > >> >> > >> >> > > > > > > > > > > > > user query will be executed on the data nodes, and then the results from all
> > > > >> >> > >> >> > > > > > > > > > > > > nodes should be correctly merged before being returned via the client cursor.
> > > > >> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again in the merge phase.
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense without
> > > > >> >> > >> >> > > > > > > > > > > > > sorting, as there is no guarantee that each subsequent query run will return
> > > > >> >> > >> >> > > > > > > > > > > > > the same data, because of page reordering.
> > > > >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes asynchronously,
> > > > >> >> > >> >> > > > > > > > > > > > > and messages from different nodes can't be ordered.
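The two-phase LIMIT described above can be sketched as follows. The sketch assumes every node sorts its local matches with the same total order, with ties broken deterministically. `TwoPhaseLimit` is a hypothetical helper. The global top N always lies within the union of the per-node top-N lists, so truncating on each node first loses nothing. Sorting again on the reducer makes the result independent of the order in which node responses arrive.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of a two-phase LIMIT: each data node sorts and truncates its own
 * matches, then the reducer sorts the union of those partial results and
 * truncates again.
 */
final class TwoPhaseLimit {
    private TwoPhaseLimit() {
    }

    /** Phase 1, on each data node: sort local matches and keep at most {@code limit}. */
    static <T> List<T> nodeLocal(Collection<T> localMatches, Comparator<? super T> order, int limit) {
        List<T> sorted = new ArrayList<>(localMatches);
        sorted.sort(order);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** Phase 2, on the reducer: merge partial results in any arrival order, re-sort and truncate. */
    static <T> List<T> reduce(List<List<T>> partials, Comparator<? super T> order, int limit) {
        List<T> all = new ArrayList<>();
        for (List<T> partial : partials) {
            all.addAll(partial);
        }
        all.sort(order);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

The comparator must be a total order. If ties were left unresolved, the stable sort on the reducer would keep them in arrival order, and the result would again depend on timing.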
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > 2.
> > > > >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" parameter name (for @QueryTextField) looks more descriptive,
> > > > >> >> > >> >> > > > > > > > > > > > > doesn't it?
> > > > >> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from nodes
> > > > >> >> > >> >> > > > > > > > > > > > > be merged?
> > > > >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> > > > >> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort results in the merge phase?
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all. E.g., it is
> > > > >> >> > >> >> > > > > > > > > > > > > impossible to configure the Tokenizer.
> > > > >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and only then
> > > > >> >> > >> >> > > > > > > > > > > > > go further to discuss/implement complex features that may depend on the
> > > > >> >> > >> >> > > > > > > > > > > > > engine config.
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > Dear community,
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > By starting this thread, I'd like to open a discussion that would lead to
> > > > >> >> > >> >> > > > > > > > > > > > > > contributions in the subject area.
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed by different mechanisms,
> > > > >> >> > >> >> > > > > > > > > > > > > > including Lucene.
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
> > > > >> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search
> > > > >> >> > >> >> > > > > > > > > > > > > > area and beyond (e.g., spatial data indexing).
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and
> > > > >> >> > >> >> > > > > > > > > > > > > > query mechanisms for text data*.
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our
> > > > >> >> > >> >> > > > > > > > > > > > > > project's needs, but I believe it will be useful for many more people.
> > > > >> >> > >> >> > > > > > > > > > > > > > Let's walk through them and vote on or discuss Jira tickets for them.
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit search response items
> > > > >> >> > >> >> > > > > > > > > > > > > > inside GridLuceneIndex.query(). Currently it calls
> > > > >> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
> > > > >> >> > >> >> > > > > > > > > > > > > > matches will be returned, which we do not need in most cases.
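Lucene's `IndexSearcher.search(Query, int n)` already collects only the best `n` hits using a priority queue. Passing a real bound instead of `Integer.MAX_VALUE` therefore keeps collection memory proportional to `n`. The stdlib-only model below illustrates the same bounded-heap idea. `ScoredDoc` and `BoundedTopN` are hypothetical names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical (docId, score) pair, similar in spirit to Lucene's ScoreDoc. */
final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class BoundedTopN {
    private BoundedTopN() {
    }

    /** Best first: higher score wins; the lower doc id breaks ties so the order is total. */
    static final Comparator<ScoredDoc> BEST_FIRST =
        Comparator.<ScoredDoc>comparingDouble(d -> d.score).reversed()
            .thenComparingInt(d -> d.doc);

    /** Keeps only the n best hits while streaming over all matches (O(n) memory). */
    static List<ScoredDoc> topN(Iterable<ScoredDoc> matches, int n) {
        if (n <= 0) {
            return Collections.emptyList();
        }
        // Heap ordered "worst first", so its root is the weakest hit kept so far.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(n, BEST_FIRST.reversed());
        for (ScoredDoc d : matches) {
            if (heap.size() < n) {
                heap.add(d);
            } else if (BEST_FIRST.compare(d, heap.peek()) < 0) {
                heap.poll(); // evict the weakest kept hit
                heap.add(d);
            }
        }
        List<ScoredDoc> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }
}
```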
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable search call can be executed:
> > > > >> >> > >> >> > > > > > > > > > > > > > *IndexSearcher.search(query, count, sort)*
> > > > >> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> > > > >> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> > > > >> >> > >> >> > > > > > > > > > > > > > annotation. If *true*, the field will be indexed but not tokenized.
> > > > >> >> > >> >> > > > > > > > > > > > > > Number types are preferred here.
> > > > >> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should define
> > > > >> >> > >> >> > > > > > > > > > > > > > the desired sort fields used for querying.
> > > > >> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
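Lucene's `Sort` applies its `SortField`s in order. A reducer would need an equivalent comparator to re-sort the merged node results. The sketch below models this with a hypothetical `SortSpec` (field name plus direction) standing in for the proposed sort collection on `TextQuery`. To keep the sketch short, rows are simplified to field-to-`Long` maps, and every row is assumed to contain every sort field.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sort specification: one entry of the proposed "sort collection". */
final class SortSpec {
    final String field;
    final boolean descending;

    SortSpec(String field, boolean descending) {
        this.field = field;
        this.descending = descending;
    }
}

final class FieldSort {
    private FieldSort() {
    }

    /**
     * Builds one comparator from several sort specs, applied left to right,
     * over rows modeled as field-name -> numeric value maps. Numbers are used
     * because the proposal prefers numeric, untokenized sort fields.
     */
    static Comparator<Map<String, Long>> comparator(List<SortSpec> specs) {
        Comparator<Map<String, Long>> cmp = (a, b) -> 0;
        for (SortSpec s : specs) {
            Comparator<Map<String, Long>> byField =
                Comparator.comparingLong((Map<String, Long> row) -> row.get(s.field));
            cmp = cmp.thenComparing(s.descending ? byField.reversed() : byField);
        }
        return cmp;
    }
}
```

For example, specs `[year desc, price asc]` place all 2019 rows before 2018 rows, and order rows within the same year by ascending price.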
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with *TextQuery*, including term/query
> > > > >> >> > >> >> > > > > > > > > > > > > > boosting.
> > > > >> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work. It
> > > > >> >> > >> >> > > > > > > > > > > > > > should be extended if the community is interested in it.*
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > > BR,
> > > > >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> > > > >> >> > >> >> > > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > > > --
> > > > >> >> > >> >> > > > > > > > > > > > > Best regards,
> > > > >> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> > > > >> >> > >> >> > > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > --
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > > > Best regards,
> > > > >> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> > > > >> >> > >> >> > > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > > >
> > > > >> >> > >> >> > > > > > > > >
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > > > > --
> > > > >> >> > >> >> > > > > > > Best regards,
> > > > >> >> > >> >> > > > > > > Ivan Pavlukhin
> > > > >> >> > >> >> > > > > > >
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > > > --
> > > > >> >> > >> >> > > > > Best regards,
> > > > >> >> > >> >> > > > > Ivan Pavlukhin
> > > > >> >> > >> >> > > > >
> > > > >> >> > >> >> > > >
> > > > >> >> > >> >> > >
> > > > >> >> > >> >> >
> > > > >> >> > >> >> >
> > > > >> >> > >> >> > --
> > > > >> >> > >> >> > Best regards,
> > > > >> >> > >> >> > Andrey V. Mashenkov
> > > > >> >> > >> >> >
> > > > >> >> > >> >>
> > > > >> >> > >> >
> > > > >> >> > >> >
> > > > >> >> > >> > --
> > > > >> >> > >> > Best regards,
> > > > >> >> > >> > Andrey V. Mashenkov
> > > > >> >> > >> >
> > > > >> >> > >>
> > > > >> >> > >
> > > > >> >> >
> > > > >> >> > --
> > > > >> >> > Best regards,
> > > > >> >> > Andrey V. Mashenkov
> > > > >> >> >
> > > > >> >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > > >
> > > >
> > > >
> > > >
> >
> >
> >
> > --
> > Best regards,
> > Ivan Pavlukhin
> >
> >



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Denis Magda <dm...@apache.org>.

I don't see anything wrong if Yuriy is willing to carry on and keep
enhancing our full-text search support that lacks basic capabilities.

The basics should be available. If anybody needs an advanced feature, they
can introduce Solr or Elasticsearch into the final architecture of the app.

Folks, which of us can help Yuriy with the questions asked? Most likely the SQL
experts are the best candidates here.


-
Denis


On Tue, Nov 26, 2019 at 8:52 AM Ivan Pavlukhin <vo...@gmail.com> wrote:

> Folks,
>
> IEP is an Ignite-specific thing. In fact, I suppose that we are
> already doing it the ASF way by having this dev-list discussion =)
>
> As for me, the "limit" feature for text queries is not big enough
> to warrant an IEP. But we might need to create one for the next features.
>
> On Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev <il...@gmail.com> wrote:
> >
> > Hello!
> >
> > ASF way should probably start with an IEP :)
> >
> > Regards,
> > --
> > Ilya Kasnacheev
> >
> >
> > On Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> >
> > >
> > > OK, let's forget Solr and go the ASF way: if Yuriy proves this
> > > functionality is helpful and submits a PR for it, why not?
> > >
> > > Right?
> > >
> > > >Tuesday, 26 November 2019, 14:06 +03:00, from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > >
> > > >Hello!
> > > >
> > > >The problem here is that Solr is a multi-year effort by a lot of people. We
> > > >can't match that.
> > > >
> > > >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
> > > >information into their storage for indexing and relying on their own
> > > >mechanisms for distributed IR sorting?
> > > >
> > > >Regards,
> > > >--
> > > >Ilya Kasnacheev
> > > >
> > > >
> > > >On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> > > >
> > > >>
> > > >> Ilya Kasnacheev, what is the problem with Solr with regard to Ignite functionality?
> > > >>
> > > >> Thanks!
> > > >>
> > > >> >Tuesday, 26 November 2019, 13:50 +03:00, from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
> > > >> >
> > > >> >Hello!
> > > >> >
> > > >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud) into
> > > >> >Apache Ignite. I think that's a lot of effort that is not very justified.
> > > >> >
> > > >> >I don't think we should try to implement sorting in Apache Ignite, because
> > > >> >it is a lot of work, and a lot of code in our code base which we don't
> > > >> >really want.
> > > >> >
> > > >> >Regards,
> > > >> >--
> > > >> >Ilya Kasnacheev
> > > >> >
> > > >> >
> > > >> >On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >
> > > >> >> Dear Igniters,
> > > >> >>
> > > >> >> The first part of the TextQuery improvement, a result limit, was developed
> > > >> >> and merged.
> > > >> >> Now we have to develop the most important functionality here: proper
> > > >> >> sorting of the Lucene index response and correct reduction of the results
> > > >> >> for distributed queries.
> > > >> >>
> > > >> >> *There are two Lucene-based aspects*
> > > >> >>
> > > >> >> 1. When no sorting fields are used, the documents in the response are
> > > >> >> still ordered by relevance.
> > > >> >> Actually, this is the ScoreDoc.score value.
> > > >> >> In order to reduce the distributed results correctly, the score should
> > > >> >> be passed with the response.
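Here is a sketch of score-based reduction. It assumes each node returns its hits already ordered by descending score and sends the score with each entry. `NodeHit` and `ScoreMerge` are hypothetical names. The k-way merge keeps only one pending hit per node in the heap and stops as soon as the limit is reached. One caveat: Lucene computes scores from per-index statistics, so scores produced on different nodes are not strictly comparable, and this sketch ignores that.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical response entry: the cache key plus the relevance score sent along with it. */
final class NodeHit {
    final String key;
    final float score;

    NodeHit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

final class ScoreMerge {
    private ScoreMerge() {
    }

    /** Higher score first; the key breaks ties so the merged order is deterministic. */
    static final Comparator<NodeHit> BY_SCORE =
        Comparator.<NodeHit>comparingDouble(h -> h.score).reversed()
            .thenComparing(h -> h.key);

    /** Heap entry: the current head of one node's stream plus the rest of that stream. */
    private static final class Head {
        final NodeHit hit;
        final Iterator<NodeHit> rest;

        Head(NodeHit hit, Iterator<NodeHit> rest) {
            this.hit = hit;
            this.rest = rest;
        }
    }

    /** K-way merge of per-node streams, each already sorted by BY_SCORE, stopping at limit. */
    static List<NodeHit> merge(List<Iterator<NodeHit>> perNode, int limit) {
        PriorityQueue<Head> heap = new PriorityQueue<>(
            Math.max(1, perNode.size()), (a, b) -> BY_SCORE.compare(a.hit, b.hit));
        for (Iterator<NodeHit> it : perNode) {
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<NodeHit> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Head head = heap.poll();
            out.add(head.hit);
            // Refill from the same node so the heap always holds each node's best pending hit.
            if (head.rest.hasNext()) {
                heap.add(new Head(head.rest.next(), head.rest));
            }
        }
        return out;
    }
}
```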
> > > >> >>
> > > >> >> 2. When sorting by conventional fields, Lucene should have these fields
> > > >> >> properly indexed, and the corresponding Sort object should be applied to
> > > >> >> Lucene's search call.
> > > >> >> To mark those fields, a new annotation like '@SortField' may be
> > > >> >> introduced.
> > > >> >>
> > > >> >> *Reducing on Ignite*
> > > >> >>
> > > >> >> The obvious place for distributed response reduction is the
> > > >> >> GridCacheDistributedQueryFuture class.
> > > >> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
> > > >> >> ReduceIndexSorted.
> > > >> >> As far as I can see, it is tangled with H2-related classes
> > > >> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
> > > >> >>
> > > >> >> Some support is still needed here.
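A minimal sketch of the score-based reduction described above, assuming each data node returns its hits already sorted by descending Lucene score and that the score travels with each hit. ScoredHit and ScoreMergeReducer are hypothetical names, not Ignite classes, and this is a generic k-way merge rather than the ReduceIndexSorted or MergeStreamIterator code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit as a data node might return it: the cache key plus its Lucene score. */
final class ScoredHit {
    final Object key;
    final float score;

    ScoredHit(Object key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** Illustrative k-way merge of per-node streams, each sorted by descending score. */
final class ScoreMergeReducer {
    static List<ScoredHit> merge(List<? extends Iterator<ScoredHit>> nodeStreams, int limit) {
        // Heap entry: the current head hit of one node's stream plus the rest of that stream.
        final class Head {
            final ScoredHit hit;
            final Iterator<ScoredHit> rest;

            Head(ScoredHit hit, Iterator<ScoredHit> rest) {
                this.hit = hit;
                this.rest = rest;
            }
        }

        // Highest score is polled first.
        PriorityQueue<Head> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Head h) -> h.hit.score).reversed());

        for (Iterator<ScoredHit> it : nodeStreams)
            if (it.hasNext())
                heap.add(new Head(it.next(), it));

        List<ScoredHit> out = new ArrayList<>();

        // Repeatedly take the best head, then refill the heap from the same node's stream.
        while (!heap.isEmpty() && out.size() < limit) {
            Head head = heap.poll();
            out.add(head.hit);

            if (head.rest.hasNext())
                heap.add(new Head(head.rest.next(), head.rest));
        }

        return out;
    }
}
```

The heap holds only one head per node, so the reducer never buffers whole node responses. One caveat: each node's Lucene index computes scores from its own local term statistics, so scores coming from different nodes are only approximately comparable.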
> > > >> >>
> > > >> >> Overall, the goal of this letter is to initiate a discussion on the
> > > >> >> TextQuery sorting implementation and come closer to ticket creation.
> > > >> >>
> > > >> >> BR,
> > > >> >> Yuriy Shuliha
> > > >> >>
> > > >> >> On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > >> >>
> > > >> >> > Hi Dmitry, Yuriy.
> > > >> >> >
> > > >> >> > I've found that GridCacheQueryFutureAdapter has a newly added
> > > >> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
> > > >> >> >
> > > >> >> > Both fields are used inside a synchronized block only.
> > > >> >> > So, we can make both private and downgrade the AtomicInteger to a
> > > >> >> > primitive int.
> > > >> >> >
> > > >> >> > Most likely, these fields can be replaced with one field.
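The simplification Andrey suggests could look roughly like this hypothetical class (it is not the actual GridCacheQueryFutureAdapter code): because every read and write happens under the same lock, a plain int suffices, and a single 'remaining' counter stands in for both 'total' and 'limit'. Treating a limit of 0 or less as "unlimited" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical collector: one guarded int replaces an AtomicInteger 'total' plus an int 'limit'. */
final class LimitedPageCollector<T> {
    private final Object mux = new Object();

    /** Accepted rows; guarded by mux. */
    private final List<T> rows = new ArrayList<>();

    /** Rows that may still be accepted; guarded by mux, so no atomic type is needed. */
    private int remaining;

    /** @param limit Maximum number of rows; 0 or less means "no limit" (assumed convention). */
    LimitedPageCollector(int limit) {
        remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /** Accepts rows from one page; returns true once the limit has been reached. */
    boolean onPage(Collection<? extends T> page) {
        synchronized (mux) {
            for (T row : page) {
                if (remaining == 0)
                    break;

                rows.add(row);
                remaining--;
            }

            return remaining == 0;
        }
    }

    /** Returns a snapshot of the accepted rows. */
    List<T> rows() {
        synchronized (mux) {
            return new ArrayList<>(rows);
        }
    }
}
```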
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
> > > >> >> >
> > > >> >> > > Hi Andrey,
> > > >> >> > >
> > > >> >> > > I've checked this ticket's comments, and there is a TC Bot visa (with
> > > >> >> > > no blockers).
> > > >> >> > >
> > > >> >> > > Do you have any concerns related to this patch?
> > > >> >> > >
> > > >> >> > > Sincerely,
> > > >> >> > > Dmitriy Pavlov
> > > >> >> > >
> > > >> >> > > On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >
> > > >> >> > >> Andrey,
> > > >> >> > >>
> > > >> >> > >> Per your request, I created ticket
> > > >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> > > >> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> > > >> >> > >>
> > > >> >> > >> Could you please proceed with the PR merge?
> > > >> >> > >>
> > > >> >> > >> BR,
> > > >> >> > >> Yuriy Shuliha
> > > >> >> > >>
> > > >> >> > >> On Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > >> >> > >>
> > > >> >> > >> > Hi Yuri,
> > > >> >> > >> >
> > > >> >> > >> > To get access to the TC Bot, you should register as a TeamCity user
> > > >> >> > >> > [1] if you haven't done so already.
> > > >> >> > >> > Then you will be able to log in to the Ignite TC Bot page with the
> > > >> >> > >> > same credentials.
> > > >> >> > >> >
> > > >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> > > >> >> > >> >
> > > >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >> >
> > > >> >> > >> >> Andrew,
> > > >> >> > >> >>
> > > >> >> > >> >> I have corrected the PR according to your notes. Please review.
> > > >> >> > >> >> What are the next steps to get it merged?
> > > >> >> > >> >>
> > > >> >> > >> >> Y.
> > > >> >> > >> >>
> > > >> >> > >> >> On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > >> >> > >> >>
> > > >> >> > >> >> > Yuri,
> > > >> >> > >> >> >
> > > >> >> > >> >> > I'm done with the review.
> > > >> >> > >> >> > No serious issues found, just a trivial compatibility bug.
> > > >> >> > >> >> >
> > > >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >> >> >
> > > >> >> > >> >> > > Denis,
> > > >> >> > >> >> > >
> > > >> >> > >> >> > > Thank you for your attention to this.
> > > >> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
> > > >> >> > >> >> > > ticket is still pending review.
> > > >> >> > >> >> > > Do we have a chance to move it forward somehow?
> > > >> >> > >> >> > >
> > > >> >> > >> >> > > BR,
> > > >> >> > >> >> > > Yuriy Shuliha
> > > >> >> > >> >> > >
> > > >> >> > >> >> > > On Mon, Sep 30, 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
> > > >> >> > >> >> > >
> > > >> >> > >> >> > > > Yuriy,
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > > I see you have opened a pull request with the first changes:
> > > >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the review?
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > > -
> > > >> >> > >> >> > > > Denis
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
> > > >> >> > >> >> > > >
> > > >> >> > >> >> > > > > Yuriy,
> > > >> >> > >> >> > > > >
> > > >> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> > > >> >> > >> >> > > > >
> > > >> >> > >> >> > > > > Yes, we already support a distributed limit and merging of sorted
> > > >> >> > >> >> > > > > sub-results for SQL queries. E.g. ReduceIndexSorted and
> > > >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> > > >> >> > >> >> > > > >
> > > >> >> > >> >> > > > > Could you please also clarify score/relevance? Is it provided by
> > > >> >> > >> >> > > > > the Lucene engine for each query result? I am thinking about how
> > > >> >> > >> >> > > > > to do a sorted merge properly in this case.
> > > >> >> > >> >> > > > >
> > > >> >> > >> >> > > > > On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > Ivan,
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > Thank you for the interesting question!
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented,
> > > >> >> > >> >> > > > > > and the point of the user's interest is the topmost part of the
> > > >> >> > >> >> > > > > > response. The user can then read it, evaluate it and use the
> > > >> >> > >> >> > > > > > given records for further purposes.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > In our case in particular, we use Ignite for operations with
> > > >> >> > >> >> > > > > > financial data, and there is a lot of text there, like asset
> > > >> >> > >> >> > > > > > names, financial instruments, companies, etc.
> > > >> >> > >> >> > > > > > To work with this quickly and reliably, users are accustomed to
> > > >> >> > >> >> > > > > > text search, type-ahead completions and suggestions.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > For these purposes we index particular string data in separate
> > > >> >> > >> >> > > > > > caches.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > Sorting capabilities and response size limits are very important
> > > >> >> > >> >> > > > > > there, as our API has to provide the most relevant information
> > > >> >> > >> >> > > > > > within a limited size.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > Now let me comment on the Ignite/Lucene perspective.
> > > >> >> > >> >> > > > > > Ignite queries Lucene, and Lucene returns *TopDocs.scoreDocs*
> > > >> >> > >> >> > > > > > already sorted by *score* (relevance), so the most relevant
> > > >> >> > >> >> > > > > > documents are on top.
> > > >> >> > >> >> > > > > > But currently, distributed query responses from different nodes
> > > >> >> > >> >> > > > > > are merged into the final query cursor queue in an arbitrary way,
> > > >> >> > >> >> > > > > > so in fact the score order is already ruined here. Also, Ignite
> > > >> >> > >> >> > > > > > requests all possible documents from Lucene, which is redundant
> > > >> >> > >> >> > > > > > and bad for performance.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery*, and
> > > >> >> > >> >> > > > > > have to note that we still need to add sorting to text query
> > > >> >> > >> >> > > > > > processing in order to get usable results.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > The *limit* parameter by itself should fix part of the issues
> > > >> >> > >> >> > > > > > above, but sorting by document score, at the very least, should
> > > >> >> > >> >> > > > > > definitely be implemented along with the limit.
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > This is a pretty short commentary; if you still have any
> > > >> >> > >> >> > > > > > questions, please ask, do not hesitate :)
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > BR,
> > > >> >> > >> >> > > > > > Yuriy Shuliha
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > On Thu, Sep 19, 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
> > > >> >> > >> >> > > > > >
> > > >> >> > >> >> > > > > > > Yuriy,
> > > >> >> > >> >> > > > > > >
> > > >> >> > >> >> > > > > > > I greatly appreciate your interest.
> > > >> >> > >> >> > > > > > >
> > > >> >> > >> >> > > > > > > Could you please elaborate a little on sorting? What tasks does
> > > >> >> > >> >> > > > > > > it help to solve, and how? It would be great if you could
> > > >> >> > >> >> > > > > > > provide an example.
> > > >> >> > >> >> > > > > > >
> > > >> >> > >> >> > > > > > > On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > Denis,
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > I like the idea of throwing an exception when text queries are
> > > >> >> > >> >> > > > > > > > enabled on persistent caches.
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > I'm also fine with the proposed limit for unsorted searches.
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > On Tue, Sep 17, 2019 at 22:06, Denis Magda <dmagda@apache.org> wrote:
> > > >> >> > >> >> > > > > > > >
> > > >> >> > >> >> > > > > > > > > Igniters,
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding full-text
> > > >> >> > >> >> > > > > > > > > search API evolution as long as Yury is ready to push it forward.
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for in-memory
> > > >> >> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
> > > >> >> > >> >> > > > > > > > > like Postgres.
> > > >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by
> > > >> >> > >> >> > > > > > > > > default) if someone attempts to use text indexes with native
> > > >> >> > >> >> > > > > > > > > persistence enabled.
> > > >> >> > >> >> > > > > > > > > If the person is ready to live with that limitation, an explicit
> > > >> >> > >> >> > > > > > > > > configuration change would be needed to get around the exception.
> > > >> >> > >> >> > > > > > > > > Thoughts?
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > > -
> > > >> >> > >> >> > > > > > > > > Denis
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >> >> > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > Hello to all again,
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > (I) Overall need for Lucene indexing
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > Alexei has referenced
> > > >> >> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the
> > > >> >> > >> >> > > > > > > > > > absence of index persistence was declared an obstacle to further
> > > >> >> > >> >> > > > > > > > > > development.
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> > > >> >> > >> >> > > > > > > > > > b) There is a definite need (in our project as well) for
> > > >> >> > >> >> > > > > > > > > > in-memory-only indexing of selected data.
> > > >> >> > >> >> > > > > > > > > > We intend to use search capabilities to fetch a limited number
> > > >> >> > >> >> > > > > > > > > > of records for type-ahead search / suggestions.
> > > >> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the
> > > >> >> > >> >> > > > > > > > > > Lucene index to be persistent. I hope this is a common
> > > >> >> > >> >> > > > > > > > > > text-search usage pattern.
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
> > > >> >> > >> >> > > > > > > > > > required for text-search tasks for now)
> > > >> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries,
> > > >> >> > >> >> > > > > > > > > > using a simple prefix query test, like 'name'='ene*'.
> > > >> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the
> > > >> >> > >> >> > > > > > > > > > client node, which may be thousands or hundreds of thousands of
> > > >> >> > >> >> > > > > > > > > > records, even if we need only the first 10-100. Again, all the
> > > >> >> > >> >> > > > > > > > > > results are added to the queue in GridCacheQueryFutureAdapter in
> > > >> >> > >> >> > > > > > > > > > arbitrary order, page by page.
> > > >> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> > > >> >> > >> >> > > > > > > > > > So implementing the limit as part of the query and
> > > >> >> > >> >> > > > > > > > > > (GridCacheQueryRequest) will not change the nature of the
> > > >> >> > >> >> > > > > > > > > > response, but will limit the load on nodes and networking.
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > a) Sorting
> > > >> >> > >> >> > > > > > > > > > The solution for this could be:
> > > >> >> > >> >> > > > > > > > > > - Make entities comparable
> > > >> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> > > >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> > > >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the
> > > >> >> > >> >> > > > > > > > > > desired limit on the client node.
> > > >> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
> > > >> >> > >> >> > > > > > > > > > though it can be used for relatively small limits.
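For the last bullet, the client node does not necessarily have to keep every received row: if rows are folded into a bounded heap as pages arrive, only `limit` rows are retained at any time (all rows are still transferred over the network). TopKReducer is a hypothetical helper, not an Ignite class; the comparator stands for whatever ordering the entity's comparator or annotated sort fields would define.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical client-side reducer that keeps only the best `limit` rows under `order`. */
final class TopKReducer<T> {
    private final int limit;
    private final Comparator<T> order;

    /** Reversed heap: its head is the WORST retained row, so it can be evicted cheaply. */
    private final PriorityQueue<T> heap;

    TopKReducer(int limit, Comparator<T> order) {
        if (limit <= 0)
            throw new IllegalArgumentException("limit must be positive");

        this.limit = limit;
        this.order = order;
        this.heap = new PriorityQueue<>(limit, order.reversed());
    }

    /** Offers one incoming row; memory stays bounded by `limit` rows. */
    void offer(T row) {
        if (heap.size() < limit)
            heap.add(row);
        else if (order.compare(row, heap.peek()) < 0) {
            // The new row ranks before the current worst one: replace it.
            heap.poll();
            heap.add(row);
        }
    }

    /** Returns the retained rows sorted by `order`. */
    List<T> result() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }
}
```

For relevance ordering, the comparator would rank higher scores first, e.g. a `Comparator.comparingDouble` on the score followed by `reversed()`.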
> > > >> >> > >> >> > > > > > > > > > BR,
> > > >> >> > >> >> > > > > > > > > > Yuriy Shuliha
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > On Fri, Aug 30, 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> > > >> >> > >> >> > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > Yuriy,
> > > >> >> > >> >> > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1],
> > > >> >> > >> >> > > > > > > > > > > which makes Lucene indexes unusable with persistence and is the
> > > >> >> > >> >> > > > > > > > > > > main reason for discontinuation.
> > > >> >> > >> >> > > > > > > > > > > Probably it should be addressed first to make text queries a
> > > >> >> > >> >> > > > > > > > > > > valid product feature.
> > > >> >> > >> >> > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a
> > > >> >> > >> >> > > > > > > > > > > trivial task. Some kind of merging must be implemented on the
> > > >> >> > >> >> > > > > > > > > > > query-originating node.
> > > >> >> > >> >> > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> > > >> >> > >> >> > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > On Thu, Aug 29, 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
> > > >> >> > >> >> > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > Yuriy,
> > > >> >> > >> >> > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes,
> > > >> >> > >> >> > > > > > > > > > > > then please go ahead. The primary reason why the community
> > > >> >> > >> >> > > > > > > > > > > > wants to discontinue them first (and probably resurrect them
> > > >> >> > >> >> > > > > > > > > > > > later) is the limitations listed by Andrey and minimal support
> > > >> >> > >> >> > > > > > > > > > > > from the community's end.
> > > >> >> > >> >> > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > -
> > > >> >> > >> >> > > > > > > > > > > > Denis
> > > >> >> > >> >> > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> > > >> >> > >> >> > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in
> > > >> >> > >> >> > > > > > > > > > > > > Ignite [1].
> > > >> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not
> > > >> >> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL or inside
> > > >> >> > >> >> > > > > > > > > > > > > SQL, and there is a lack of interest from the community side.
> > > >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries
> > > >> >> > >> >> > > > > > > > > > > > > great.
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> > > >> >> > >> >> > > > > > > > > > > > > Query results are returned from the data node to the
> > > >> >> > >> >> > > > > > > > > > > > > client-side cursor page by page, and this parameter is
> > > >> >> > >> >> > > > > > > > > > > > > designed to control the page size. The query is supposed to
> > > >> >> > >> >> > > > > > > > > > > > > execute lazily on the server side, and the full result set is
> > > >> >> > >> >> > > > > > > > > > > > > not expected to be loaded into memory on the server side at
> > > >> >> > >> >> > > > > > > > > > > > > once, but page by page.
> > > >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set
> > > >> >> > >> >> > > > > > > > > > > > > into memory before the first page is sent to the client?
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > I think a new parameter should be added to limit the result.
> > > >> >> > >> >> > > > > > > > > > > > > The best solution is to use query language commands for this,
> > > >> >> > >> >> > > > > > > > > > > > > e.g. "LIMIT/OFFSET" in SQL.
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed
> > > >> >> > >> >> > > > > > > > > > > > > operation: the same user query will be executed on data nodes,
> > > >> >> > >> >> > > > > > > > > > > > > and then the results from all nodes should be correctly merged
> > > >> >> > >> >> > > > > > > > > > > > > before being returned via the client cursor.
> > > >> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then at the merge
> > > >> >> > >> >> > > > > > > > > > > > > phase.
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no
> > > >> >> > >> >> > > > > > > > > > > > > sense without sorting, as there is no guarantee that every
> > > >> >> > >> >> > > > > > > > > > > > > subsequent query run will return the same data, because of page
> > > >> >> > >> >> > > > > > > > > > > > > reordering.
> > > >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes
> > > >> >> > >> >> > > > > > > > > > > > > asynchronously, and messages from different nodes can't be
> > > >> >> > >> >> > > > > > > > > > > > > ordered.
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > 2.
> > > >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks more
> > > >> >> > >> >> > > > > > > > > > > > > verbose, doesn't it?
> > > >> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results
> > > >> >> > >> >> > > > > > > > > > > > > from nodes be merged?
> > > >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> > > >> >> > >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort results at the
> > > >> >> > >> >> > > > > > > > > > > > > merge phase?
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all. E.g.
> > > >> >> > >> >> > > > > > > > > > > > > it is impossible to configure the Tokenizer.
> > > >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first,
> > > >> >> > >> >> > > > > > > > > > > > > and only then go further to discuss/implement complex features
> > > >> >> > >> >> > > > > > > > > > > > > that may depend on the engine config.
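A small sketch of why LIMIT can be applied on every node and again at the merge phase, assuming a strict total order (ties broken deterministically, e.g. by key): every row of the global top-k is necessarily in its own node's local top-k, so truncating locally loses nothing. TwoPhaseLimit is a hypothetical name; the reducer here simply re-sorts for brevity, whereas a production merge would stream the already-sorted per-node results.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical two-phase LIMIT: per-node truncation followed by truncation at the merge. */
final class TwoPhaseLimit {
    /** Phase 1, on each data node: sort local rows and keep the first `limit`. */
    static <T> List<T> localTopK(List<T> rows, Comparator<? super T> order, int limit) {
        return rows.stream().sorted(order).limit(limit).collect(Collectors.toList());
    }

    /** Phase 2, on the reducing node: combine per-node lists, re-sort, truncate again. */
    static <T> List<T> reduce(List<List<T>> perNode, Comparator<? super T> order, int limit) {
        List<T> all = new ArrayList<>();

        for (List<T> part : perNode)
            all.addAll(part);

        return localTopK(all, order, limit);
    }
}
```

Without an order, each node would truncate an arbitrary subset and the merged result would depend on page arrival order, which is exactly the non-determinism described above.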
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > >
> > > >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > > >> >> > >> >> > > > > > > > > > > > >
> Dear community,
>
> By starting this chain I'd like to open a discussion that would lead to
> contributions in this subject area.
>
> Ignite has indexing capabilities, backed by different mechanisms,
> including Lucene.
>
> Currently, Lucene 7.5.0 is used (last year's release).
> This is a widespread and mature technology that covers the text search
> area and beyond (e.g. spatial data indexing).
>
> My goal is to *expose more Lucene functionality to Ignite indexing and
> query mechanisms for text data*.
>
> It's quite a simple request at the current stage. It comes from our
> project's needs, but I believe it will be useful for a lot more people.
> Let's walk through the items and vote on or discuss Jira tickets for them.
>
> 1. [trivial] Use dataQuery.getPageSize() to limit the search response
> items inside GridLuceneIndex.query(). Currently it calls
> IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
> matches are returned, which we do not need in most cases.
>
> 2. [simple] Add sorting. Then a more capable search call can be executed:
> *IndexSearcher.search(query, count, sort)*
> Implementation steps:
> a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> annotation. If *true*, the field will be indexed but not tokenized.
> Number types are preferred here.
> b) Add a *sort* collection to the *TextQuery* constructor. It should
> define the desired sort fields used for querying.
> c) Implement Lucene sort usage in GridLuceneIndex.query().
>
> 3. [moderate] Build complex queries with *TextQuery*, including
> terms/queries boosting.
> *This section is for voting only, as it requires more detailed work. It
> should be extended if the community is interested in it.*
>
> Looking forward to your comments!
>
> BR,
> Yuriy Shuliha
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Alexei Scherbakov
>
> --
> Best regards,
> Ivan Pavlukhin
>
> --
> Best regards,
> Ivan Pavlukhin
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Andrey V. Mashenkov
>
> --
> Best regards,
> Ivan Pavlukhin
>
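
Items 1 and 2 of the proposal quoted above both come down to a bounded top-N selection: instead of collecting every match (which is what IndexSearcher.search(query, Integer.MAX_VALUE) asks for), keep only the best n hits under some ordering, either relevance score or a sort field. The sketch below models that in plain Java without Lucene, as an illustration only; ScoredDoc, TopN and the "price" sort field are hypothetical names, not Ignite or Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical search hit: cache key, relevance score and one sortable field. */
final class ScoredDoc {
    final String key;
    final float score;
    final long price; // hypothetical non-tokenized sort field

    ScoredDoc(String key, float score, long price) {
        this.key = key;
        this.score = score;
        this.price = price;
    }
}

/** Bounded top-N selection: keeps at most n hits instead of every match. */
final class TopN {
    /** Relevance order: highest score first. */
    static final Comparator<ScoredDoc> BY_SCORE_DESC =
        (a, b) -> Float.compare(b.score, a.score);

    /** Field order: ascending by the hypothetical 'price' field. */
    static final Comparator<ScoredDoc> BY_PRICE_ASC =
        Comparator.comparingLong(d -> d.price);

    /** Returns the best n hits according to 'order', best first; memory use is O(n). */
    static List<ScoredDoc> select(Iterable<ScoredDoc> hits, int n, Comparator<ScoredDoc> order) {
        if (n <= 0)
            return Collections.emptyList();

        // Reversed order puts the worst retained hit at the head of the heap.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(n, order.reversed());

        for (ScoredDoc d : hits) {
            if (heap.size() < n)
                heap.add(d);
            else if (order.compare(d, heap.peek()) < 0) {
                heap.poll(); // drop the worst retained hit
                heap.add(d);
            }
        }

        List<ScoredDoc> res = new ArrayList<>(heap);
        res.sort(order);
        return res;
    }
}
```

Passing BY_SCORE_DESC corresponds to item 1 (a plain limit in relevance order) and a field comparator to item 2. In Lucene itself the equivalent choice is between search(query, n) and search(query, n, sort), and its top-N collectors rely on a similarly bounded priority queue.
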

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Folks,

IEP is an Ignite-specific thing. In fact, I suppose that we are
already doing it the ASF way by having this dev-list discussion =)

As for me, the "limit" feature for text queries is not big enough to
warrant an IEP. But we might need to create one for the next features.

On Tue, 26 Nov 2019 at 15:06, Ilya Kasnacheev <il...@gmail.com> wrote:
>
> Hello!
>
> The ASF way should probably start with an IEP :)
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Tue, 26 Nov 2019 at 14:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
>
> Ok, let's forget Solr and go the ASF way: if Yuriy proves this
> functionality is helpful and opens a PR for it, why not?
>
> Isn't it?
>
> Tuesday, 26 November 2019, 14:06 +03:00, from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
>
> Hello!
>
> The problem here is that Solr is a multi-year effort by a lot of people.
> We can't match that.
>
> Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> cache information into their storage for indexing and relying on their
> own mechanisms for distributed IR sorting?
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
>
> Ilya Kasnacheev, what is the problem with Solr-like functionality in
> Ignite?
>
> Thanks!
>
> Tuesday, 26 November 2019, 13:50 +03:00, from Ilya Kasnacheev <ilya.kasnacheev@gmail.com>:
>
> Hello!
>
> I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> into Apache Ignite. I think that's a lot of effort that is not very
> justified.
>
> I don't think we should try to implement sorting in Apache Ignite,
> because it is a lot of work, and a lot of code in our code base which we
> don't really want.
>
> Regards,
> --
> Ilya Kasnacheev
>
> On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Dear Igniters,
>
> The first part of the TextQuery improvement, a result limit, was
> developed and merged.
> Now we have to develop the most important functionality here: proper
> sorting of the Lucene index response and correct reduction of the
> responses for distributed queries.
>
> *There are two Lucene-based aspects*
>
> 1. When no sorting fields are used, the documents in the response are
> still ordered by relevance, i.e. by the ScoreDoc.score value.
> In order to reduce the distributed results correctly, the score should
> be passed along with the response.
>
> 2. When sorting by conventional fields, Lucene should have these fields
> properly indexed, and a corresponding Sort object should be applied to
> Lucene's search call.
> To mark those fields, a new annotation like '@SortField' may be
> introduced.
>
> *Reducing on Ignite*
>
> The obvious point of distributed response reduction is the class
> GridCacheDistributedQueryFuture.
> However, @Ivan Pavlukhin mentioned a class with similar functionality:
> ReduceIndexSorted.
> What I see here is that it is tangled with H2-related classes
> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>
> Support is still needed here.
>
> Overall, the goal of this letter is to initiate a discussion on the
> TextQuery sorting implementation and come closer to ticket creation.
>
> BR,
> Yuriy Shuliha
>
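
Aspect 1 above (passing the score with each response so the reducer can merge correctly) amounts to a k-way merge: each node returns its hits already ordered by descending score, and the originating node repeatedly takes the best head among the node streams until the limit is reached. A minimal sketch, assuming each node's response arrives as a list of (key, score) pairs; NodeHit and ScoreMerge are hypothetical names, and this is not how GridCacheDistributedQueryFuture or ReduceIndexSorted are actually implemented. It also assumes scores from different nodes are comparable, which is only approximately true, since Lucene scores depend on each index's own term statistics.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical hit as a node would send it: cache key plus Lucene-style score. */
final class NodeHit {
    final String key;
    final float score;

    NodeHit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** Lazily merges per-node streams (each sorted by descending score) and stops at 'limit'. */
final class ScoreMerge implements Iterator<NodeHit> {
    /** Current head of one node's stream plus the rest of that stream. */
    private static final class Cursor {
        NodeHit head;
        final Iterator<NodeHit> rest;

        Cursor(NodeHit head, Iterator<NodeHit> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    /** Max-heap by score over the current head of every non-exhausted stream. */
    private final PriorityQueue<Cursor> heads =
        new PriorityQueue<Cursor>((a, b) -> Float.compare(b.head.score, a.head.score));

    private int remaining;

    ScoreMerge(List<? extends Iterator<NodeHit>> nodeStreams, int limit) {
        this.remaining = limit;
        for (Iterator<NodeHit> it : nodeStreams)
            if (it.hasNext())
                heads.add(new Cursor(it.next(), it));
    }

    @Override public boolean hasNext() {
        return remaining > 0 && !heads.isEmpty();
    }

    @Override public NodeHit next() {
        if (!hasNext())
            throw new NoSuchElementException();

        Cursor c = heads.poll();
        NodeHit best = c.head;

        // Advance the stream that produced the best hit; re-insert it if not exhausted.
        if (c.rest.hasNext()) {
            c.head = c.rest.next();
            heads.add(c);
        }

        remaining--;
        return best;
    }

    /** Convenience helper: merges fully received node responses into one list. */
    static List<NodeHit> mergeToList(List<List<NodeHit>> responses, int limit) {
        List<Iterator<NodeHit>> streams = new ArrayList<>();
        for (List<NodeHit> r : responses)
            streams.add(r.iterator());

        List<NodeHit> out = new ArrayList<>();
        ScoreMerge m = new ScoreMerge(streams, limit);
        while (m.hasNext())
            out.add(m.next());
        return out;
    }
}
```

Because the streams are consumed lazily, each node contributes only as many hits as actually make it into the global top `limit`; appending pages in arrival order, as described elsewhere in the thread, gives no such ordering guarantee.
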
> On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>
> Hi Dmitry, Yuriy.
>
> I've found that GridCacheQueryFutureAdapter has a newly added
> AtomicInteger 'total' field and a 'limit' field as a primitive int.
>
> Both fields are used inside a synchronized block only.
> So, we can make both private and downgrade AtomicInteger to a primitive
> int.
>
> Most likely, these fields can be replaced with one field.
>
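
Andrey's suggestion relies on the fact that when every read and write of a counter happens inside the same synchronized block, the monitor already provides atomicity and visibility, so an AtomicInteger adds nothing. A simplified sketch of the single-field variant; LimitedPageCollector and its methods are hypothetical and only stand in for the relevant part of GridCacheQueryFutureAdapter, whose actual code is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Hypothetical simplification of a query future that accepts result pages
 * until a limit is reached. Not the actual GridCacheQueryFutureAdapter code.
 */
final class LimitedPageCollector<T> {
    /** Hypothetical convention: this value means "no limit". */
    private static final int NO_LIMIT = Integer.MAX_VALUE;

    /** Rows still allowed; one plain int replaces a 'total' counter plus 'limit'. Guarded by 'this'. */
    private int remaining;

    /** Accepted rows. Guarded by 'this'. */
    private final List<T> rows = new ArrayList<>();

    LimitedPageCollector(int limit) {
        // Hypothetical convention: limit <= 0 means unlimited.
        this.remaining = limit > 0 ? limit : NO_LIMIT;
    }

    /** Adds a page, truncated to the remaining limit; returns true once the limit is reached. */
    synchronized boolean onPage(Collection<? extends T> page) {
        for (T row : page) {
            if (remaining == 0)
                break;

            rows.add(row);

            if (remaining != NO_LIMIT)
                remaining--;
        }

        return remaining == 0;
    }

    /** Returns a snapshot of the accepted rows. */
    synchronized List<T> rows() {
        return new ArrayList<>(rows);
    }
}
```
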
> On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
>
> Hi Andrey,
>
> I've checked this ticket's comments, and there is a TC Bot visa (with no
> blockers).
>
> Do you have any concerns related to this patch?
>
> Sincerely,
> Dmitriy Pavlov
>
> On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Andrey,
>
> Per your request, I created the ticket
> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>
> Could you please proceed with the PR merge?
>
> BR,
> Yuriy Shuliha
>
> On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>
> Hi Yuri,
>
> To get access to the TC Bot you should register as a TeamCity user [1],
> if you haven't done this already.
> Then you will be able to log in to the Ignite TC Bot page with the same
> credentials.
>
> [1] https://ci.ignite.apache.org/registerUser.html
>
> On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Andrew,
>
> I have corrected the PR according to your notes. Please review.
> What will be the next steps in order to merge it in?
>
> Y.
>
> On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>
> Yuri,
>
> I'm done with the review.
> No crime found, apart from a trivial compatibility bug.
>
> On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Denis,
>
> Thank you for your attention to this.
> As of now, the https://issues.apache.org/jira/browse/IGNITE-12189 ticket
> is still pending review.
> Do we have a chance to move it forward somehow?
>
> BR,
> Yuriy Shuliha
>
> On Mon, 30 Sep 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
>
> Yuriy,
>
> I've seen you open a pull request with the first changes:
> https://issues.apache.org/jira/browse/IGNITE-12189
>
> Alex Scherbakov and Ivan, are you the right people to do the review?
>
> -
> Denis
>
> On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
>
> Yuriy,
>
> Thank you for providing details! Quite interesting.
>
> Yes, we already have support for a distributed limit and for merging
> sorted subresults for SQL queries. E.g. ReduceIndexSorted and
> MergeStreamIterator are used for merging sorted streams.
>
> Could you please also clarify the score/relevance? Is it provided by the
> Lucene engine for each query result? I am thinking about how to do a
> sorted merge properly in this case.
>
> On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Ivan,
>
> Thank you for the interesting question!
>
> Text searches (or full-text searches) are mostly human-oriented, and the
> user's point of interest is the topmost part of the response.
> The user can then read it, evaluate it, and use the given records for
> further purposes.
>
> Particularly in our case, we use Ignite for operations with financial
> data, and there is a lot of text like asset names, financial
> instruments, companies, etc.
> In order to work with this quickly and reliably, users are used to
> working with text search, type-ahead completions and suggestions.
>
> For these purposes we are indexing particular string data in separate
> caches.
>
> Sorting capabilities and response size limitations are very important
> there, as our API has to provide the most relevant information within a
> limited size.
>
> Now let me comment on the Ignite/Lucene perspective.
> Actually, for Ignite queries Lucene returns *TopDocs.scoreDocs* already
> sorted by *score* (relevance), so the most relevant documents are on
> top. But currently, distributed query responses from different nodes are
> merged into the final query cursor queue in an arbitrary way.
> So in fact the score order is already ruined here. Also, Ignite requests
> all possible documents from Lucene, which is redundant and not good for
> performance.
>
> I'm implementing a *limit* parameter as part of *TextQuery* and have to
> note that we still need to add sorting to text query processing in order
> to get applicable results.
>
> The *limit* parameter itself should address some of the issues above,
> but sorting by document score, at least, should definitely be
> implemented along with the limit.
>
> This is a pretty short commentary; if you still have any questions,
> please ask, do not hesitate)
>
> BR,
> Yuriy Shuliha
>
> On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
>
> Yuriy,
>
> Greatly appreciate your interest.
>
> Could you please elaborate a little bit on sorting? What tasks does it
> help to solve, and how? It would be great to provide an example.
>
> On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
>
> Denis,
>
> I like the idea of throwing an exception for enabled text queries on
> persistent caches.
>
> Also, I'm fine with the proposed limit for unsorted searches.
>
> Yury, please proceed with ticket creation.
>
> On Tue, 17 Sep 2019, 22:06 Denis Magda <dmagda@apache.org> wrote:
>
> Igniters,
>
> I see nothing wrong with Yury's proposal regarding full-text search API
> evolution, as long as Yury is ready to push it forward.
>
> As for the in-memory-only mode, it makes total sense for in-memory data
> grid deployments where Ignite caches data of an underlying DB like
> Postgres.
> As part of the changes, I would simply throw an exception (by default)
> if someone attempts to use text indexes with native persistence enabled.
> If the person is ready to live with that limitation, an explicit
> configuration change is needed to get around the exception.
>
> Thoughts?
>
> -
> Denis
>
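
Denis's proposal is a fail-fast validation with an explicit opt-out. A sketch of the check, assuming a hypothetical opt-out flag allowTextIndexWithPersistence; neither that flag nor the TextIndexSettings/TextIndexGuard classes exist in Ignite, and a real implementation would derive the persistence setting from the node's data storage configuration rather than from plain booleans.

```java
/** Hypothetical subset of the settings the check needs. */
final class TextIndexSettings {
    final boolean nativePersistenceEnabled;
    final boolean hasTextIndexes;

    /** Hypothetical explicit opt-out; false unless the user changes the configuration. */
    final boolean allowTextIndexWithPersistence;

    TextIndexSettings(boolean nativePersistenceEnabled, boolean hasTextIndexes,
        boolean allowTextIndexWithPersistence) {
        this.nativePersistenceEnabled = nativePersistenceEnabled;
        this.hasTextIndexes = hasTextIndexes;
        this.allowTextIndexWithPersistence = allowTextIndexWithPersistence;
    }
}

/** Hypothetical validator run when a cache with text indexes is started. */
final class TextIndexGuard {
    /** Fails fast when text indexes meet native persistence, unless explicitly allowed. */
    static void validate(String cacheName, TextIndexSettings s) {
        if (s.hasTextIndexes && s.nativePersistenceEnabled && !s.allowTextIndexWithPersistence)
            throw new IllegalStateException("Cache '" + cacheName + "' declares full-text indexes, " +
                "but native persistence is enabled and text indexes are not persisted. " +
                "Set allowTextIndexWithPersistence=true to accept this limitation.");
    }
}
```

Throwing by default keeps the limitation visible to anyone enabling persistence, while the flag plays the role of the "explicit configuration change" from the message.
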
> On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
>
> Hello to all again,
>
> Thank you for the important comments and notes given below!
>
> Let me answer and continue the discussion.
>
> (I) Overall needs in Lucene indexing
>
> Alexei has referenced https://issues.apache.org/jira/browse/IGNITE-5371,
> where the absence of index persistence was declared an obstacle to
> further development.
>
> a) This ticket is already closed as not valid.
> b) There are definite needs (in our project as well) for just in-memory
> indexing of selected data.
> We intend to use search capabilities for fetching a limited number of
> records that should be used in type-ahead search / suggestions.
> Not all of the data will be indexed, and there is no need for the Lucene
> index to be persistent. I hope this is a widespread pattern of
> text-search usage.
>
> (II) Necessary fixes in the current implementation
>
> a) Implementation of a correct *limit* (*offset* seems not to be
> required in text-search tasks for now).
> I have investigated the data flow for distributed text queries, using a
> simple prefix query test like 'name'='ene*'.
> For now, each server node returns all response records to the client
> node, and there may be thousands or hundreds of thousands of records,
> even if we need only the first 10-100. Again, all the results are added
> to the queue in GridCacheQueryFutureAdapter in arbitrary order, page by
> page.
> I did not find any means here to deliver a deterministic result.
> So implementing the limit as part of the query (and
> GridCacheQueryRequest) will not change the nature of the response, but
> will limit the load on nodes and networking.
>
> Can we consider opening a ticket for this?
>
> (III) Further exposure of the Lucene API to Ignite
>
> a) Sorting
> The solution for this could be:
> - Make entities comparable
> - Add a custom comparator to the entity
> - Add annotations to mark sorted fields for Lucene indexing
> - Use comparators when merging responses or reducing to the desired
> limit on the client node.
> This will require the full result set to be loaded into memory, though
> it can be used for relatively small limits.
>
> BR,
> Yuriy Shuliha
>
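
The sorting outline in (III) a) above can be read as: mark the sort fields with an annotation and derive a single comparator from it, so that a node's local ordering and the client-side merge or trim to the limit follow the same rule. A reflection-based sketch; the @SortKey annotation (standing in for the '@SortField' idea from Yuriy's November message, renamed to avoid confusion with Lucene's own org.apache.lucene.search.SortField class), its attributes and the Asset entity are all hypothetical, not Ignite API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical marker for sortable fields; lower 'order' means higher priority. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SortKey {
    int order() default 0;
    boolean descending() default false;
}

/** Hypothetical entity: highest rating first, then by name. */
final class Asset {
    @SortKey(order = 0, descending = true) final int rating;
    @SortKey(order = 1) final String name;
    final String description; // not a sort key

    Asset(String name, int rating, String description) {
        this.name = name;
        this.rating = rating;
        this.description = description;
    }
}

final class SortKeys {
    /** Builds a comparator from the @SortKey fields of 'type'; field values must be Comparable. */
    static <T> Comparator<T> comparatorFor(Class<T> type) {
        List<Field> keys = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(SortKey.class)) {
                f.setAccessible(true);
                keys.add(f);
            }
        }
        if (keys.isEmpty())
            throw new IllegalArgumentException("No @SortKey fields in " + type.getName());

        // Apply keys by priority, not by declaration order.
        keys.sort(Comparator.comparingInt(f -> f.getAnnotation(SortKey.class).order()));

        return (a, b) -> {
            for (Field f : keys) {
                Object x = read(f, a);
                Object y = read(f, b);
                if (x == null || y == null) {
                    if (x == y)
                        continue; // both null: tie on this key
                    return x == null ? 1 : -1; // nulls always sort last
                }
                @SuppressWarnings({"unchecked", "rawtypes"})
                int cmp = ((Comparable) x).compareTo(y);
                if (cmp != 0)
                    return f.getAnnotation(SortKey.class).descending() ? -Integer.signum(cmp) : cmp;
            }
            return 0;
        };
    }

    private static Object read(Field f, Object target) {
        try {
            return f.get(target);
        }
        catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Reflection keeps the sketch short; a real implementation would more likely resolve the annotated fields once per type and cache the resulting comparator.
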
> On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
>
> Yuriy,
>
> Note that one of the major blockers for text queries is [1], which makes
> Lucene indexes unusable with persistence and is the main reason for
> discontinuation.
> Probably it should be addressed first to make text queries a valid
> product feature.
>
> Distributed sorting and advanced querying are indeed not a trivial task.
> Some kind of merging must be implemented on the query-originating node.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-5371
>
> On Thu, 29 Aug 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
>
> Yuriy,
>
> If you are ready to take over the full-text search indexes, then please
> go ahead. The primary reasons why the community wants to discontinue
> them first (and probably resurrect them later) are the limitations
> listed by Andrey and minimal support from the community's end.
>
> -
> Denis
>
> On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>
> > >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
> > >> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not transactional
> > >> >> > >> >> > > > > > > > > > > > > and can't be used together with SQL or inside SQL, and there is a lack of
> > >> >> > >> >> > > > > > > > > > > > > interest from the community side.
> > >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> > >> >> > >> >> > > > > > > > > > > > > Query results are returned from the data node to the client-side cursor
> > >> >> > >> >> > > > > > > > > > > > > page by page, and this parameter is designed to control the page size.
> > >> >> > >> >> > > > > > > > > > > > > The query is supposed to execute lazily on the server side, and the full
> > >> >> > >> >> > > > > > > > > > > > > result set is not expected to be loaded into server memory at once, but
> > >> >> > >> >> > > > > > > > > > > > > page by page.
> > >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into memory
> > >> >> > >> >> > > > > > > > > > > > > before the first page is sent to the client?
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The best
> > >> >> > >> >> > > > > > > > > > > > > solution is to use query language commands for this, e.g. "LIMIT/OFFSET"
> > >> >> > >> >> > > > > > > > > > > > > in SQL.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation: the
> > >> >> > >> >> > > > > > > > > > > > > same user query is executed on the data nodes, and then the results from
> > >> >> > >> >> > > > > > > > > > > > > all nodes should be correctly merged before being returned via the client
> > >> >> > >> >> > > > > > > > > > > > > cursor.
> > >> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again in the merge
> > >> >> > >> >> > > > > > > > > > > > > phase.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense
> > >> >> > >> >> > > > > > > > > > > > > without sorting, as there is no guarantee that every subsequent query run
> > >> >> > >> >> > > > > > > > > > > > > will return the same data because of page reordering.
> > >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes
> > >> >> > >> >> > > > > > > > > > > > > asynchronously, and messages from different nodes can't be ordered.
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 2.
> > >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks more
> > >> >> > >> >> > > > > > > > > > > > > descriptive, doesn't it?
> > >> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from nodes
> > >> >> > >> >> > > > > > > > > > > > > be merged?
> > >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> > >> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort the result in the merge phase?
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all. E.g. it is
> > >> >> > >> >> > > > > > > > > > > > > impossible to configure the Tokenizer.
> > >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and only
> > >> >> > >> >> > > > > > > > > > > > > then go further to discuss/implement complex features that may depend on
> > >> >> > >> >> > > > > > > > > > > > > the engine config.
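Andrey's point that LIMIT has to be applied on every node and then again in the merge phase can be checked with a small JDK-only sketch. `Hit` and `DistributedLimit` are hypothetical names, not Ignite or Lucene classes, and the sketch assumes every node sorts by the same total order (score descending, ties broken by key). Under that assumption the per-node cut loses nothing: a hit in the global top k has at most k-1 hits ahead of it overall, so it is also in its own node's top k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical text-query hit: a cache key plus its relevance score. */
final class Hit {
    final String key;
    final float score;

    Hit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

/** Hypothetical two-phase LIMIT: once per node, then again on the reducer. */
final class DistributedLimit {
    /** Higher score first; ties broken by key so the order is total and deterministic. */
    static final Comparator<Hit> BY_RELEVANCE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
            .thenComparing((Hit h) -> h.key);

    /** Phase 1, on each data node: sort the local hits and keep at most `limit`. */
    static List<Hit> localTopK(List<Hit> localHits, int limit) {
        return localHits.stream()
            .sorted(BY_RELEVANCE)
            .limit(limit)
            .collect(Collectors.toList());
    }

    /** Phase 2, on the reducer: combine the per-node top-k lists and cut to `limit` again. */
    static List<Hit> reduce(List<List<Hit>> perNodeHits, int limit) {
        List<Hit> candidates = new ArrayList<>();
        for (List<Hit> nodeHits : perNodeHits)
            candidates.addAll(localTopK(nodeHits, limit));
        return candidates.stream()
            .sorted(BY_RELEVANCE)
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Without a shared ordering the per-node cut would keep an arbitrary subset, which is why a limit without sorting can return different rows on every run.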
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > Dear community,
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > By starting this thread I'd like to open a discussion that would lead to
> > >> >> > >> >> > > > > > > > > > > > > > contributions in the subject area.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities, backed by different mechanisms,
> > >> >> > >> >> > > > > > > > > > > > > > including Lucene.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (released last year).
> > >> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search
> > >> >> > >> >> > > > > > > > > > > > > > area and beyond (e.g. spatial data indexing).
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and
> > >> >> > >> >> > > > > > > > > > > > > > query mechanisms for text data*.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our
> > >> >> > >> >> > > > > > > > > > > > > > project's needs, but I believe it will be useful for many more people.
> > >> >> > >> >> > > > > > > > > > > > > > Let's walk through the items and vote on or discuss Jira tickets for them.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit search response items
> > >> >> > >> >> > > > > > > > > > > > > > inside GridLuceneIndex.query(). Currently it calls
> > >> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored
> > >> >> > >> >> > > > > > > > > > > > > > matches are returned, which we do not need in most cases.
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable search call can be executed:
> > >> >> > >> >> > > > > > > > > > > > > > *IndexSearcher.search(query, count, sort)*
> > >> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> > >> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> > >> >> > >> >> > > > > > > > > > > > > > annotation. If *true*, the field will be indexed but not tokenized. Number
> > >> >> > >> >> > > > > > > > > > > > > > types are preferred here.
> > >> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should
> > >> >> > >> >> > > > > > > > > > > > > > define the desired sort fields used for querying.
> > >> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with *TextQuery*, including
> > >> >> > >> >> > > > > > > > > > > > > > term/query boosting.
> > >> >> > >> >> > > > > > > > > > > > > > *This item is for voting only, as it requires more detailed work. It
> > >> >> > >> >> > > > > > > > > > > > > > should be extended if the community is interested in it.*
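Items 1 and 2 both amount to asking for the best n hits instead of calling IndexSearcher.search(query, Integer.MAX_VALUE). The JDK-only sketch below shows the bounded top-n idea with a heap of size n; Lucene's own collectors are more elaborate, and `Doc`, `Orders` and `TopN` are hypothetical names. The comparator stands in for either relevance order (as in search(query, n)) or a field sort (as in search(query, n, sort)).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical indexed document: an untokenized sortable field plus a relevance score. */
final class Doc {
    final String name;
    final float score;

    Doc(String name, float score) {
        this.name = name;
        this.score = score;
    }
}

/** Two orders a text query might use (hypothetical). */
final class Orders {
    /** Relevance: highest score first. */
    static final Comparator<Doc> BY_SCORE =
        Comparator.comparingDouble((Doc d) -> d.score).reversed();
    /** Field sort: ascending by name. */
    static final Comparator<Doc> BY_NAME = Comparator.comparing((Doc d) -> d.name);
}

final class TopN {
    /**
     * Returns the best `n` candidates under `order` (best first) while holding
     * at most `n` of them in memory, instead of collecting every match.
     */
    static <T> List<T> topN(Iterable<T> candidates, int n, Comparator<T> order) {
        List<T> result = new ArrayList<>();
        if (n <= 0)
            return result;
        // Reversed order keeps the worst retained candidate at the head of the heap.
        PriorityQueue<T> heap = new PriorityQueue<>(n, order.reversed());
        for (T c : candidates) {
            if (heap.size() < n) {
                heap.add(c);
            } else if (order.compare(c, heap.peek()) < 0) {
                // c ranks ahead of the worst retained candidate: replace it.
                heap.poll();
                heap.add(c);
            }
        }
        result.addAll(heap);
        result.sort(order);
        return result;
    }
}
```

Memory stays proportional to n rather than to the number of matches, which is the point of passing a real count instead of Integer.MAX_VALUE.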
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > > BR,
> > >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> > >> >> > >> >> > > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > > > --
> > >> >> > >> >> > > > > > > > > > > > > Best regards,
> > >> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> > >> >> > >> >> > > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > --
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > > > Best regards,
> > >> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> > >> >> > >> >> > > > > > > > > > >
> > >> >> > >> >> > > > > > > > > >
> > >> >> > >> >> > > > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > > > > --
> > >> >> > >> >> > > > > > > Best regards,
> > >> >> > >> >> > > > > > > Ivan Pavlukhin
> > >> >> > >> >> > > > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > > > --
> > >> >> > >> >> > > > > Best regards,
> > >> >> > >> >> > > > > Ivan Pavlukhin
> > >> >> > >> >> > > > >
> > >> >> > >> >> > > >
> > >> >> > >> >> > >
> > >> >> > >> >> >
> > >> >> > >> >> >
> > >> >> > >> >> > --
> > >> >> > >> >> > Best regards,
> > >> >> > >> >> > Andrey V. Mashenkov
> > >> >> > >> >> >
> > >> >> > >> >>
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> > --
> > >> >> > >> > Best regards,
> > >> >> > >> > Andrey V. Mashenkov
> > >> >> > >> >
> > >> >> > >>
> > >> >> > >
> > >> >> >
> > >> >> > --
> > >> >> > Best regards,
> > >> >> > Andrey V. Mashenkov
> > >> >> >
> > >> >>
> > >>
> > >>
> > >>
> > >>
> > >
> >
> >
> >
> >



-- 
Best regards,
Ivan Pavlukhin

Re: Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

The ASF way should probably start with an IEP :)

Regards,
-- 
Ilya Kasnacheev


On Tue, Nov 26, 2019 at 14:12, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:

>
> OK, let's forget Solr and go the ASF way: if Yuriy proves this
> functionality is helpful and opens a PR for it, why not?
>
> Don't you agree?
>
> >On Tuesday, November 26, 2019, 14:06 +03:00, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> >
> >Hello!
> >
> >The problem here is that Solr is a multi-year effort by a lot of people.
> >We can't match that.
> >
> >Maybe we could integrate with Solr/Solr Cloud instead, by feeding our
> >cache information into their storage for indexing and relying on their
> >own mechanisms for distributed IR sorting?
> >
> >Regards,
> >--
> >Ilya Kasnacheev
> >
> >
> >On Tue, Nov 26, 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:
> >
> >>
> >> Ilya Kasnacheev, what is the problem with having Solr-like functionality
> >> in Ignite?
> >>
> >> Thanks!
> >>
> >> >On Tuesday, November 26, 2019, 13:50 +03:00, Ilya Kasnacheev <ilya.kasnacheev@gmail.com> wrote:
> >> >
> >> >Hello!
> >> >
> >> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> >> >into Apache Ignite. I think that's a lot of effort that is not very
> >> >justified.
> >> >
> >> >I don't think we should try to implement sorting in Apache Ignite,
> >> >because it is a lot of work, and a lot of code in our code base which we
> >> >don't really want.
> >> >
> >> >Regards,
> >> >--
> >> >Ilya Kasnacheev
> >> >
> >> >
> >> >On Fri, Nov 22, 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >
> >> >> Dear Igniters,
> >> >>
> >> >> The first part of the TextQuery improvement, a result limit, was
> >> >> developed and merged.
> >> >> Now we have to develop the most important functionality here: proper
> >> >> sorting of the Lucene index response and correct reduction of those
> >> >> responses for distributed queries.
> >> >>
> >> >> *There are two Lucene-based aspects*
> >> >>
> >> >> 1. When no sorting fields are used, the documents in the response are
> >> >> still ordered by relevance, i.e. by the ScoreDoc.score value.
> >> >> To reduce the distributed results correctly, the score should be passed
> >> >> along with the response.
> >> >>
> >> >> 2. When sorting by conventional fields, Lucene should have these fields
> >> >> properly indexed, and a corresponding Sort object should be applied to
> >> >> Lucene's search call.
> >> >> To mark those fields, a new annotation like '@SortField' may be
> >> >> introduced.
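A hypothetical `@SortField` marker could be discovered with plain reflection when a value type is registered. The annotation name only echoes the proposal above and is not an existing Ignite annotation. `Instrument` and `SortFieldScanner` are illustrative names as well. Lucene already ships a class called org.apache.lucene.search.SortField, so a real annotation might need a different name to avoid confusion.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker: index this field untokenized so Lucene can sort on it. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SortField {
}

/** Illustrative value type; only the annotated fields would get sortable index entries. */
class Instrument {
    String description;        // full-text only, would be tokenized
    @SortField String ticker;  // sortable string
    @SortField long issueDate; // sortable number
}

final class SortFieldScanner {
    /** Names of the fields of `type` marked with @SortField (order is not guaranteed). */
    static List<String> sortFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(SortField.class))
                names.add(f.getName());
        }
        return names;
    }
}
```

The annotation needs RUNTIME retention; with the default CLASS retention, isAnnotationPresent would not see it.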
> >> >>
> >> >> *Reducing on Ignite*
> >> >>
> >> >> The obvious place for distributed response reduction is the class
> >> >> GridCacheDistributedQueryFuture.
> >> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
> >> >> ReduceIndexSorted.
> >> >> What I see there is that it is tangled with H2-related classes
> >> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
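Whichever class ends up doing the reduction, its core is a streaming k-way merge of per-node streams that are each already sorted by score. The JDK-only sketch below rests on that assumption. It is not the code of ReduceIndexSorted or MergeStreamIterator, and `ScoredHit` and `ScoreMerge` are hypothetical names. It only works if the score travels with every hit, which is aspect 1 above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit as a node might send it back: the key plus its Lucene score. */
final class ScoredHit {
    final String key;
    final float score;

    ScoredHit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

final class ScoreMerge {
    /** The next unread hit of one node's stream, plus the rest of that stream. */
    private static final class Head {
        final ScoredHit hit;
        final Iterator<ScoredHit> rest;

        Head(ScoredHit hit, Iterator<ScoredHit> rest) {
            this.hit = hit;
            this.rest = rest;
        }
    }

    /**
     * Merges per-node streams, each already sorted by descending score, and stops
     * after `limit` hits. Only one pending hit per node is held at any time.
     */
    static List<ScoredHit> merge(List<Iterator<ScoredHit>> nodeStreams, int limit) {
        PriorityQueue<Head> heads = new PriorityQueue<Head>(
            (a, b) -> Float.compare(b.hit.score, a.hit.score)); // highest score first
        for (Iterator<ScoredHit> it : nodeStreams) {
            if (it.hasNext())
                heads.add(new Head(it.next(), it));
        }
        List<ScoredHit> out = new ArrayList<>();
        while (out.size() < limit && !heads.isEmpty()) {
            Head best = heads.poll();
            out.add(best.hit);
            if (best.rest.hasNext())
                heads.add(new Head(best.rest.next(), best.rest));
        }
        return out;
    }
}
```

Here the node streams are plain iterators. In a real reducer they would be fed by pages arriving asynchronously, and the merge would have to wait until every non-exhausted node has at least one pending hit before emitting the next result.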
> >> >>
> >> >> I still need support here.
> >> >>
> >> >> Overall, the goal of this letter is to initiate a discussion on the
> >> >> TextQuery sorting implementation and move closer to ticket creation.
> >> >>
> >> >> BR,
> >> >> Yuriy Shuliha
> >> >>
> >> >> On Tue, Oct 22, 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> >> >>
> >> >> > Hi Dmitry, Yuriy.
> >> >> >
> >> >> > I've found that GridCacheQueryFutureAdapter has a newly added
> >> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
> >> >> >
> >> >> > Both fields are used inside a synchronized block only.
> >> >> > So, we can make both private and downgrade the AtomicInteger to a
> >> >> > primitive int.
> >> >> >
> >> >> > Most likely, these fields can be replaced with one field.
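Andrey's suggestion to collapse 'total' and 'limit' into one primitive field guarded by the existing lock could look roughly like the sketch below. `LimitedCollector` is a hypothetical, simplified class, not GridCacheQueryFutureAdapter. Treating a non-positive limit as "no limit" is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified collector for query pages. One primitive counter,
 * `remaining`, replaces a separate AtomicInteger total and int limit; it is
 * only read and written while holding this object's monitor.
 */
final class LimitedCollector<T> {
    private final List<T> rows = new ArrayList<>();
    private int remaining;

    LimitedCollector(int limit) {
        // A non-positive limit is treated as effectively unlimited.
        this.remaining = limit > 0 ? limit : Integer.MAX_VALUE;
    }

    /** Accepts rows from one page; returns true while more rows are still wanted. */
    synchronized boolean onPage(List<T> page) {
        for (T row : page) {
            if (remaining == 0)
                break;
            rows.add(row);
            remaining--;
        }
        return remaining > 0;
    }

    synchronized List<T> rows() {
        return new ArrayList<>(rows);
    }
}
```

Because every access happens under the same monitor, the atomic type adds nothing, and a single counter cannot drift out of sync with a separate limit.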
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org> wrote:
> >> >> >
> >> >> > > Hi Andrey,
> >> >> > >
> >> >> > > I've checked the comments on this ticket, and there is a TC Bot visa
> >> >> > > (with no blockers).
> >> >> > >
> >> >> > > Do you have any concerns related to this patch?
> >> >> > >
> >> >> > > Sincerely,
> >> >> > > Dmitriy Pavlov
> >> >> > >
> >> >> > > On Thu, Oct 17, 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> > >
> >> >> > >> Andrey,
> >> >> > >>
> >> >> > >> Per your request, I created the ticket
> >> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >> >> > >>
> >> >> > >> Could you please proceed with the PR merge?
> >> >> > >>
> >> >> > >> BR,
> >> >> > >> Yuriy Shuliha
> >> >> > >>
> >> >> > >> On Wed, Oct 9, 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> >> >> > >>
> >> >> > >> > Hi Yuri,
> >> >> > >> >
> >> >> > >> > To get access to the TC Bot, you should register as a TeamCity user
> >> >> > >> > [1], if you haven't done this already.
> >> >> > >> > Then you will be able to log in on the Ignite TC Bot page with the same
> >> >> > >> > credentials.
> >> >> > >> >
> >> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
> >> >> > >> >
> >> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> > >> >
> >> >> > >> >> Andrew,
> >> >> > >> >>
> >> >> > >> >> I have corrected the PR according to your notes. Please review.
> >> >> > >> >> What will be the next steps to get it merged?
> >> >> > >> >>
> >> >> > >> >> Y.
> >> >> > >> >>
> >> >> > >> >> On Thu, Oct 3, 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> >> >> > >> >>
> >> >> > >> >> > Yuri,
> >> >> > >> >> >
> >> >> > >> >> > I'm done with the review.
> >> >> > >> >> > Nothing serious found, just a trivial compatibility bug.
> >> >> > >> >> >
> >> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> > >> >> >
> >> >> > >> >> > > Denis,
> >> >> > >> >> > >
> >> >> > >> >> > > Thank you for your attention to this.
> >> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > >> >> > > ticket is still pending review.
> >> >> > >> >> > > Do we have a chance to move it forward somehow?
> >> >> > >> >> > >
> >> >> > >> >> > > BR,
> >> >> > >> >> > > Yuriy Shuliha
> >> >> > >> >> > >
> >> >> > >> >> > > On Mon, Sep 30, 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
> >> >> > >> >> > >
> >> >> > >> >> > > > Yuriy,
> >> >> > >> >> > > >
> >> >> > >> >> > > > I've seen you open a pull request with the first changes:
> >> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
> >> >> > >> >> > > >
> >> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the review?
> >> >> > >> >> > > >
> >> >> > >> >> > > > -
> >> >> > >> >> > > > Denis
> >> >> > >> >> > > >
> >> >> > >> >> > > >
> >> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com> wrote:
> >> >> > >> >> > > >
> >> >> > >> >> > > > > Yuriy,
> >> >> > >> >> > > > >
> >> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> >> >> > >> >> > > > >
> >> >> > >> >> > > > > Yes, we already support distributed limit and merging of sorted
> >> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> >> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> >> >> > >> >> > > > >
> >> >> > >> >> > > > > Could you please also clarify score/relevance? Is it provided by
> >> >> > >> >> > > > > the Lucene engine for each query result? I am thinking about how to
> >> >> > >> >> > > > > do a sorted merge properly in this case.
> >> >> > >> >> > > > >
> >> >> > >> >> > > > > On Wed, Sep 25, 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > Ivan,
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > Thank you for the interesting question!
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > Text searches (or full-text searches) are mostly human-oriented, and the
> >> >> > >> >> > > > > > user's point of interest is the topmost part of the response.
> >> >> > >> >> > > > > > The user can then read it, evaluate it, and use the given records for
> >> >> > >> >> > > > > > further purposes.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > In our case in particular, we use Ignite for operations on financial
> >> >> > >> >> > > > > > data, and there is a lot of text, like asset names, financial
> >> >> > >> >> > > > > > instruments, companies, etc.
> >> >> > >> >> > > > > > To work with this quickly and reliably, users are used to text search,
> >> >> > >> >> > > > > > type-ahead completions, and suggestions.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > For these purposes we are indexing particular string data in separate
> >> >> > >> >> > > > > > caches.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > Sorting capabilities and response size limits are very important there,
> >> >> > >> >> > > > > > as our API has to provide the most relevant information within a limited
> >> >> > >> >> > > > > > response size.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
> >> >> > >> >> > > > > > Actually, Lucene returns *TopDocs.scoreDocs* to Ignite queries already
> >> >> > >> >> > > > > > sorted by *score* (relevance), so the most relevant documents are at the
> >> >> > >> >> > > > > > top.
> >> >> > >> >> > > > > > But currently, distributed query responses from different nodes are
> >> >> > >> >> > > > > > merged into the final query cursor queue in an arbitrary way.
> >> >> > >> >> > > > > > So in fact the score order is already ruined here. Also, Ignite requests
> >> >> > >> >> > > > > > all possible documents from Lucene, which is redundant and bad for
> >> >> > >> >> > > > > > performance.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > I'm implementing a *limit* parameter as part of *TextQuery* and have to
> >> >> > >> >> > > > > > note that we still need to add sorting to text query processing in
> >> >> > >> >> > > > > > order to get usable results.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > The *limit* parameter itself should resolve part of the issues above,
> >> >> > >> >> > > > > > but sorting by document score, at least, should definitely be
> >> >> > >> >> > > > > > implemented along with the limit.
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > This is a pretty short commentary; if you still have any questions,
> >> >> > >> >> > > > > > please ask, do not hesitate :)
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > BR,
> >> >> > >> >> > > > > > Yuriy Shuliha
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > On Thu, Sep 19, 2019 at 11:38, Павлухин Иван <vololo100@gmail.com> wrote:
> >> >> > >> >> > > > > >
> >> >> > >> >> > > > > > > Yuriy,
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > > Greatly appreciate your interest.
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > > Could you please elaborate a little bit on sorting? What tasks does it
> >> >> > >> >> > > > > > > help to solve, and how? It would be great if you could provide an
> >> >> > >> >> > > > > > > example.
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > > On Wed, Sep 18, 2019 at 09:39, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > Denis,
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text queries on
> >> >> > >> >> > > > > > > > persistent caches.
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > Also, I'm fine with the proposed limit for unsorted searches.
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > On Tue, Sep 17, 2019 at 22:06, Denis Magda <dmagda@apache.org> wrote:
> >> >> > >> >> > > > > > > >
> >> >> > >> >> > > > > > > > > Igniters,
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding full-text search
> >> >> > >> >> > > > > > > > > API evolution, as long as Yury is ready to push it forward.
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for in-memory
> >> >> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
> >> >> > >> >> > > > > > > > > like Postgres.
> >> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by default)
> >> >> > >> >> > > > > > > > > if someone attempts to use text indices with native persistence
> >> >> > >> >> > > > > > > > > enabled. If the person is ready to live with that limitation, an
> >> >> > >> >> > > > > > > > > explicit configuration change would be needed to get around the
> >> >> > >> >> > > > > > > > > exception.
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > Thoughts?
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > -
> >> >> > >> >> > > > > > > > > Denis
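Denis's proposal (reject text indexes by default when native persistence is enabled, unless the user explicitly allows them) comes down to a validation check. All names below, including the `allowVolatileTextIndexes` flag, are hypothetical and are not Ignite configuration properties.

```java
/**
 * Hypothetical startup check; none of these names are Ignite API.
 * Text (Lucene) indexes are not persisted, so by default they are rejected
 * when native persistence is enabled, unless the user explicitly opts in.
 */
final class TextIndexValidator {
    static void validate(boolean persistenceEnabled,
                         boolean hasTextIndexes,
                         boolean allowVolatileTextIndexes) {
        if (persistenceEnabled && hasTextIndexes && !allowVolatileTextIndexes) {
            throw new IllegalStateException(
                "Text indexes are not persistent; set allowVolatileTextIndexes=true "
                    + "to use them with native persistence anyway.");
        }
    }
}
```

Pure in-memory deployments, such as a cache in front of Postgres, never hit the exception. Persistent clusters have to opt in once, consciously.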
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > > > > > Hello to all again,
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > (I) Overall need for Lucene indexing
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > Alexei referred to
> >> >> > >> >> > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-5371, where the absence
> >> >> > >> >> > > > > > > > > > of index persistence was declared an obstacle to further development.
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> >> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely
> >> >> > >> >> > > > > > > > > > in-memory indexing of selected data.
> >> >> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited number of
> >> >> > >> >> > > > > > > > > > records to be used in type-ahead search / suggestions.
> >> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the
> >> >> > >> >> > > > > > > > > > Lucene index to be persistent. I hope this is a widespread pattern of
> >> >> > >> >> > > > > > > > > > text-search usage.
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation.
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* seems not to be
> >> >> > >> >> > > > > > > > > > required in text-search tasks for now).
> >> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries. It was
> >> >> > >> >> > > > > > > > > > a simple test prefix query, like 'name'*='ene*'*
> >> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the client
> >> >> > >> >> > > > > > > > > > node, and there may be thousands or hundreds of thousands of records,
> >> >> > >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the results are added
> >> >> > >> >> > > > > > > > > > to the queue in GridCacheQueryFutureAdapter in arbitrary order, page by
> >> >> > >> >> > > > > > > > > > page.
> >> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> >> >> > >> >> > > > > > > > > > So implementing limit as part of the query and (GridCacheQueryRequest)
> >> >> > >> >> > > > > > > > > > will not change the nature of the response but will limit the load on
> >> >> > >> >> > > > > > > > > > nodes and networking.
> >> >> > >> >> > > > > > > > > >
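
A minimal sketch of the per-node side of such a limit, assuming every match on a node is available as a (key, score) pair: instead of collecting all hits, the node keeps only the `limit` best-scored ones in a bounded min-heap, so at most `limit` rows travel to the client node. `NodeTopN` and `Hit` are hypothetical names, not Ignite or Lucene API; inside Lucene, `IndexSearcher.search(query, n)` performs this kind of top-n collection itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class NodeTopN {
    /** A matched cache key with its relevance score (hypothetical type). */
    record Hit(String key, float score) {}

    /** Higher score first. */
    static final Comparator<Hit> BEST_FIRST = (a, b) -> Float.compare(b.score(), a.score());

    /** Keeps only the {@code limit} best-scored hits instead of collecting every match. */
    static List<Hit> topN(Iterable<Hit> matches, int limit) {
        List<Hit> result = new ArrayList<>();
        if (limit <= 0)
            return result;
        // Min-heap on score: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(limit, BEST_FIRST.reversed());
        for (Hit h : matches) {
            if (heap.size() < limit)
                heap.add(h);
            else if (h.score() > heap.peek().score()) {
                heap.poll(); // evict the weakest hit
                heap.add(h);
            }
        }
        result.addAll(heap);
        result.sort(BEST_FIRST); // best-first, as a page sent to the client node would be
        return result;
    }
}
```

Memory per node stays proportional to `limit`, not to the number of matches.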
> >> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > a) Sorting
> >> >> > >> >> > > > > > > > > > The solution for this could be:
> >> >> > >> >> > > > > > > > > > - Make entities comparable
> >> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> >> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene
> >> >> > >> >> > > > > > > > > >   indexing
> >> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing
> >> >> > >> >> > > > > > > > > >   to the desired limit on the client node.
> >> >> > >> >> > > > > > > > > > This will require the full result set to be loaded
> >> >> > >> >> > > > > > > > > > into memory, though it can be used for relatively
> >> >> > >> >> > > > > > > > > > small limits.
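
The comparator-based reduce described above could look roughly like this, assuming the partial result of every server node has already arrived on the client node as a list of entities. All partial results are concatenated, sorted with a caller-supplied `Comparator` and cut to the limit, which is exactly the point where the full result set has to sit in memory. `ClientSideReduce` and `Asset` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ClientSideReduce {
    /** Hypothetical cached entity with a sortable field. */
    record Asset(String name, long marketCap) {}

    /**
     * Concatenates the partial results of all nodes, sorts them with the given
     * comparator and keeps the first {@code limit} entries.
     */
    static <T> List<T> reduce(List<List<T>> nodeResults, Comparator<? super T> cmp, int limit) {
        List<T> all = new ArrayList<>();
        for (List<T> part : nodeResults)
            all.addAll(part); // the full result set is materialized here
        all.sort(cmp);
        int n = Math.max(0, Math.min(limit, all.size()));
        return new ArrayList<>(all.subList(0, n));
    }
}
```

For example, passing `Comparator.comparingLong(Asset::marketCap).reversed()` would return the largest assets first.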
> >> >> > >> >> > > > > > > > > > BR,
> >> >> > >> >> > > > > > > > > > Yuriy Shuliha
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <
> >> >> > >> >> > > > > > > > > > alexey.scherbakoff@gmail.com> wrote:
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > Yuriy,
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is
> >> >> > >> >> > > > > > > > > > > [1], which makes Lucene indexes unusable with
> >> >> > >> >> > > > > > > > > > > persistence and is the main reason for the
> >> >> > >> >> > > > > > > > > > > discontinuation.
> >> >> > >> >> > > > > > > > > > > Probably it should be addressed first to make text
> >> >> > >> >> > > > > > > > > > > queries a valid product feature.
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not
> >> >> > >> >> > > > > > > > > > > a trivial task. Some kind of merging must be
> >> >> > >> >> > > > > > > > > > > implemented on the query originating node.
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > On Thu, 29 Aug 2019 at 23:38, Denis Magda <
> >> >> > >> >> > > > > > > > > > > dmagda@apache.org> wrote:
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > Yuriy,
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search
> >> >> > >> >> > > > > > > > > > > > indexes, then please go ahead. The primary reasons
> >> >> > >> >> > > > > > > > > > > > why the community wants to discontinue them first
> >> >> > >> >> > > > > > > > > > > > (and probably resurrect them later) are the
> >> >> > >> >> > > > > > > > > > > > limitations listed by Andrey and minimal support
> >> >> > >> >> > > > > > > > > > > > from the community's end.
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > -
> >> >> > >> >> > > > > > > > > > > > Denis
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <
> >> >> > >> >> > > > > > > > > > > > andrey.mashenkov@gmail.com> wrote:
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue
> >> >> > >> >> > > > > > > > > > > > > TextQueries in Ignite [1]. The motivation here is
> >> >> > >> >> > > > > > > > > > > > > that text indexes are not persistent, not
> >> >> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL
> >> >> > >> >> > > > > > > > > > > > > or inside SQL, and there is a lack of interest from
> >> >> > >> >> > > > > > > > > > > > > the community side.
> >> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make
> >> >> > >> >> > > > > > > > > > > > > TextQueries great.
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> >> >> > >> >> > > > > > > > > > > > > Query results are returned from a data node to the
> >> >> > >> >> > > > > > > > > > > > > client-side cursor page by page, and this parameter
> >> >> > >> >> > > > > > > > > > > > > is designed to control the page size. The query is
> >> >> > >> >> > > > > > > > > > > > > supposed to execute lazily on the server side, and
> >> >> > >> >> > > > > > > > > > > > > the full result set is not expected to be loaded
> >> >> > >> >> > > > > > > > > > > > > into memory on the server side at once, but page by
> >> >> > >> >> > > > > > > > > > > > > page.
> >> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire
> >> >> > >> >> > > > > > > > > > > > > result set into memory before the first page is
> >> >> > >> >> > > > > > > > > > > > > sent to the client?
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > I think a new parameter should be added to limit the
> >> >> > >> >> > > > > > > > > > > > > result. The best solution is to use query language
> >> >> > >> >> > > > > > > > > > > > > commands for this, e.g. "LIMIT/OFFSET" in SQL.
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a
> >> >> > >> >> > > > > > > > > > > > > distributed operation: the same user query will be
> >> >> > >> >> > > > > > > > > > > > > executed on the data nodes, and then the results
> >> >> > >> >> > > > > > > > > > > > > from all nodes should be correctly merged before
> >> >> > >> >> > > > > > > > > > > > > being returned via the client cursor.
> >> >> > >> >> > > > > > > > > > > > > So LIMIT should be applied on every node and then
> >> >> > >> >> > > > > > > > > > > > > again on the merge phase.
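
One way to apply LIMIT on both phases is sketched below, assuming each data node already returns at most `limit` rows sorted by the same comparator. The originating node performs a k-way merge with a priority queue holding one pending row per node and stops after `limit` rows, so the global order is restored no matter which node answered first, provided the merge can wait for the next row of any node whose stream is not yet exhausted. `SortedMerge` is a hypothetical helper, not Ignite code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SortedMerge {
    /** Current head row of one node's stream plus the rest of that stream. */
    private record Head<T>(T value, Iterator<T> rest) {}

    /**
     * Merges per-node streams that are each sorted by {@code cmp} and returns
     * at most {@code limit} rows in global order (the merge-phase limit).
     */
    static <T> List<T> merge(List<Iterator<T>> nodeStreams, Comparator<? super T> cmp, int limit) {
        Comparator<Head<T>> byValue = (a, b) -> cmp.compare(a.value(), b.value());
        PriorityQueue<Head<T>> heads = new PriorityQueue<>(byValue);
        for (Iterator<T> it : nodeStreams)
            if (it.hasNext())
                heads.add(new Head<>(it.next(), it));

        List<T> out = new ArrayList<>();
        while (out.size() < limit && !heads.isEmpty()) {
            Head<T> h = heads.poll(); // globally smallest pending row
            out.add(h.value());
            if (h.rest().hasNext()) // pull the next row from the same node
                heads.add(new Head<>(h.rest().next(), h.rest()));
        }
        return out;
    }
}
```

With a "descending score" comparator, the same merge would keep the relevance order that each node's Lucene response already has.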
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results
> >> >> > >> >> > > > > > > > > > > > > makes no sense without sorting, as there is no
> >> >> > >> >> > > > > > > > > > > > > guarantee that every next query run will return the
> >> >> > >> >> > > > > > > > > > > > > same data, because of page reordering.
> >> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from
> >> >> > >> >> > > > > > > > > > > > > data nodes asynchronously, and messages from
> >> >> > >> >> > > > > > > > > > > > > different nodes can't be ordered.
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > 2.
> >> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField)
> >> >> > >> >> > > > > > > > > > > > > looks more descriptive, doesn't it?
> >> >> > >> >> > > > > > > > > > > > > b,c. What about distributed queries? How will partial
> >> >> > >> >> > > > > > > > > > > > > results from the nodes be merged?
> >> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data
> >> >> > >> >> > > > > > > > > > > > > sorting?
> >> >> > >> >> > > > > > > > > > > > > What comparator should Ignite choose to sort the
> >> >> > >> >> > > > > > > > > > > > > result on the merge phase?
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > 3. For now the Lucene engine is not configurable at
> >> >> > >> >> > > > > > > > > > > > > all. E.g. it is impossible to configure the
> >> >> > >> >> > > > > > > > > > > > > Tokenizer.
> >> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the
> >> >> > >> >> > > > > > > > > > > > > engine first, and only then go further to
> >> >> > >> >> > > > > > > > > > > > > discuss/implement complex features that may depend
> >> >> > >> >> > > > > > > > > > > > > on the engine config.
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <
> >> >> > >> >> > > > > > > > > > > > > shuliga@gmail.com> wrote:
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > Dear community,
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > By starting this chain I'd like to open a
> >> >> > >> >> > > > > > > > > > > > > > discussion that would lead to contribution results
> >> >> > >> >> > > > > > > > > > > > > > in the subject area.
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities backed by
> >> >> > >> >> > > > > > > > > > > > > > different mechanisms, including Lucene.
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (a release from
> >> >> > >> >> > > > > > > > > > > > > > last year).
> >> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that
> >> >> > >> >> > > > > > > > > > > > > > covers the text search area and beyond (e.g.
> >> >> > >> >> > > > > > > > > > > > > > spatial data indexing).
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to
> >> >> > >> >> > > > > > > > > > > > > > Ignite indexing and query mechanisms for text
> >> >> > >> >> > > > > > > > > > > > > > data*.
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage.
> >> >> > >> >> > > > > > > > > > > > > > It comes from our project's needs, but I believe
> >> >> > >> >> > > > > > > > > > > > > > it will be useful for a lot more people.
> >> >> > >> >> > > > > > > > > > > > > > Let's walk through them and vote on or discuss the
> >> >> > >> >> > > > > > > > > > > > > > Jira tickets for them.
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit
> >> >> > >> >> > > > > > > > > > > > > > search response items inside
> >> >> > >> >> > > > > > > > > > > > > > GridLuceneIndex.query(). Currently it calls
> >> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*),
> >> >> > >> >> > > > > > > > > > > > > > so basically all scored matches are returned,
> >> >> > >> >> > > > > > > > > > > > > > which we do not need in most cases.
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable
> >> >> > >> >> > > > > > > > > > > > > > search call can be executed:
> >> >> > >> >> > > > > > > > > > > > > > *IndexSearcher.search(query, count, sort)*
> >> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> >> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in
> >> >> > >> >> > > > > > > > > > > > > > the *@QueryTextField* annotation. If *true*, the
> >> >> > >> >> > > > > > > > > > > > > > field will be indexed but not tokenized. Number
> >> >> > >> >> > > > > > > > > > > > > > types are preferred here.
> >> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery*
> >> >> > >> >> > > > > > > > > > > > > > constructor. It should define the desired sort
> >> >> > >> >> > > > > > > > > > > > > > fields used for querying.
> >> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in
> >> >> > >> >> > > > > > > > > > > > > > GridLuceneIndex.query().
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with
> >> >> > >> >> > > > > > > > > > > > > > *TextQuery*, including terms/queries boosting.
> >> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires
> >> >> > >> >> > > > > > > > > > > > > > more detailed work. It should be extended if the
> >> >> > >> >> > > > > > > > > > > > > > community is interested in it.*
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > > BR,
> >> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> >> >> > >> >> > > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > > > --
> >> >> > >> >> > > > > > > > > > > > > Best regards,
> >> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> >> >> > >> >> > > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > --
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > > > Best regards,
> >> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> >> >> > >> >> > > > > > > > > > >
> >> >> > >> >> > > > > > > > > >
> >> >> > >> >> > > > > > > > >
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > > > > --
> >> >> > >> >> > > > > > > Best regards,
> >> >> > >> >> > > > > > > Ivan Pavlukhin
> >> >> > >> >> > > > > > >
> >> >> > >> >> > > > >
> >> >> > >> >> > > > >
> >> >> > >> >> > > > >
> >> >> > >> >> > > > > --
> >> >> > >> >> > > > > Best regards,
> >> >> > >> >> > > > > Ivan Pavlukhin
> >> >> > >> >> > > > >
> >> >> > >> >> > > >
> >> >> > >> >> > >
> >> >> > >> >> >
> >> >> > >> >> >
> >> >> > >> >> > --
> >> >> > >> >> > Best regards,
> >> >> > >> >> > Andrey V. Mashenkov
> >> >> > >> >> >
> >> >> > >> >>
> >> >> > >> >
> >> >> > >> >
> >> >> > >> > --
> >> >> > >> > Best regards,
> >> >> > >> > Andrey V. Mashenkov
> >> >> > >> >
> >> >> > >>
> >> >> > >
> >> >> >
> >> >> > --
> >> >> > Best regards,
> >> >> > Andrey V. Mashenkov
> >> >> >
> >> >>
> >>
> >>
> >>
> >>
> >
>
>
>
>

Re[4]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Zhenya Stanilovsky <ar...@mail.ru.INVALID>.

Ok, let's forget Solr and go the ASF way: if Yuriy proves this functionality is helpful and submits a PR, why not?

Right?

>On Tuesday, 26 November 2019, 14:06 +03:00, Ilya Kasnacheev <il...@gmail.com> wrote:
> 
>Hello!
>
>The problem here is that Solr is a multi-year effort by a lot of people. We
>can't match that.
>
>Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
>information into their storage for indexing and relying on their own
>mechanisms for distributed IR sorting?
>
>Regards,
>--
>Ilya Kasnacheev
>
>
>On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid>
>wrote:
>
>>
>> Ilya Kasnacheev, what is the problem with Solr-like functionality in Ignite?
>>
>> Thanks!
>>
>> >On Tuesday, 26 November 2019, 13:50 +03:00, Ilya Kasnacheev <
>> >ilya.kasnacheev@gmail.com> wrote:
>> >
>> >Hello!
>> >
>> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
>> >into Apache Ignite. I think that's a lot of effort that is not very
>> >justified.
>> >
>> >I don't think we should try to implement sorting in Apache Ignite, because
>> >it is a lot of work, and a lot of code in our code base which we don't
>> >really want.
>> >
>> >Regards,
>> >--
>> >Ilya Kasnacheev
>> >
>> >
>> >On Fri, 22 Nov 2019 at 20:59, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> >
>> >> Dear Igniters,
>> >>
>> >> The first part of the TextQuery improvement, a result limit, has been
>> >> developed and merged.
>> >> Now we have to develop the most important functionality here: proper
>> >> sorting of the Lucene index response and correct reduction of the
>> >> responses for distributed queries.
>> >>
>> >> *There are two Lucene-based aspects*
>> >>
>> >> 1. When no sorting fields are used, the documents in the response are
>> >> still ordered by relevance, which is actually the ScoreDoc.score value.
>> >> In order to reduce the distributed results correctly, the score should
>> >> be passed with the response.
>> >>
>> >> 2. When sorting by conventional fields, Lucene should have these fields
>> >> properly indexed, and the corresponding Sort object should be applied
>> >> to Lucene's search call.
>> >> In order to mark those fields, a new annotation like '@SortField' may
>> >> be introduced.
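
For the relevance case (aspect 1), a minimal sketch, assuming every row a node sends back carries its Lucene score plus its node order and local rank: the reducer orders rows by descending score and breaks ties by node and rank, so the reduced order is deterministic even when equal scores come from different nodes. `ScoredReduce` and `ScoredRow` are hypothetical names. One caveat worth checking: Lucene computes scores from each node's local index statistics, so scores from different nodes may not be perfectly comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoredReduce {
    /** One row of a node's text-query response, carrying the Lucene score (hypothetical). */
    record ScoredRow(String key, float score, int nodeOrder, int localRank) {}

    /** Highest score first; ties broken by node order, then by rank within the node. */
    static final Comparator<ScoredRow> BY_RELEVANCE = (a, b) -> {
        int c = Float.compare(b.score(), a.score());
        if (c != 0)
            return c;
        c = Integer.compare(a.nodeOrder(), b.nodeOrder());
        return c != 0 ? c : Integer.compare(a.localRank(), b.localRank());
    };

    /** Reduces rows collected from all nodes to the keys of the {@code limit} best ones. */
    static List<String> topKeys(List<ScoredRow> rows, int limit) {
        List<ScoredRow> sorted = new ArrayList<>(rows);
        sorted.sort(BY_RELEVANCE);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, sorted.size()); i++)
            keys.add(sorted.get(i).key());
        return keys;
    }
}
```

The same comparator could also drive a streaming merge instead of a full sort.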
>> >>
>> >> *Reducing on Ignite*
>> >>
>> >> The obvious point of distributed response reduction is the class
>> >> GridCacheDistributedQueryFuture.
>> >> However, @Ivan Pavlukhin mentioned a class with similar functionality:
>> >> ReduceIndexSorted.
>> >> What I see here is that it is tangled with H2-related classes
>> >> (org.h2.result.Row) and might not be unifiable with TextQuery reduction.
>> >>
>> >> I still need support here.
>> >>
>> >> Overall, the goal of this message is to initiate a discussion on the
>> >> TextQuery sorting implementation and come closer to ticket creation.
>> >>
>> >> BR,
>> >> Yuriy Shuliha
>> >>
>> >> On Tue, 22 Oct 2019 at 13:31, Andrey Mashenkov <andrey.mashenkov@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Dmitry, Yuriy.
>> >> >
>> >> > I've found that GridCacheQueryFutureAdapter has a newly added
>> >> > AtomicInteger 'total' field and a 'limit' field as a primitive int.
>> >> >
>> >> > Both fields are used inside a synchronized block only.
>> >> > So we can make both private and downgrade the AtomicInteger to a
>> >> > primitive int.
>> >> >
>> >> > Most likely, these fields can be replaced with one field.
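
A sketch of the suggested simplification, assuming all accesses already happen under one monitor: a single private primitive counter of remaining rows replaces the `AtomicInteger total` and `int limit` pair. `LimitedPageCollector` and its methods are hypothetical and do not reproduce the real `GridCacheQueryFutureAdapter` code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical collector of result pages that enforces a row limit. */
final class LimitedPageCollector<R> {
    private final Object mux = new Object();
    private final List<R> rows = new ArrayList<>();

    /** Rows still allowed; the only counter needed. Guarded by mux. */
    private int remaining;

    LimitedPageCollector(int limit) {
        remaining = limit > 0 ? limit : Integer.MAX_VALUE; // non-positive limit = unlimited
    }

    /** Adds a page from some node; returns false once the limit has been reached. */
    boolean onPage(Collection<? extends R> page) {
        synchronized (mux) {
            for (R row : page) {
                if (remaining == 0)
                    break;
                rows.add(row);
                remaining--;
            }
            return remaining > 0; // false tells the caller to stop requesting pages
        }
    }

    List<R> rows() {
        synchronized (mux) {
            return new ArrayList<>(rows);
        }
    }
}
```

Since the counter is only read and written while holding `mux`, a plain `int` gives the same visibility guarantees as the atomic one.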
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov <dpavlov@apache.org>
>> >> > wrote:
>> >> >
>> >> > > Hi Andrey,
>> >> > >
>> >> > > I've checked this ticket's comments, and there is a TC Bot visa (with
>> >> > > no blockers).
>> >> > >
>> >> > > Do you have any concerns related to this patch?
>> >> > >
>> >> > > Sincerely,
>> >> > > Dmitriy Pavlov
>> >> > >
>> >> > > On Thu, 17 Oct 2019 at 16:43, Yuriy Shuliga <shuliga@gmail.com> wrote:
>> >> > >
>> >> > >> Andrey,
>> >> > >>
>> >> > >> As you requested, I created ticket
>> >> > >> https://issues.apache.org/jira/browse/IGNITE-12291, linked to
>> >> > >> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
>> >> > >>
>> >> > >> Could you please proceed with the PR merge?
>> >> > >>
>> >> > >> BR,
>> >> > >> Yuriy Shuliha
>> >> > >>
>> >> > >> On Wed, 9 Oct 2019 at 12:52, Andrey Mashenkov <andrey.mashenkov@gmail.com>
>> >> > >> wrote:
>> >> > >>
>> >> > >> > Hi Yuri,
>> >> > >> >
>> >> > >> > To get access to the TC Bot, you should register as a TeamCity user
>> >> > >> > [1], if you haven't done so already.
>> >> > >> > Then you will be able to log in to the Ignite TC Bot page with the
>> >> > >> > same credentials.
>> >> > >> >
>> >> > >> > [1] https://ci.ignite.apache.org/registerUser.html
>> >> > >> >
>> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga <shuliga@gmail.com>
>> >> > >> > wrote:
>> >> > >> >
>> >> > >> >> Andrew,
>> >> > >> >>
>> >> > >> >> I have corrected the PR according to your notes. Please review.
>> >> > >> >> What will be the next steps in order to merge it?
>> >> > >> >>
>> >> > >> >> Y.
>> >> > >> >>
>> >> > >> >> On Thu, 3 Oct 2019 at 17:47, Andrey Mashenkov <andrey.mashenkov@gmail.com>
>> >> > >> >> wrote:
>> >> > >> >>
>> >> > >> >> > Yuri,
>> >> > >> >> >
>> >> > >> >> > I'm done with the review.
>> >> > >> >> > No crime found, just a trivial compatibility bug.
>> >> > >> >> >
>> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <shuliga@gmail.com>
>> >> > >> >> > wrote:
>> >> > >> >> >
>> >> > >> >> > > Denis,
>> >> > >> >> > >
>> >> > >> >> > > Thank you for your attention to this.
>> >> > >> >> > > As of now, the https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > >> >> > > ticket is still pending review.
>> >> > >> >> > > Do we have a chance to move it forward somehow?
>> >> > >> >> > >
>> >> > >> >> > > BR,
>> >> > >> >> > > Yuriy Shuliha
>> >> > >> >> > >
>> >> > >> >> > > On Mon, 30 Sep 2019 at 23:35, Denis Magda <dmagda@apache.org> wrote:
>> >> > >> >> > >
>> >> > >> >> > > > Yuriy,
>> >> > >> >> > > >
>> >> > >> >> > > > I've seen you open a pull request with the first changes:
>> >> > >> >> > > > https://issues.apache.org/jira/browse/IGNITE-12189
>> >> > >> >> > > >
>> >> > >> >> > > > Alex Scherbakov and Ivan, are you the right people to do the
>> >> > >> >> > > > review?
>> >> > >> >> > > >
>> >> > >> >> > > > -
>> >> > >> >> > > > Denis
>> >> > >> >> > > >
>> >> > >> >> > > >
>> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <vololo100@gmail.com>
>> >> > >> >> > > > wrote:
>> >> > >> >> > > >
>> >> > >> >> > > > > Yuriy,
>> >> > >> >> > > > >
>> >> > >> >> > > > > Thank you for providing details! Quite interesting.
>> >> > >> >> > > > >
>> >> > >> >> > > > > Yes, we already have support for a distributed limit and for
>> >> > >> >> > > > > merging sorted subresults for SQL queries. E.g.
>> >> > >> >> > > > > ReduceIndexSorted and MergeStreamIterator are used for merging
>> >> > >> >> > > > > sorted streams.
>> >> > >> >> > > > >
>> >> > >> >> > > > > Could you please also clarify score/relevance? Is it provided
>> >> > >> >> > > > > by the Lucene engine for each query result? I am thinking about
>> >> > >> >> > > > > how to do a sorted merge properly in this case.
>> >> > >> >> > > > >
>> >> > >> >> > > > > On Wed, 25 Sep 2019 at 18:56, Yuriy Shuliga <shuliga@gmail.com>
>> >> > >> >> > > > > wrote:
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Ivan,
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Thank you for the interesting question!
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Text searches (or full-text searches) are mostly
>> >> > >> >> > > > > > human-oriented, and the point of the user's interest is the
>> >> > >> >> > > > > > topmost part of the response.
>> >> > >> >> > > > > > Then the user can read it, evaluate it and use the given
>> >> > >> >> > > > > > records for further purposes.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Particularly in our case, we use Ignite for operations with
>> >> > >> >> > > > > > financial data, and there is a lot of text, like asset
>> >> > >> >> > > > > > names, financial instruments, companies, etc.
>> >> > >> >> > > > > > In order to work with this quickly and reliably, users rely
>> >> > >> >> > > > > > on text search, type-ahead completions and suggestions.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > For these purposes we index particular string data in
>> >> > >> >> > > > > > separate caches.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Sorting capabilities and response size limitations are very
>> >> > >> >> > > > > > important there, as our API has to provide the most
>> >> > >> >> > > > > > relevant information within a limited size.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > Now let me comment from the Ignite/Lucene perspective.
>> >> > >> >> > > > > > Actually, Ignite queries Lucene, which returns
>> >> > >> >> > > > > > *TopDocs.scoreDocs* already sorted by *score* (relevance),
>> >> > >> >> > > > > > so the most relevant documents are on top.
>> >> > >> >> > > > > > But currently, distributed query responses from different
>> >> > >> >> > > > > > nodes are merged into the final query cursor queue in an
>> >> > >> >> > > > > > arbitrary way, so in fact the score order is already ruined
>> >> > >> >> > > > > > here. Also, Ignite requests all possible documents from
>> >> > >> >> > > > > > Lucene, which is redundant and not good for performance.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > I'm implementing the *limit* parameter as part of
>> >> > >> >> > > > > > *TextQuery* and have to note that we still have to add
>> >> > >> >> > > > > > sorting to text query processing in order to have usable
>> >> > >> >> > > > > > results.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > The *limit* parameter itself should fix part of the issues
>> >> > >> >> > > > > > above, but sorting by document score, at least, should
>> >> > >> >> > > > > > definitely be implemented along with limit.
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > This is a pretty short commentary; if you still have any
>> >> > >> >> > > > > > questions, please ask, do not hesitate :)
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > BR,
>> >> > >> >> > > > > > Yuriy Shuliha
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > On Thu, 19 Sep 2019 at 11:38, Павлухин Иван <vololo100@gmail.com>
>> >> > >> >> > > > > > wrote:
>> >> > >> >> > > > > >
>> >> > >> >> > > > > > > Yuriy,
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > Greatly appreciate your interest.
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > Could you please elaborate a little bit on sorting? What
>> >> > >> >> > > > > > > tasks does it help to solve, and how? It would be great if
>> >> > >> >> > > > > > > you could provide an example.
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > On Wed, 18 Sep 2019 at 09:39, Alexei Scherbakov <
>> >> > >> >> > > > > > > alexey.scherbakoff@gmail.com> wrote:
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Denis,
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text
>> >> > >> >> > > > > > > > queries on persistent caches.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Also, I'm fine with the proposed limit for unsorted
>> >> > >> >> > > > > > > > searches.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > On Tue, 17 Sep 2019, 22:06 Denis Magda <dmagda@apache.org>
>> >> > >> >> > > > > > > > wrote:
>> >> > >> >> > > > > > > >
>> >> > >> >> > > > > > > > > Igniters,
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding the
>> >> > >> >> > > > > > > > > full-text search API evolution as long as Yury is ready
>> >> > >> >> > > > > > > > > to push it forward.
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for
>> >> > >> >> > > > > > > > > in-memory data grid deployments where Ignite caches data
>> >> > >> >> > > > > > > > > of an underlying DB like Postgres.
>> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an
>> >> > >> >> > > > > > > > > exception (by default) if one attempts to use text
>> >> > >> >> > > > > > > > > indices with native persistence enabled.
>> >> > >> >> > > > > > > > > If a person is ready to live with that limitation, an
>> >> > >> >> > > > > > > > > explicit configuration change would be needed to get
>> >> > >> >> > > > > > > > > around the exception.
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > Thoughts?
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > > -
>> >> > >> >> > > > > > > > > Denis
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > > > >
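A minimal Java sketch of the fail-fast check Denis proposes, assuming the decision is taken once when a cache starts. TextIndexGuard, validate() and the allowNonPersistentTextIndex opt-in flag are hypothetical names used only for illustration; they are not existing Ignite classes or configuration properties.

```java
import java.util.Objects;

/**
 * Hypothetical start-up check (not an Ignite API): refuse to combine Lucene
 * text indexes with native persistence unless the user explicitly opts in.
 */
final class TextIndexGuard {
    private TextIndexGuard() {}

    static void validate(String cacheName,
                         boolean nativePersistenceEnabled,
                         boolean hasTextIndexedFields,
                         boolean allowNonPersistentTextIndex) {
        Objects.requireNonNull(cacheName, "cacheName");
        // Default behaviour: fail fast when text indexes meet native persistence.
        if (nativePersistenceEnabled && hasTextIndexedFields && !allowNonPersistentTextIndex) {
            throw new IllegalStateException("Cache '" + cacheName + "' has text-indexed fields, "
                + "but Lucene text indexes are not persisted. "
                + "Set allowNonPersistentTextIndex=true to accept this limitation.");
        }
        // Otherwise: in-memory cache, no text fields, or the user opted in explicitly.
    }
}
```

The opt-in flag stands in for the "explicit configuration change" in Denis's message; where such a flag would actually live (cache or data-region configuration) is left open in this sketch.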
>> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Hello to all again,
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Thank you for the important comments and notes given below!
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Alexei has referred to https://issues.apache.org/jira/browse/IGNITE-5371, where the absence of index persistence was declared an obstacle to further development.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
>> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely in-memory indexing of selected data.
>> >> > >> >> > > > > > > > > > We intend to use search capabilities for fetching a limited number of records to be used in type-ahead search / suggestions.
>> >> > >> >> > > > > > > > > > Not all of the data will be indexed, and there is no need for the Lucene index to be persistent. I hope this is a widespread pattern of text-search usage.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (*offset* does not seem to be required in text-search tasks for now)
>> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries, using a simple prefix test query like 'name'='ene*'.
>> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the client node, which may amount to thousands or hundreds of thousands of records,
>> >> > >> >> > > > > > > > > > even if we need only the first 10-100. Again, all the results are added to the queue in GridCacheQueryFutureAdapter page by page, in arbitrary order.
>> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
>> >> > >> >> > > > > > > > > > So implementing limit as part of the query (and of GridCacheQueryRequest) will not change the nature of the response, but it will limit the load on nodes and on the network.
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > a) Sorting
>> >> > >> >> > > > > > > > > > The solution for this could be:
>> >> > >> >> > > > > > > > > > - Make entities comparable
>> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
>> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
>> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the desired limit on the client node.
>> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory, though it can be used for relatively small limits.
>> >> > >> >> > > > > > > > > > BR,
>> >> > >> >> > > > > > > > > > Yuriy Shuliha
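Option (III a) reduces on the client node by collecting every returned record and sorting the union with a comparator. The Java sketch below illustrates that approach, assuming each node's response arrives as a plain List; ClientSideReducer and reduce() are hypothetical names, not Ignite classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical client-side reduce (not an Ignite API): concatenate all pages
 * from all nodes, sort with the supplied comparator, keep the first `limit`.
 */
final class ClientSideReducer {
    private ClientSideReducer() {}

    static <T> List<T> reduce(List<List<T>> pagesFromNodes, Comparator<? super T> order, int limit) {
        if (limit < 0)
            throw new IllegalArgumentException("limit must be non-negative: " + limit);
        // The whole result set is materialized here; this is the memory cost
        // mentioned above, acceptable only for relatively small results.
        List<T> all = new ArrayList<>();
        for (List<T> page : pagesFromNodes)
            all.addAll(page);
        all.sort(order);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }
}
```

Memory use grows with the total number of records returned by all nodes, so this only stays cheap when each node has already applied the limit before sending its pages.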
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > > > On Fri, 30 Aug 2019 at 10:37, Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Yuriy,
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1], which makes Lucene indexes unusable with persistence and is the main reason for discontinuation.
>> >> > >> >> > > > > > > > > > > It should probably be addressed first to make text queries a valid product feature.
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying are indeed not trivial tasks.
>> >> > >> >> > > > > > > > > > > Some kind of merging must be implemented on the query-originating node.
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > On Thu, 29 Aug 2019 at 23:38, Denis Magda <dmagda@apache.org> wrote:
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > Yuriy,
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes, then please go ahead. The primary reasons why the community wants to discontinue them first (and probably resurrect them later) are the limitations listed by Andrey and minimal support from the community's end.
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > -
>> >> > >> >> > > > > > > > > > > > Denis
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in Ignite [1].
>> >> > >> >> > > > > > > > > > > > > The motivation is that text indexes are not persistent, not transactional, and can't be used together with SQL or inside SQL,
>> >> > >> >> > > > > > > > > > > > > and there is a lack of interest from the community side.
>> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
>> >> > >> >> > > > > > > > > > > > > Query results are returned from the data node to the client-side cursor page by page, and
>> >> > >> >> > > > > > > > > > > > > this parameter is designed to control the page size. The query is supposed to execute lazily on the server side, and
>> >> > >> >> > > > > > > > > > > > > the full result set is not expected to be loaded into memory on the server side at once, but page by page.
>> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into memory before the first page is sent to the client?
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > I think a new parameter should be added to limit the result. The best
>> >> > >> >> > > > > > > > > > > > > solution is to use query-language commands for this, e.g. "LIMIT/OFFSET" in SQL.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation: the same
>> >> > >> >> > > > > > > > > > > > > user query is executed on the data nodes,
>> >> > >> >> > > > > > > > > > > > > and then the results from all nodes have to be correctly merged before being
>> >> > >> >> > > > > > > > > > > > > returned via the client cursor.
>> >> > >> >> > > > > > > > > > > > > So LIMIT should be applied on every node and then again in the merge phase.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense without sorting,
>> >> > >> >> > > > > > > > > > > > > as there is no guarantee that each subsequent query run will return the same data, because pages can be reordered.
>> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes asynchronously, and
>> >> > >> >> > > > > > > > > > > > > messages from different nodes can't be ordered.
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 2.
>> >> > >> >> > > > > > > > > > > > > a. A "tokenize" param name (for @QueryTextField) looks more descriptive, doesn't it?
>> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from nodes be merged?
>> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
>> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort results in the merge phase?
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all. E.g. it is impossible to configure the Tokenizer.
>> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and only then go
>> >> > >> >> > > > > > > > > > > > > further to discuss/implement complex features
>> >> > >> >> > > > > > > > > > > > > that may depend on the engine config.
>> >> > >> >> > > > > > > > > > > > >
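Andrey's point that LIMIT must be applied on every node and again in the merge phase can be illustrated with a k-way merge: if each node returns its results already sorted and truncated, the originating node only needs a heap over the current heads of the node streams and can stop after `limit` elements, without materializing everything. The Java sketch below assumes each node's sorted results are exposed as an Iterator; SortedMergeWithLimit is a hypothetical name, not Ignite's reducer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical reduce step (not an Ignite API): k-way merge of per-node streams
 * that are each already sorted by `order`, stopping after `limit` elements.
 */
final class SortedMergeWithLimit {
    private SortedMergeWithLimit() {}

    static <T> List<T> merge(List<? extends Iterator<T>> sortedNodeStreams, Comparator<? super T> order, int limit) {
        if (limit < 0)
            throw new IllegalArgumentException("limit must be non-negative: " + limit);

        // Current head element of one node's stream, plus the rest of that stream.
        final class Head {
            final T value;
            final Iterator<T> rest;
            Head(T value, Iterator<T> rest) { this.value = value; this.rest = rest; }
        }

        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> it : sortedNodeStreams)
            if (it.hasNext())
                heap.add(new Head(it.next(), it));

        List<T> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            Head smallest = heap.poll();          // globally next element
            out.add(smallest.value);
            if (smallest.rest.hasNext())          // refill from the same node
                heap.add(new Head(smallest.rest.next(), smallest.rest));
        }
        return out;
    }
}
```

The merge is only correct if every node sorted with the same comparator. For relevance ordering, that means each hit would have to carry its score to the originating node so the comparator can order by score, descending.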
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Dear community,
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > By starting this thread I'd like to open a discussion that would lead to contributions in the subject area.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities backed by different mechanisms, including Lucene.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (last year's release).
>> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text search area and beyond (e.g. spatial data indexing).
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing and query mechanisms for text data*.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our project's needs, but I believe it will be useful for a lot more people.
>> >> > >> >> > > > > > > > > > > > > > Let's walk through the items below and vote on or discuss Jira tickets for them.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit the search response items inside GridLuceneIndex.query(). Currently it calls
>> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all scored matches are returned, which we do not need in most cases.
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable search call can be executed: *IndexSearcher.search(query, count, sort)*
>> >> > >> >> > > > > > > > > > > > > > Implementation steps:
>> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField* annotation. If *true*, the field will be indexed but not tokenized. Number types are preferred here.
>> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should define the desired sort fields used for querying.
>> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with *TextQuery*, including terms/queries boosting.
>> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work. It should be extended if the community is interested in it.*
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
>> >> > >> >> > > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > > BR,
>> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
>> >> > >> >> > > > > > > > > > > > > >
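Items 1 and 2 replace search(query, Integer.MAX_VALUE) with a bounded result size. Conceptually, a node then keeps only its n best hits, which takes O(m log n) time for m matches using a heap whose head is the weakest hit retained so far. The Java sketch below illustrates that idea with hypothetical Hit and TopN types. In Lucene itself, IndexSearcher.search(Query, int) and IndexSearcher.search(Query, int, Sort) already return at most n hits, ordered by score or by the given Sort respectively.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical search hit used for illustration (not a Lucene or Ignite class). */
final class Hit {
    final int docId;
    final float score;
    Hit(int docId, float score) { this.docId = docId; this.score = score; }
}

/** Hypothetical bounded top-N selection: keep only the n best hits instead of all matches. */
final class TopN {
    private TopN() {}

    /** Relevance order: higher score first, ties broken by lower docId for determinism. */
    static final Comparator<Hit> BY_SCORE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.docId);

    /** Returns at most n hits, best first, where "best" is defined by `better`. */
    static List<Hit> select(Iterable<Hit> matches, int n, Comparator<Hit> better) {
        if (n <= 0)
            return new ArrayList<>();
        // Worst-first heap: its head is the weakest hit currently retained.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, better.reversed());
        for (Hit h : matches) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (better.compare(h, heap.peek()) < 0) {
                heap.poll();      // evict the weakest retained hit
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(better);
        return result;
    }
}
```

Passing a different comparator, for example one over a sortable field value, corresponds to the field sorting in item 2; the rest of the selection logic stays the same.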
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > > > --
>> >> > >> >> > > > > > > > > > > > > Best regards,
>> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
>> >> > >> >> > > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > >
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > --
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > > > Best regards,
>> >> > >> >> > > > > > > > > > > Alexei Scherbakov
>> >> > >> >> > > > > > > > > > >
>> >> > >> >> > > > > > > > > >
>> >> > >> >> > > > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > >
>> >> > >> >> > > > > > > --
>> >> > >> >> > > > > > > Best regards,
>> >> > >> >> > > > > > > Ivan Pavlukhin
>> >> > >> >> > > > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > >
>> >> > >> >> > > > > --
>> >> > >> >> > > > > Best regards,
>> >> > >> >> > > > > Ivan Pavlukhin
>> >> > >> >> > > > >
>> >> > >> >> > > >
>> >> > >> >> > >
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > --
>> >> > >> >> > Best regards,
>> >> > >> >> > Andrey V. Mashenkov
>> >> > >> >> >
>> >> > >> >>
>> >> > >> >
>> >> > >> >
>> >> > >> > --
>> >> > >> > Best regards,
>> >> > >> > Andrey V. Mashenkov
>> >> > >> >
>> >> > >>
>> >> > >
>> >> >
>> >> > --
>> >> > Best regards,
>> >> > Andrey V. Mashenkov
>> >> >
>> >>
>>
>>
>>
>>
>

Re: Re[2]: Text queries/indexes (GridLuceneIndex, @QueryTextFiled)

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

The problem here is that Solr is a multi-year effort by a lot of people. We
can't match that.

Maybe we could integrate with Solr/Solr Cloud instead, by feeding our cache
information into their storage for indexing and relying on their own
mechanisms for distributed IR sorting?

Regards,
-- 
Ilya Kasnacheev


On Tue, 26 Nov 2019 at 13:59, Zhenya Stanilovsky <arzamas123@mail.ru.invalid> wrote:

>
> Ilya Kasnacheev, what is the problem with having Solr-like functionality in Ignite?
>
> Thanks!
>
> >Вторник, 26 ноября 2019, 13:50 +03:00 от Ilya Kasnacheev <
> ilya.kasnacheev@gmail.com>:
> >
> >Hello!
> >
> >I have a hunch that we are trying to build Apache Solr (or Solr Cloud)
> into
> >Apache Ignite. I think that's a lot of effort that is not very justified.
> >
> >I don't think we should try to implement sorting in Apache Ignite, because
> >it is a lot of work, and a lot of code in our code base which we don't
> >really want.
> >
> >Regards,
> >--
> >Ilya Kasnacheev
> >
> >
> >пт, 22 нояб. 2019 г. в 20:59, Yuriy Shuliga < shuliga@gmail.com >:
> >
> >> Dear Igniters,
> >>
> >> The first part of TextQuery improvement - a result limit - was developed
> >> and merged.
> >> Now we have to develop most important functionality here - proper
> sorting
> >> of Lucene index response and correct reducing of them for distributed
> >> queries.
> >>
> >> *There are two Lucene based aspects*
> >>
> >> 1. In case of using no sorting fields, the documents in response are
> still
> >> ordered by relevance.
> >> Actually this is ScoreDoc.score value.
> >> In order to reduce the distributed results correctly, the score should
> be
> >> passed with response.
> >>
> >> 2. When sorting by conventional fields, then Lucene should have these
> >> fields properly indexed and
> >> corresponding Sort object should be applied to Lucene's search call.
> >> In order to mark those fields a new annotation like '@SortField' may be
> >> introduced.
> >>
> >> *Reducing on Ignite *
> >>
> >> The obvious point of distributed response reduction is class
> >> GridCacheDistributedQueryFuture.
> >> Though, @Ivan Pavlukhin mentioned class with similar functionality:
> >> ReduceIndexSorted
> >> What I see here, that it is tangled with H2 related classes (
> >> org.h2.result.Row) and might not be unified with TextQuery reduction.
> >>
> >> Still need a support here.
> >>
> >> Overall, the goal of this letter is to initiate discussion on TextQuery
> >> Sorting implementation and come closer to ticket creation.
> >>
> >> BR,
> >> Yuriy Shuliha
> >>
> >> вт, 22 жовт. 2019 о 13:31 Andrey Mashenkov < andrey.mashenkov@gmail.com
> >
> >> пише:
> >>
> >> > Hi Dmitry, Yuriy.
> >> >
> >> > I've found GridCacheQueryFutureAdapter has newly added AtomicInteger
> >> > 'total' field and 'limit; field as primitive int.
> >> >
> >> > Both fields are used inside synchronized block only.
> >> > So, we can make both private and downgrade AtomicInteger to primitive
> >> int.
> >> >
> >> > Most likely, these fields can be replaced with one field.
> >> >
> >> >
> >> >
> >> > On Mon, Oct 21, 2019 at 10:01 PM Dmitriy Pavlov < dpavlov@apache.org
> >
> >> > wrote:
> >> >
> >> > > Hi Andrey,
> >> > >
> >> > > I've checked this ticket comments, and there is a TC Bot visa (with
> no
> >> > > blockers).
> >> > >
> >> > > Do you have any concerns related to this patch?
> >> > >
> >> > > Sincerely,
> >> > > Dmitriy Pavlov
> >> > >
> >> > > чт, 17 окт. 2019 г. в 16:43, Yuriy Shuliga < shuliga@gmail.com >:
> >> > >
> >> > >> Andrey,
> >> > >>
> >> > >> Per you request, I created ticket
> >> > >>  https://issues.apache.org/jira/browse/IGNITE-12291 linked to
> >> > >>
> https://issues.apache.org/jira/projects/IGNITE/issues/IGNITE-12189
> >> > >>
> >> > >> Could you please proceed with PR merge ?
> >> > >>
> >> > >> BR,
> >> > >> Yuriy Shuliha
> >> > >>
> >> > >> ср, 9 жовт. 2019 о 12:52 Andrey Mashenkov <
> andrey.mashenkov@gmail.com
> >> >
> >> > >> пише:
> >> > >>
> >> > >> > Hi Yuri,
> >> > >> >
> >> > >> > To get access to TC Bot you should register as TeamCity user
> [1], if
> >> > you
> >> > >> > didn't do this already.
> >> > >> > Then you will be able to authorize on Ignite TC Bot page with
> same
> >> > >> > credentials.
> >> > >> >
> >> > >> > [1]  https://ci.ignite.apache.org/registerUser.html
> >> > >> >
> >> > >> > On Fri, Oct 4, 2019 at 3:10 PM Yuriy Shuliga < shuliga@gmail.com
> >
> >> > wrote:
> >> > >> >
> >> > >> >> Andrew,
> >> > >> >>
> >> > >> >> I have corrected PR according to your notes. Please review.
> >> > >> >> What will be the next steps in order to merge in?
> >> > >> >>
> >> > >> >> Y.
> >> > >> >>
> >> > >> >> чт, 3 жовт. 2019 о 17:47 Andrey Mashenkov <
> >> >  andrey.mashenkov@gmail.com >
> >> > >> >> пише:
> >> > >> >>
> >> > >> >> > Yuri,
> >> > >> >> >
> >> > >> >> > I've done with review.
> >> > >> >> > No crime found, but trivial compatibility bug.
> >> > >> >> >
> >> > >> >> > On Thu, Oct 3, 2019 at 3:54 PM Yuriy Shuliga <
> shuliga@gmail.com >
> >> > >> wrote:
> >> > >> >> >
> >> > >> >> > > Denis,
> >> > >> >> > >
> >> > >> >> > > Thank you for your attention to this.
> >> > >> >> > > as for now, the
> >> >  https://issues.apache.org/jira/browse/IGNITE-12189
> >> > >> >> > ticket
> >> > >> >> > > is still pending review.
> >> > >> >> > > Do we have a chance to move it forward somehow?
> >> > >> >> > >
> >> > >> >> > > BR,
> >> > >> >> > > Yuriy Shuliha
> >> > >> >> > >
> >> > >> >> > > пн, 30 вер. 2019 о 23:35 Denis Magda < dmagda@apache.org >
> пише:
> >> > >> >> > >
> >> > >> >> > > > Yuriy,
> >> > >> >> > > >
> >> > >> >> > > > I've seen you opening a pull-request with the first
> changes:
> >> > >> >> > > >  https://issues.apache.org/jira/browse/IGNITE-12189
> >> > >> >> > > >
> >> > >> >> > > > Alex Scherbakov and Ivan are you the right guys to do the
> >> > review?
> >> > >> >> > > >
> >> > >> >> > > > -
> >> > >> >> > > > Denis
> >> > >> >> > > >
> >> > >> >> > > >
> >> > >> >> > > > On Fri, Sep 27, 2019 at 8:48 AM Павлухин Иван <
> >> > >>  vololo100@gmail.com >
> >> > >> >> > > wrote:
> >> > >> >> > > >
> >> > >> >> > > > > Yuriy,
> >> > >> >> > > > >
> >> > >> >> > > > > Thank you for providing details! Quite interesting.
> >> > >> >> > > > >
> >> > >> >> > > > > Yes, we already have support of distributed limit and
> >> merging
> >> > >> >> sorted
> >> > >> >> > > > > subresults for SQL queries. E.g. ReduceIndexSorted and
> >> > >> >> > > > > MergeStreamIterator are used for merging sorted streams.
> >> > >> >> > > > >
> >> > >> >> > > > > Could you please also clarify about score/relevance? Is
> it
> >> > >> >> provided
> >> > >> >> > by
> >> > >> >> > > > > Lucene engine for each query result? I am thinking how
> to
> >> do
> >> > >> >> sorted
> >> > >> >> > > > > merge properly in this case.
> >> > >> >> > > > >
> >> > >> >> > > > > ср, 25 сент. 2019 г. в 18:56, Yuriy Shuliga <
> >> >  shuliga@gmail.com
> >> > >> >:
> >> > >> >> > > > > >
> >> > >> >> > > > > > Ivan,
> >> > >> >> > > > > >
> >> > >> >> > > > > > Thank you for interesting question!
> >> > >> >> > > > > >
> >> > >> >> > > > > > Text searches (or full text searches) are mostly
> >> > >> human-oriented.
> >> > >> >> > And
> >> > >> >> > > > the
> >> > >> >> > > > > > point of user's interest is topmost part of response.
> >> > >> >> > > > > > Then user can read it, evaluate and use the given
> records
> >> > for
> >> > >> >> > further
> >> > >> >> > > > > > purposes.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Particularly in our case, we use Ignite for operations
> >> with
> >> > >> >> > financial
> >> > >> >> > > > > data,
> >> > >> >> > > > > > and there lots of text stuff like assets names, fin.
> >> > >> >> instruments,
> >> > >> >> > > > > companies
> >> > >> >> > > > > > etc.
> >> > >> >> > > > > > In order to operate with this quickly and reliably,
> users
> >> > >> used
> >> > >> >> to
> >> > >> >> > > work
> >> > >> >> > > > > with
> >> > >> >> > > > > > text search, type-ahead completions, suggestions.
> >> > >> >> > > > > >
> >> > >> >> > > > > > For this purposes we are indexing particular string
> data
> >> in
> >> > >> >> > separate
> >> > >> >> > > > > caches.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Sorting capabilities and response size limitations are
> >> very
> >> > >> >> > important
> >> > >> >> > > > > > there. As our API have to provide most relevant
> >> information
> >> > >> in
> >> > >> >> view
> >> > >> >> > > of
> >> > >> >> > > > > > limited size.
> >> > >> >> > > > > >
> >> > >> >> > > > > > Now let me comment some Ignite/Lucene perspective.
> >> > >> >> > > > > > Actually Ignite queries and Lucene returns
> >> > >> *TopDocs.scoresDocs
> >> > >> >> > > *already
> >> > >> >> > > > > > sorted by *score *(relevance). So most relevant
> documents
> >> > >> are on
> >> > >> >> > the
> >> > >> >> > > > top.
> >> > >> >> > > > > > And currently distributed queries responses from
> >> different
> >> > >> nodes
> >> > >> >> > are
> >> > >> >> > > > > merged
> >> > >> >> > > > > > into final query cursor queue in arbitrary way.
> >> > >> >> > > > > > So in fact we already have the score order ruined
> here.
> >> > Also
> >> > >> >> Ignite
> >> > >> >> > > > > > requests all possible documents from Lucene that is
> >> > redundant
> >> > >> >> and
> >> > >> >> > not
> >> > >> >> > > > > good
> >> > >> >> > > > > > for performance.
> >> > >> >> > > > > >
> >> > >> >> > > > > > I'm implementing *limit* parameter to be part of
> >> *TextQuery
> >> > >> *and
> >> > >> >> > have
> >> > >> >> > > > to
> >> > >> >> > > > > > notice that we still have to add sorting for text
> queries
> >> > >> >> > processing
> >> > >> >> > > in
> >> > >> >> > > > > > order to have applicable results.
> >> > >> >> > > > > >
> >> > >> >> > > > > > *Limit* parameter itself should improve the part of
> >> issues
> >> > >> from
> >> > >> >> > > above,
> >> > >> >> > > > > but
> >> > >> >> > > > > > definitely, sorting by document score at least should
> be
> >> > >> >> > implemented
> >> > >> >> > > > > along
> >> > >> >> > > > > > with limit.
> >> > >> >> > > > > >
> >> > >> >> > > > > > This is a pretty short commentary if you still have
> any
> >> > >> >> questions,
> >> > >> >> > > > please
> >> > >> >> > > > > > ask, do not hesitate)
> >> > >> >> > > > > >
> >> > >> >> > > > > > BR,
> >> > >> >> > > > > > Yuriy Shuliha
> >> > >> >> > > > > >
> >> > >> >> > > > > > On Thu, Sep 19, 2019 at 11:38 Павлухин Иван <vololo100@gmail.com> wrote:
> >> > >> >> > > > > >
> >> > >> >> > > > > > > Yuriy,
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > Greatly appreciate your interest.
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > Could you please elaborate a little on sorting? What tasks does it
> >> > >> >> > > > > > > help to solve, and how? It would be great if you could provide an
> >> > >> >> > > > > > > example.
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > On Wed, Sep 18, 2019 at 09:39 Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Denis,
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > I like the idea of throwing an exception for enabled text queries on
> >> > >> >> > > > > > > > persistent caches.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Also, I'm fine with the proposed limit for unsorted searches.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > Yury, please proceed with ticket creation.
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > On Tue, Sep 17, 2019, 22:06 Denis Magda <dmagda@apache.org> wrote:
> >> > >> >> > > > > > > >
> >> > >> >> > > > > > > > > Igniters,
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > I see nothing wrong with Yury's proposal regarding the full-text
> >> > >> >> > > > > > > > > search API evolution, as long as Yury is ready to push it forward.
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > As for the in-memory-only mode, it makes total sense for in-memory
> >> > >> >> > > > > > > > > data grid deployments where Ignite caches data of an underlying DB
> >> > >> >> > > > > > > > > like Postgres.
> >> > >> >> > > > > > > > > As part of the changes, I would simply throw an exception (by
> >> > >> >> > > > > > > > > default) if someone attempts to use text indices with native
> >> > >> >> > > > > > > > > persistence enabled. If the person is ready to live with that
> >> > >> >> > > > > > > > > limitation, an explicit configuration change should be required to
> >> > >> >> > > > > > > > > get around the exception.
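Denis's suggestion (reject text indexes on persistent caches by default and allow them only after an explicit configuration change) could be sketched as a validation step run when caches are configured. This is a stdlib-only illustration: `CacheSettings`, `TextIndexGuard`, and the `allowTextIndexesWithPersistence` flag are hypothetical names, not existing Ignite classes or configuration properties.

```java
import java.util.List;

/** Hypothetical, simplified view of the cache settings relevant to the check. */
record CacheSettings(String name, boolean persistenceEnabled, boolean hasTextIndexes) {}

/** Sketch of the proposed default: fail fast unless the user explicitly opts in. */
final class TextIndexGuard {
    private TextIndexGuard() {}

    /**
     * Throws if any persistent cache declares text (Lucene) indexes and the
     * hypothetical opt-in flag is not set.
     */
    static void validate(List<CacheSettings> caches, boolean allowTextIndexesWithPersistence) {
        if (allowTextIndexesWithPersistence) {
            return; // the user explicitly accepted non-persistent text indexes
        }
        for (CacheSettings cache : caches) {
            if (cache.persistenceEnabled() && cache.hasTextIndexes()) {
                throw new IllegalStateException("Text indexes are not persisted and are rejected on "
                    + "persistent cache '" + cache.name() + "' unless explicitly allowed.");
            }
        }
    }

    public static void main(String[] args) {
        List<CacheSettings> caches = List.of(
            new CacheSettings("products", false, true), // in-memory cache: always accepted
            new CacheSettings("orders", true, true));   // persistent cache with text indexes
        try {
            validate(caches, false);
        } catch (IllegalStateException e) {
            System.out.println("Rejected by default: " + e.getMessage());
        }
        validate(caches, true);
        System.out.println("Accepted after explicit opt-in");
    }
}
```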
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > Thoughts?
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > -
> >> > >> >> > > > > > > > > Denis
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > On Tue, Sep 17, 2019 at 7:44 AM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > > > > > Hello to all again,
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Thank you for the important comments and notes below!
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Let me answer and continue the discussion.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (I) Overall needs in Lucene indexing
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Alexei referred to https://issues.apache.org/jira/browse/IGNITE-5371,
> >> > >> >> > > > > > > > > > where the absence of index persistence was declared an obstacle to
> >> > >> >> > > > > > > > > > further development.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) This ticket is already closed as not valid.
> >> > >> >> > > > > > > > > > b) There are definite needs (in our project as well) for purely
> >> > >> >> > > > > > > > > > in-memory indexing of selected data. We intend to use the search
> >> > >> >> > > > > > > > > > capabilities to fetch a limited number of records for type-ahead
> >> > >> >> > > > > > > > > > search / suggestions. Not all of the data will be indexed, and there
> >> > >> >> > > > > > > > > > is no need for the Lucene index to be persistent. Hopefully this is a
> >> > >> >> > > > > > > > > > widespread pattern of text-search usage.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (II) Necessary fixes in the current implementation
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) Implementation of a correct *limit* (an *offset* does not seem to
> >> > >> >> > > > > > > > > > be required in text-search tasks for now).
> >> > >> >> > > > > > > > > > I have investigated the data flow for distributed text queries. It
> >> > >> >> > > > > > > > > > was a simple test prefix query, like 'name'='ene*'.
> >> > >> >> > > > > > > > > > For now, each server node returns all response records to the client
> >> > >> >> > > > > > > > > > node, which may be thousands or hundreds of thousands of records, even
> >> > >> >> > > > > > > > > > if we need only the first 10-100. Again, all the results are added to
> >> > >> >> > > > > > > > > > the queue in GridCacheQueryFutureAdapter page by page, in arbitrary
> >> > >> >> > > > > > > > > > order.
> >> > >> >> > > > > > > > > > I did not find any means here to deliver a deterministic result.
> >> > >> >> > > > > > > > > > So implementing limit as part of the query (and of
> >> > >> >> > > > > > > > > > GridCacheQueryRequest) will not change the nature of the response, but
> >> > >> >> > > > > > > > > > will limit the load on nodes and the network.
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > Can we consider opening a ticket for this?
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > (III) Further exposure of the Lucene API to Ignite
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > a) Sorting
> >> > >> >> > > > > > > > > > The solution for this could be:
> >> > >> >> > > > > > > > > > - Make entities comparable
> >> > >> >> > > > > > > > > > - Add a custom comparator to the entity
> >> > >> >> > > > > > > > > > - Add annotations to mark sorted fields for Lucene indexing
> >> > >> >> > > > > > > > > > - Use comparators when merging responses or reducing to the desired
> >> > >> >> > > > > > > > > >   limit on the client node.
> >> > >> >> > > > > > > > > > This will require the full result set to be loaded into memory,
> >> > >> >> > > > > > > > > > though it can be used for relatively small limits.
> >> > >> >> > > > > > > > > >
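The comparator-based reduce in the last bullet could look roughly like the sketch below: all pages received from the nodes are collected, sorted with a comparator built from the chosen sort fields, and truncated to the limit. `Product`, `ComparatorReduce`, and `reduce(...)` are hypothetical names; the sketch makes the stated cost explicit, since the whole merged result set is held in memory on the client node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical entity returned by a text query. */
record Product(String name, int price) {}

final class ComparatorReduce {
    private ComparatorReduce() {}

    /**
     * Merges pages received from several nodes, sorts them with the given
     * comparator and keeps at most {@code limit} entries. All pages are held
     * in memory at once, so this is only reasonable for small result sets.
     */
    static <T> List<T> reduce(List<List<T>> nodePages, Comparator<? super T> order, int limit) {
        List<T> all = new ArrayList<>();
        for (List<T> page : nodePages) {
            all.addAll(page);
        }
        all.sort(order);
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    public static void main(String[] args) {
        // Sort by price ascending, then by name so that ties are deterministic.
        Comparator<Product> byPriceThenName =
            Comparator.comparingInt(Product::price).thenComparing(Product::name);

        List<List<Product>> pages = List.of(
            List.of(new Product("pen", 3), new Product("lamp", 25)),
            List.of(new Product("mug", 8), new Product("cup", 3)));

        System.out.println(reduce(pages, byPriceThenName, 3));
    }
}
```

A tie-breaking comparator matters here: without it, entries with equal sort keys could come back in a different order depending on which node's page arrived first.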
> >> > >> >> > > > > > > > > > BR,
> >> > >> >> > > > > > > > > > Yuriy Shuliha
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > On Fri, Aug 30, 2019 at 10:37 Alexei Scherbakov <alexey.scherbakoff@gmail.com> wrote:
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > > > > Yuriy,
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Note that one of the major blockers for text queries is [1], which
> >> > >> >> > > > > > > > > > > makes Lucene indexes unusable with persistence and is the main reason
> >> > >> >> > > > > > > > > > > for the discontinuation.
> >> > >> >> > > > > > > > > > > It should probably be addressed first to make text queries a valid
> >> > >> >> > > > > > > > > > > product feature.
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Distributed sorting and advanced querying is indeed not a trivial
> >> > >> >> > > > > > > > > > > task. Some kind of merging must be implemented on the
> >> > >> >> > > > > > > > > > > query-originating node.
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-5371
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > On Thu, Aug 29, 2019 at 23:38 Denis Magda <dmagda@apache.org> wrote:
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > Yuriy,
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > If you are ready to take over the full-text search indexes, then
> >> > >> >> > > > > > > > > > > > please go ahead. The primary reasons why the community wants to
> >> > >> >> > > > > > > > > > > > discontinue them first (and probably resurrect them later) are the
> >> > >> >> > > > > > > > > > > > limitations listed by Andrey and minimal support from the community's
> >> > >> >> > > > > > > > > > > > end.
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > -
> >> > >> >> > > > > > > > > > > > Denis
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > On Thu, Aug 29, 2019 at 1:29 PM Andrey Mashenkov <andrey.mashenkov@gmail.com> wrote:
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Hi Yuriy,
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Unfortunately, there is a plan to discontinue TextQueries in Ignite
> >> > >> >> > > > > > > > > > > > > [1]. The motivation is that text indexes are not persistent, not
> >> > >> >> > > > > > > > > > > > > transactional, and can't be used together with SQL or inside SQL,
> >> > >> >> > > > > > > > > > > > > and there is a lack of interest from the community side.
> >> > >> >> > > > > > > > > > > > > You are welcome to take on these issues and make TextQueries great.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 1. PageSize can't be used to limit the result set.
> >> > >> >> > > > > > > > > > > > > Query results are returned from a data node to the client-side
> >> > >> >> > > > > > > > > > > > > cursor page by page, and this parameter is designed to control the
> >> > >> >> > > > > > > > > > > > > page size. The query is supposed to execute lazily on the server
> >> > >> >> > > > > > > > > > > > > side, and the full result set is not expected to be loaded into
> >> > >> >> > > > > > > > > > > > > server-side memory at once, but page by page.
> >> > >> >> > > > > > > > > > > > > Do you mean you found that Lucene loads the entire result set into
> >> > >> >> > > > > > > > > > > > > memory before the first page is sent to the client?
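The difference between page size and a result limit can be illustrated with a small client-side cursor: `pageSize` only controls how many rows are requested per round trip, while a separate `limit` caps the total. `LimitedPagingCursor` is hypothetical, and the in-memory `source` list merely stands in for a lazily executed server-side query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical client-side cursor that fetches results page by page.
 * pageSize controls how many rows are fetched per request; limit caps the
 * total number of rows the cursor will ever return.
 */
final class LimitedPagingCursor<T> implements Iterator<T> {
    private final List<T> source;   // stands in for a lazily executed server-side query
    private final int pageSize;
    private final int limit;
    private final List<T> page = new ArrayList<>();
    private int pagePos;            // position inside the current page
    private int fetched;            // rows fetched from the "server" so far
    private int returned;           // rows handed to the caller so far
    int pagesRequested;             // exposed for the demo

    LimitedPagingCursor(List<T> source, int pageSize, int limit) {
        if (pageSize <= 0 || limit < 0) throw new IllegalArgumentException("bad pageSize/limit");
        this.source = source;
        this.pageSize = pageSize;
        this.limit = limit;
    }

    @Override public boolean hasNext() {
        if (returned >= limit) return false;
        if (pagePos < page.size()) return true;
        // Request the next page, never asking for more rows than the limit still allows.
        int want = Math.min(pageSize, limit - returned);
        int end = Math.min(source.size(), fetched + want);
        page.clear();
        page.addAll(source.subList(fetched, end));
        pagePos = 0;
        if (page.isEmpty()) return false;
        fetched = end;
        pagesRequested++;
        return true;
    }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        returned++;
        return page.get(pagePos++);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) rows.add(i);
        LimitedPagingCursor<Integer> cursor = new LimitedPagingCursor<>(rows, 10, 25);
        int count = 0;
        while (cursor.hasNext()) { cursor.next(); count++; }
        System.out.println(count + " rows in " + cursor.pagesRequested + " pages");
    }
}
```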
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > I'd think a new parameter should be added to limit the result. The
> >> > >> >> > > > > > > > > > > > > best solution is to use query language commands for this, e.g.
> >> > >> >> > > > > > > > > > > > > "LIMIT/OFFSET" in SQL.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > This task doesn't look trivial. A query is a distributed operation:
> >> > >> >> > > > > > > > > > > > > the same user query will be executed on the data nodes, and then the
> >> > >> >> > > > > > > > > > > > > results from all nodes should be correctly merged before being
> >> > >> >> > > > > > > > > > > > > returned via the client cursor.
> >> > >> >> > > > > > > > > > > > > So, LIMIT should be applied on every node and then again in the
> >> > >> >> > > > > > > > > > > > > merge phase.
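Applying LIMIT both on every node and in the merge phase could be sketched as follows, assuming each hit travels with its Lucene score: every node sorts its hits by descending score and keeps its own top `limit`, and the reducer performs a k-way merge with a priority queue that stops after `limit` hits, so the result no longer depends on the order in which node responses arrive. `ScoredHit` and `TwoPhaseLimit` are hypothetical names. Lucene scores are computed from per-index statistics, so scores from different nodes are only approximately comparable; the sketch simply assumes that is acceptable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit carried in a node's response: entry key plus its relevance score. */
record ScoredHit(String key, float score) {}

final class TwoPhaseLimit {
    private TwoPhaseLimit() {}

    /** Descending score; ties broken by key so the merged order is deterministic. */
    static final Comparator<ScoredHit> BY_SCORE_DESC =
        Comparator.comparingDouble(ScoredHit::score).reversed().thenComparing(ScoredHit::key);

    /** Phase 1 (on each node): sort the local hits and keep the top `limit`. */
    static List<ScoredHit> nodeTopK(List<ScoredHit> localHits, int limit) {
        List<ScoredHit> sorted = new ArrayList<>(localHits);
        sorted.sort(BY_SCORE_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    /** Phase 2 (on the reducer): k-way merge of per-node sorted lists, stopping after `limit`. */
    static List<ScoredHit> merge(List<List<ScoredHit>> nodeResults, int limit) {
        // Queue entries are {nodeIndex, positionInNodeList}, ordered by the hit they point to.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> BY_SCORE_DESC.compare(
            nodeResults.get(a[0]).get(a[1]), nodeResults.get(b[0]).get(b[1])));
        for (int n = 0; n < nodeResults.size(); n++) {
            if (!nodeResults.get(n).isEmpty()) heads.add(new int[] {n, 0});
        }
        List<ScoredHit> out = new ArrayList<>();
        while (!heads.isEmpty() && out.size() < limit) {
            int[] head = heads.poll();
            List<ScoredHit> list = nodeResults.get(head[0]);
            out.add(list.get(head[1]));
            if (head[1] + 1 < list.size()) heads.add(new int[] {head[0], head[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoredHit> node1 = List.of(new ScoredHit("a", 0.9f), new ScoredHit("b", 0.4f), new ScoredHit("c", 0.3f));
        List<ScoredHit> node2 = List.of(new ScoredHit("d", 0.8f), new ScoredHit("e", 0.7f), new ScoredHit("f", 0.1f));
        int limit = 3;
        System.out.println(merge(List.of(nodeTopK(node1, limit), nodeTopK(node2, limit)), limit));
    }
}
```

Because every node returns at least its own top `limit` hits, the merged result equals the global top `limit` over the union of all node results.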
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > Also, and this may be non-obvious, limiting results makes no sense
> >> > >> >> > > > > > > > > > > > > without sorting, as there is no guarantee that each subsequent query
> >> > >> >> > > > > > > > > > > > > run will return the same data, because of page reordering.
> >> > >> >> > > > > > > > > > > > > Basically, the merge phase receives results from data nodes
> >> > >> >> > > > > > > > > > > > > asynchronously, and messages from different nodes can't be ordered.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 2.
> >> > >> >> > > > > > > > > > > > > a. The "tokenize" param name (for @QueryTextField) looks more
> >> > >> >> > > > > > > > > > > > > descriptive, doesn't it?
> >> > >> >> > > > > > > > > > > > > b, c. What about distributed queries? How will partial results from
> >> > >> >> > > > > > > > > > > > > nodes be merged?
> >> > >> >> > > > > > > > > > > > > Does Lucene allow configuring a comparator for data sorting?
> >> > >> >> > > > > > > > > > > > > Which comparator should Ignite choose to sort the result in the
> >> > >> >> > > > > > > > > > > > > merge phase?
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > 3. For now, the Lucene engine is not configurable at all, e.g. it is
> >> > >> >> > > > > > > > > > > > > impossible to configure the Tokenizer.
> >> > >> >> > > > > > > > > > > > > I'd think about possible ways to configure the engine first, and
> >> > >> >> > > > > > > > > > > > > only then go further to discuss/implement complex features that may
> >> > >> >> > > > > > > > > > > > > depend on the engine config.
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > On Thu, Aug 29, 2019 at 8:17 PM Yuriy Shuliga <shuliga@gmail.com> wrote:
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Dear community,
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > By starting this thread, I'd like to open a discussion that would
> >> > >> >> > > > > > > > > > > > > > lead to contributions in the subject area.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Ignite has indexing capabilities backed by different mechanisms,
> >> > >> >> > > > > > > > > > > > > > including Lucene.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Currently, Lucene 7.5.0 is used (a release from last year).
> >> > >> >> > > > > > > > > > > > > > This is a widespread and mature technology that covers the text
> >> > >> >> > > > > > > > > > > > > > search area and beyond (e.g. spatial data indexing).
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > My goal is to *expose more Lucene functionality to Ignite indexing
> >> > >> >> > > > > > > > > > > > > > and query mechanisms for text data*.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > It's quite a simple request at the current stage. It comes from our
> >> > >> >> > > > > > > > > > > > > > project's needs, but I believe it will be useful for a lot more
> >> > >> >> > > > > > > > > > > > > > people.
> >> > >> >> > > > > > > > > > > > > > Let's walk through the items and vote on or discuss Jira tickets for
> >> > >> >> > > > > > > > > > > > > > them.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 1. [trivial] Use dataQuery.getPageSize() to limit the search response
> >> > >> >> > > > > > > > > > > > > > items inside GridLuceneIndex.query(). Currently it calls
> >> > >> >> > > > > > > > > > > > > > IndexSearcher.search(query, *Integer.MAX_VALUE*), so basically all
> >> > >> >> > > > > > > > > > > > > > scored matches will be returned, which we do not need in most cases.
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 2. [simple] Add sorting. Then a more capable search call can be
> >> > >> >> > > > > > > > > > > > > > executed: *IndexSearcher.search(query, count, sort)*
> >> > >> >> > > > > > > > > > > > > > Implementation steps:
> >> > >> >> > > > > > > > > > > > > > a) Introduce a boolean *sortField* parameter in the *@QueryTextField*
> >> > >> >> > > > > > > > > > > > > > annotation. If *true*, the field will be indexed but not tokenized.
> >> > >> >> > > > > > > > > > > > > > Number types are preferred here.
> >> > >> >> > > > > > > > > > > > > > b) Add a *sort* collection to the *TextQuery* constructor. It should
> >> > >> >> > > > > > > > > > > > > > define the desired sort fields used for querying.
> >> > >> >> > > > > > > > > > > > > > c) Implement Lucene sort usage in GridLuceneIndex.query().
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > 3. [moderate] Build complex queries with *TextQuery*, including
> >> > >> >> > > > > > > > > > > > > > terms/queries boosting.
> >> > >> >> > > > > > > > > > > > > > *This section is for voting only, as it requires more detailed work.
> >> > >> >> > > > > > > > > > > > > > It should be extended if the community is interested in it.*
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > Looking forward to your comments!
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > > BR,
> >> > >> >> > > > > > > > > > > > > > Yuriy Shuliha
> >> > >> >> > > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > > > --
> >> > >> >> > > > > > > > > > > > > Best regards,
> >> > >> >> > > > > > > > > > > > > Andrey V. Mashenkov
> >> > >> >> > > > > > > > > > > > >
> >> > >> >> > > > > > > > > > > >
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > --
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > > > Best regards,
> >> > >> >> > > > > > > > > > > Alexei Scherbakov
> >> > >> >> > > > > > > > > > >
> >> > >> >> > > > > > > > > >
> >> > >> >> > > > > > > > >
> >> > >> >> > > > > > >
> >> > >> >> > > > > > >
> >> > >> >> > > > > > >
> >> > >> >> > > > > > > --
> >> > >> >> > > > > > > Best regards,
> >> > >> >> > > > > > > Ivan Pavlukhin
> >> > >> >> > > > > > >
> >> > >> >> > > > >
> >> > >> >> > > > >
> >> > >> >> > > > >
> >> > >> >> > > > > --
> >> > >> >> > > > > Best regards,
> >> > >> >> > > > > Ivan Pavlukhin
> >> > >> >> > > > >
> >> > >> >> > > >
> >> > >> >> > >
> >> > >> >> >
> >> > >> >> >
> >> > >> >> > --
> >> > >> >> > Best regards,
> >> > >> >> > Andrey V. Mashenkov
> >> > >> >> >
> >> > >> >>
> >> > >> >
> >> > >> >
> >> > >> > --
> >> > >> > Best regards,
> >> > >> > Andrey V. Mashenkov
> >> > >> >
> >> > >>
> >> > >
> >> >
> >> > --
> >> > Best regards,
> >> > Andrey V. Mashenkov
> >> >
> >>
>
>
>
>