Posted to dev@netbeans.apache.org by Laszlo Kishalmi <la...@gmail.com> on 2023/05/14 00:14:08 UTC

Indexing, Lucene?

Dear all,

Does anyone have good knowledge of our Indexing API?

I would like to add some search functionality to my HCL/Terraform support.

I do not know much about Lucene, but what I know does not really match 
what I see there. Then I checked, and we are using Lucene 3.6.2, which 
is more than 10 years old by now.

I've also checked what it would mean to upgrade to a recent version. 
It looks like a hard job: even moving to Lucene 4.x would be hard, at 
least with my level of knowledge. (Lucene changed completely between 
3.x and 4.x.)

I've seen that the new Maven Indexer comes with Lucene 9.x, so we have 
recent library support.

My first question: shall we start to do something about the old Lucene?

Shall I invest in using our current indexing API?

What I'm leaning towards at the moment is to move the Lucene 9.x library 
out of Maven Indexer into a separate library project; then I'd implement 
my things using Lucene 9 directly.

What do you think about that?




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@netbeans.apache.org
For additional commands, e-mail: dev-help@netbeans.apache.org

For further information about the NetBeans mailing lists, visit:
https://cwiki.apache.org/confluence/display/NETBEANS/Mailing+lists




Re: Indexing, Lucene?

Posted by Matthias Bläsing <mb...@doppel-helix.eu.INVALID>.
Hi,

first, the basic idea that underlies all Lucene data is still the
same: You create a "document", add key-value pairs to it and let the
indexer do its magic. When it is done, you can query the index.
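
To make the document / key-value / query idea concrete, here is a
minimal sketch in plain Java. It is a toy in-memory model and not
Lucene's API: ToyIndex, Doc, addDocument and query are names made up
for illustration, and so are the Terraform-flavoured sample values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy in-memory index; illustrates the idea only, this is not Lucene's API. */
final class ToyIndex {

    /** A "document": a named bag of key-value pairs (keys may repeat). */
    static final class Doc {
        final String source;
        final List<Map.Entry<String, String>> pairs = new ArrayList<>();

        Doc(String source) {
            this.source = source;
        }

        Doc add(String key, String value) {
            pairs.add(Map.entry(key, value));
            return this;
        }
    }

    // key -> value -> sources of the documents that contain this pair
    private final Map<String, Map<String, Set<String>>> postings = new LinkedHashMap<>();

    /** "Indexing": record, for every pair, which document it came from. */
    void addDocument(Doc doc) {
        for (Map.Entry<String, String> pair : doc.pairs) {
            postings.computeIfAbsent(pair.getKey(), k -> new LinkedHashMap<>())
                    .computeIfAbsent(pair.getValue(), v -> new LinkedHashSet<>())
                    .add(doc.source);
        }
    }

    /** "Querying": exact match on one key-value pair. */
    Set<String> query(String key, String value) {
        return postings.getOrDefault(key, Map.of()).getOrDefault(value, Set.of());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(new Doc("main.tf")
                .add("resource", "aws_instance").add("variable", "region"));
        index.addDocument(new Doc("network.tf")
                .add("resource", "aws_vpc").add("variable", "region"));
        System.out.println(index.query("variable", "region"));  // [main.tf, network.tf]
        System.out.println(index.query("resource", "aws_vpc")); // [network.tf]
    }
}
```

A real index adds analysis, scoring and on-disk storage on top of
this, but the shape of the interaction stays the same.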

Yes, we need to move to a newer version of Lucene in the core indexer,
and it is a problem that the module exposes Lucene internals and that
some queries will be hard to adapt, but it is IMHO doable.

For normal queries and usage, the Parsing and Indexing API IMHO works. I
think the CSS indexer might be an approachable example, although it
carries additional complexity because it handles embedding into HTML.

- The indexer is in css.editor and implemented in
  org.netbeans.modules.css.indexing.CssIndexer. It is created by an
  embedded factory, which is registered in the layer file
  in Editors/text/css.
- CssIndexer#index is called by the Parsing and Indexing infrastructure
  for each recognized document once the parsing result is available.
- The #index method extracts the IDs, classes, HTML elements and colors
  from the CSS parser and stores them in the document.
- The document is then stored using the Indexing Support.
If you now want to know where inside the project a certain CSS class is
used, you use the convenience function findClasses in
org.netbeans.modules.css.indexing.api.CssIndex. This method builds the
query and dispatches it to the QuerySupport. It returns the
FileObjects that hold the corresponding classes.
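
The flow above (index each file, then ask which files use a class) can
be sketched in plain Java. This is a hypothetical stand-in and not the
NetBeans API: CssClassIndexSketch and its methods are invented here, a
crude regex replaces the real CSS parser, plain path strings replace
FileObjects, and the "re-indexing replaces the old entry" behaviour is
an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the indexer/query pair; not the NetBeans API. */
final class CssClassIndexSketch {

    // Crude substitute for the CSS parser: a dot followed by an identifier.
    // It will also match things like file extensions, which a parser would not.
    private static final Pattern CLASS_SELECTOR =
            Pattern.compile("\\.(-?[_a-zA-Z][-_a-zA-Z0-9]*)");

    // One "document" per file: file path -> class names found in that file.
    private final Map<String, Set<String>> classesByFile = new LinkedHashMap<>();

    /** Called once per file; indexing the same path again replaces its entry. */
    void index(String path, String cssText) {
        Set<String> classes = new TreeSet<>();
        Matcher matcher = CLASS_SELECTOR.matcher(cssText);
        while (matcher.find()) {
            classes.add(matcher.group(1));
        }
        classesByFile.put(path, classes);
    }

    /** Returns the paths of all indexed files that mention the given class. */
    Set<String> findClasses(String className) {
        Set<String> hits = new TreeSet<>();
        classesByFile.forEach((path, classes) -> {
            if (classes.contains(className)) {
                hits.add(path);
            }
        });
        return hits;
    }
}
```

With a.css containing ".btn { color: red; }" and b.css containing
".nav .btn { margin: 0; }", findClasses("btn") returns both paths and
findClasses("nav") returns only b.css.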

There are dark parts in the API, but the basic usage is ok.

HTH

Matthias








Re: Indexing, Lucene?

Posted by Laszlo Kishalmi <la...@gmail.com>.
On 5/14/23 02:52, Michael Bien wrote:
> Hi Laszlo,
>
> On 14.05.23 02:14, Laszlo Kishalmi wrote:
>> Dear all,
>>
>> Does anyone have good knowledge of our Indexing API?
>>
>> I would like to add some search functionality to my HCL/Terraform 
>> support.
>>
>> I do not know much about Lucene, but what I know does not really 
>> match what I see there. Then I checked, and we are using Lucene 
>> 3.6.2, which is more than 10 years old by now.
>>
>> I've also checked what it would mean to upgrade to a recent version. 
>> It looks like a hard job: even moving to Lucene 4.x would be hard, at 
>> least with my level of knowledge. (Lucene changed completely between 
>> 3.x and 4.x.)
>>
>> I've seen that the new Maven Indexer comes with Lucene 9.x, so we 
>> have recent library support.
>>
>> My first question: shall we start to do something about the old Lucene?
>
> yes, we have to. I started a while ago in a local branch but got stuck 
> in some areas which were not covered by the old migration guides. 3->4 
> had some breaking changes. Another problem was that a lot of code is 
> using the API, as you probably saw.
>
> Then I decided to do the easier part first, which was the 
> maven-indexer module. This is mostly complete, and master uses the 
> latest Lucene 9.6 branch (although some refactoring is still pending 
> to get rid of more deprecated code, plus some maven-indexer-specific 
> features I am working on).
>
> I am no Lucene expert either; I mostly learn on the go. I did migrate 
> Apache Roller from an old version (can't remember which) to 9.x a 
> while ago, so it certainly is doable; NB just has much more code.
>
>>
>> Shall I invest in using our current indexing API?
>>
>> What I'm leaning towards at the moment is to move the Lucene 9.x 
>> library out of Maven Indexer into a separate library project; then 
>> I'd implement my things using Lucene 9 directly.
>
> Using Lucene directly would make it difficult to switch to the 
> NetBeans API again some time in the future, no? We could check whether 
> it is possible to fork the NB API+impl, make it compatible with 
> Lucene 9 and deploy both versions at the same time.

Seems to be a viable solution. Still, it would be heavy lifting.

As far as I've checked, Lucene keeps its low-level API mixed with the 
"high-level" API and tries not to change the latter too much, but the 
low-level one sees lots of changes. Unfortunately, the Parser API's 
Lucene Support leaks that low-level stuff as well; moreover, 3.x -> 4.x 
has drastic changes on the high level too.

Also, judging by the migration guides from 4.0 to 9.0, the biggest jump 
seems to be the one from 3.x to 4.x. It also looks like the Parsing API 
is used widely outside of the java cluster with almost no Lucene 
dependency. The Java Source modules, however, reach deeper. Does anyone 
know why?


>
> This would allow moving one language at a time to the new API.
>
> But this is likely a larger project than I am willing to start right 
> now. I still hope there is a way to migrate the whole thing, but in 
> smaller steps: 3->4->5...->9.
>
> -mbien
>
>>
>> What do you think about that?





Re: Indexing, Lucene?

Posted by Michael Bien <mb...@gmail.com>.
Hi Laszlo,

On 14.05.23 02:14, Laszlo Kishalmi wrote:
> Dear all,
>
> Does anyone have good knowledge of our Indexing API?
>
> I would like to add some search functionality to my HCL/Terraform 
> support.
>
> I do not know much about Lucene, but what I know does not really match 
> what I see there. Then I checked, and we are using Lucene 3.6.2, which 
> is more than 10 years old by now.
>
> I've also checked what it would mean to upgrade to a recent version. 
> It looks like a hard job: even moving to Lucene 4.x would be hard, at 
> least with my level of knowledge. (Lucene changed completely between 
> 3.x and 4.x.)
>
> I've seen that the new Maven Indexer comes with Lucene 9.x, so we have 
> recent library support.
>
> My first question: shall we start to do something about the old Lucene?

yes, we have to. I started a while ago in a local branch but got stuck in 
some areas which were not covered by the old migration guides. 3->4 had 
some breaking changes. Another problem was that a lot of code is using 
the API, as you probably saw.

Then I decided to do the easier part first, which was the maven-indexer 
module. This is mostly complete, and master uses the latest Lucene 9.6 
branch (although some refactoring is still pending to get rid of more 
deprecated code, plus some maven-indexer-specific features I am 
working on).

I am no Lucene expert either; I mostly learn on the go. I did migrate 
Apache Roller from an old version (can't remember which) to 9.x a while 
ago, so it certainly is doable; NB just has much more code.

>
> Shall I invest in using our current indexing API?
>
> What I'm leaning towards at the moment is to move the Lucene 9.x 
> library out of Maven Indexer into a separate library project; then 
> I'd implement my things using Lucene 9 directly.

Using Lucene directly would make it difficult to switch to the NetBeans 
API again some time in the future, no? We could check whether it is 
possible to fork the NB API+impl, make it compatible with Lucene 9 and 
deploy both versions at the same time.

This would allow moving one language at a time to the new API.
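
As a rough illustration of running both versions side by side, here is
a minimal plain-Java sketch. Everything in it is hypothetical:
IndexBackend and BackendSelector do not exist in NetBeans, and the MIME
types are placeholders. It only shows the routing idea, where each
language stays on the old backend until it is explicitly flipped over.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical facade over one indexing implementation; not a NetBeans type. */
interface IndexBackend {
    String describe();
}

/** Routes each language (by MIME type) to the old or the new backend. */
final class BackendSelector {
    private final IndexBackend legacy;
    private final IndexBackend modern;
    private final Set<String> migratedMimeTypes = new HashSet<>();

    BackendSelector(IndexBackend legacy, IndexBackend modern) {
        this.legacy = legacy;
        this.modern = modern;
    }

    /** Flip one language over once its indexer has been ported. */
    void markMigrated(String mimeType) {
        migratedMimeTypes.add(mimeType);
    }

    /** Languages that were not ported yet keep using the legacy backend. */
    IndexBackend backendFor(String mimeType) {
        return migratedMimeTypes.contains(mimeType) ? modern : legacy;
    }
}
```

For example, after new BackendSelector(() -> "Lucene 3", () -> "Lucene 9")
and markMigrated("text/x-hcl"), backendFor("text/x-hcl") returns the
modern backend while backendFor("text/css") still returns the legacy one.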

But this is likely a larger project than I am willing to start right 
now. I still hope there is a way to migrate the whole thing, but in 
smaller steps: 3->4->5...->9.

-mbien

>
> What do you think about that?
