Posted to dev@lucenenet.apache.org by Itamar Syn-Hershko <it...@code972.com> on 2016/11/08 23:44:40 UTC

Lucene.NET 4.8 demo

Hey folks,

I just pushed a working demo for Lucene.NET 4.8 using the latest bits to
index and search public repositories on github. Check it out:
https://github.com/synhershko/LuceneNetDemo

I also recorded a Channel 9 video walking through the demo - I will post it
here again as soon as it's released on the nets.

This should clarify some mysteries around the new-ish API and hopefully
drive confidence in what we consider a stable beta release.

Cheers,

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

Re: Lucene.NET 4.8 demo

Posted by Michael O'Shea <mi...@gmail.com>.
Brilliant! Thanks for all the hard work.

M

On Wed, 9 Nov 2016, 00:44 Itamar Syn-Hershko, <it...@code972.com> wrote:

> Hey folks,
>
> I just pushed a working demo for Lucene.NET 4.8 using the latest bits to
> index and search public repositories on github. Check it out:
> https://github.com/synhershko/LuceneNetDemo
>
> I also recorded a Channel 9 video walking through the demo - I will post it
> here again as soon as it's released on the nets.
>
> This should clarify some mysteries around the new-ish API and hopefully
> drive confidence in what we consider a stable beta release.
>
> Cheers,
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>

Re: Lucene.NET 4.8 demo

Posted by Itamar Syn-Hershko <it...@code972.com>.
Great feedback Shad, thanks

Yes, let's add AnonymousAnalyzer to core - or see if we can just use the
Analyzer class for this. This indeed looks better - the demo was just a
quick'n'dirty something I wrote, but it was intended especially for finding
pain-points like you just did. Another pain-point is the LuceneVersion
argument that we currently have as a requirement - and I'm mid-work on
removing it and setting its value via a default. Anything else that you
can think of would probably make sense to add too :)

More comments inline.
--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Wed, Nov 9, 2016 at 1:27 AM, Shad Storhaug <sh...@shadstorhaug.com> wrote:

>
> Another thing I noticed is that we should probably move the
> TokenStreamComponents class so it is not a nested class of Analyzer to
> match the syntax more closely to Lucene.
>
>
Not sure, I think the current syntax makes things quite concise. Maybe a
helper/shortcut method would be helpful instead.


> A few thoughts on the demo:
>
> 1. Not everyone is familiar with a GitHub organization. Perhaps the demo
> should provide a list to choose from? Currently, if you type something that
> doesn't exist you get an exception. I had to do a Google search to come up
> with something, since my own username didn't work. One of the top results
> (before an actual list of organizations) was an API that can be utilized to
> read all of the GitHub organizations:
> https://developer.github.com/v3/orgs/


microsoft, facebook, github, apache - there are plenty. I will fix the
exceptions, and some WriteLine and docs in the Readme will fix the rest.


>
> 2. Maybe there should be some kind of estimate given on how long it will
> take to index the organization. When I ultimately chose "apache" it took
> several minutes to index the results, which I was not expecting.
>

Shouldn't take minutes. The demo is pulling the Readme HTML for each so
that might be slowing things down, but I don't have the time to add a
progress bar :)


> 3. Perhaps the API key should be put into a separate (config) file rather
> than inline in the code. And you could pre-define the name of this file and
> put it into a .gitignore file. This would help prevent anyone from
> accidentally committing their API key to the Git repo.
>

Yup, maybe in the future :)


> 4. The search results seemed a bit underwhelming. Maybe there should be
> some kind of indicators how many results Lucene.Net had to sift through to
> come up with the short list. Or at least there should be some kind of
> explanation what is happening to put things into perspective. Think of a
> crime scene investigation. If the investigators enter the search criteria
> and it comes up with 50,000 suspects it would ruin their day. If it comes
> up with 3, then their work is much easier. But without some kind of
> indicator showing that 3 is better than 50,000, the latter seems much more
> impressive in a demo.
>

The total number of results is displayed - along with the 10 top-rated
results. I could prettify it and add highlighted snippets (which would be a
nice addition to the demo!), and give more context etc - but as I said this
is a quick'n'dirty job. I will probably do this later on to try and find
more pain points / improvements we could do to the API. Contributions
welcome.


> 5. Perhaps there should be some way to reset the index? I entered another
> organization to test my updates to the code and it added that
> organization's results to the original index, which I wasn't expecting.
>

That should be easy to do.


>
>
> Thanks,
> Shad Storhaug (NightOwl888)
>
>
> -----Original Message-----
> From: itamar.synhershko@gmail.com [mailto:itamar.synhershko@gmail.com] On
> Behalf Of Itamar Syn-Hershko
> Sent: Wednesday, November 9, 2016 6:45 AM
> To: dev@lucenenet.apache.org; user@lucenenet.apache.org
> Subject: Lucene.NET 4.8 demo
>
> Hey folks,
>
> I just pushed a working demo for Lucene.NET 4.8 using the latest bits to
> index and search public repositories on github. Check it out:
> https://github.com/synhershko/LuceneNetDemo
>
> I also recorded a Channel 9 video walking through the demo - I will post
> it here again as soon as it's released on the nets.
>
> This should clarify some mysteries around the new-ish API and hopefully
> drive confidence in what we consider a stable beta release.
>
> Cheers,
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>

RE: Lucene.NET 4.8 demo

Posted by Shad Storhaug <sh...@shadstorhaug.com>.
Itamar,

Thanks for putting this together.

The demo made me notice something about the design of Analyzer that I hadn't realized before. The abstract Analyzer class was designed with Java's anonymous class functionality in mind. This makes creating custom Analyzers more concise in Java than it is in .NET.

In .NET we don't have anonymous classes. But we DO have anonymous methods that we could use to simulate this behavior, provided there is a helper class to assist with it. To demonstrate what I mean, I have updated the demo with a (very simple) AnonymousAnalyzer, which completely eliminates the need for the 3 analyzer classes that you made. https://github.com/NightOwl888/LuceneNetDemo/blob/master/LuceneNetDemo/GitHubIndex.cs
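
To make the idea concrete, here is a minimal sketch in Java of the two styles side by side. Everything in it is a stand-in: SketchAnalyzer and DelegateAnalyzer are hypothetical types written for illustration, not Lucene or Lucene.NET classes, and the "components" are reduced to a plain function from text to tokens. The first analyzer is built with an anonymous subclass, the usual Java approach; the second passes a lambda to a helper class, which is roughly the shape an AnonymousAnalyzer could take with a .NET delegate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical stand-in for an abstract analyzer base class (not the Lucene API).
abstract class SketchAnalyzer {
    // Subclasses decide how a field's text is turned into tokens.
    protected abstract Function<String, List<String>> createComponents(String fieldName);

    public final List<String> analyze(String fieldName, String text) {
        return createComponents(fieldName).apply(text);
    }
}

// Hypothetical helper: wraps a delegate so no named subclass is needed.
final class DelegateAnalyzer extends SketchAnalyzer {
    private final Function<String, Function<String, List<String>>> factory;

    DelegateAnalyzer(Function<String, Function<String, List<String>>> factory) {
        this.factory = factory;
    }

    @Override
    protected Function<String, List<String>> createComponents(String fieldName) {
        return factory.apply(fieldName);
    }
}

final class DelegateAnalyzerDemo {
    // Toy tokenizer for the sketch: split on whitespace, then lowercase.
    static List<String> lowercaseWhitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java style: an anonymous subclass declared inline.
        SketchAnalyzer anonymous = new SketchAnalyzer() {
            @Override
            protected Function<String, List<String>> createComponents(String fieldName) {
                return DelegateAnalyzerDemo::lowercaseWhitespaceTokens;
            }
        };

        // Delegate style: the same behavior passed as a lambda to the helper.
        SketchAnalyzer delegated =
            new DelegateAnalyzer(fieldName -> DelegateAnalyzerDemo::lowercaseWhitespaceTokens);

        System.out.println(anonymous.analyze("readme", "Lucene.NET Demo"));
        System.out.println(delegated.analyze("readme", "Lucene.NET Demo"));
    }
}
```

Both calls produce the same tokens; the only difference is whether the caller has to declare a class to get there.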

I am not suggesting we should update the demo like this, but I am suggesting that we should add something like AnonymousAnalyzer (perhaps renamed to CustomAnalyzer, InlineAnalyzer, DelegateAnalyzer, or something else more appropriate) in the box so .NET developers can take advantage of its language features in conjunction with Lucene the same way that Java developers do. In fact, I think there are many things we can add (such as utility classes, utility methods, extension methods, and builders) that would make developing with Lucene almost as seamless in .NET as it is in Java - we just need to put our thinking caps on.

For example, maybe there could be a TokenStreamComponentsBuilder that could be used to put the components together in a fluent way...?
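
As a rough illustration of what "fluent" could mean here, the following Java sketch chains one tokenizer step and any number of filter steps. It is only a sketch under assumptions: TokenPipelineBuilder, startWith, then and build are hypothetical names, tokens are plain strings, and none of this is an existing Lucene or Lucene.NET API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical fluent builder: one tokenizer step followed by any number of filters.
final class TokenPipelineBuilder {
    private final Function<String, List<String>> tokenizer;
    private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

    private TokenPipelineBuilder(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    static TokenPipelineBuilder startWith(Function<String, List<String>> tokenizer) {
        return new TokenPipelineBuilder(tokenizer);
    }

    // Returns this so that calls can be chained.
    TokenPipelineBuilder then(UnaryOperator<List<String>> filter) {
        filters.add(filter);
        return this;
    }

    Function<String, List<String>> build() {
        // Copy the filter list so later then() calls do not change a built pipeline.
        List<UnaryOperator<List<String>>> steps = new ArrayList<>(filters);
        return text -> {
            List<String> tokens = tokenizer.apply(text);
            for (UnaryOperator<List<String>> step : steps) {
                tokens = step.apply(tokens);
            }
            return tokens;
        };
    }
}

final class TokenPipelineDemo {
    static List<String> splitOnWhitespace(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    static List<String> lowercase(List<String> tokens) {
        return tokens.stream()
            .map(t -> t.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());
    }

    // Builds a filter that removes the given stop words.
    static UnaryOperator<List<String>> dropping(Set<String> stopWords) {
        return tokens -> tokens.stream()
            .filter(t -> !stopWords.contains(t))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<String, List<String>> pipeline = TokenPipelineBuilder
            .startWith(TokenPipelineDemo::splitOnWhitespace)
            .then(TokenPipelineDemo::lowercase)
            .then(dropping(new HashSet<>(Arrays.asList("the", "a"))))
            .build();

        System.out.println(pipeline.apply("Search the GitHub Index"));
    }
}
```

The order of the then() calls matters: here the stop-word filter runs after lowercasing, so it only needs lowercase entries.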

Another thing I noticed is that we should probably move the TokenStreamComponents class so it is not a nested class of Analyzer to match the syntax more closely to Lucene.
    

A few thoughts on the demo:

1. Not everyone is familiar with a GitHub organization. Perhaps the demo should provide a list to choose from? Currently, if you type something that doesn't exist you get an exception. I had to do a Google search to come up with something, since my own username didn't work. One of the top results (before an actual list of organizations) was an API that can be utilized to read all of the GitHub organizations: https://developer.github.com/v3/orgs/
2. Maybe there should be some kind of estimate given on how long it will take to index the organization. When I ultimately chose "apache" it took several minutes to index the results, which I was not expecting.
3. Perhaps the API key should be put into a separate (config) file rather than inline in the code. And you could pre-define the name of this file and put it into a .gitignore file. This would help prevent anyone from accidentally committing their API key to the Git repo.
4. The search results seemed a bit underwhelming. Maybe there should be some kind of indicators how many results Lucene.Net had to sift through to come up with the short list. Or at least there should be some kind of explanation what is happening to put things into perspective. Think of a crime scene investigation. If the investigators enter the search criteria and it comes up with 50,000 suspects it would ruin their day. If it comes up with 3, then their work is much easier. But without some kind of indicator showing that 3 is better than 50,000, the latter seems much more impressive in a demo.
5. Perhaps there should be some way to reset the index? I entered another organization to test my updates to the code and it added that organization's results to the original index, which I wasn't expecting.
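
For point 3, something along these lines might work. It is a minimal Java sketch, not what the demo does today: the file name apikey.properties and the property name github.apiKey are assumptions made up for the example. The key is read from a properties file at startup, and a missing file or key produces a clear message instead of a key living in source code.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class ApiKeyConfig {
    // Hypothetical names for this sketch; the demo does not define them.
    static final String DEFAULT_FILE = "apikey.properties";
    static final String KEY_NAME = "github.apiKey";

    // Reads the key from the given properties file, failing loudly if it is absent.
    static String load(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            throw new IllegalStateException(
                "Missing " + file + ": create it with a line like " + KEY_NAME + "=<your key>");
        }
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        String key = props.getProperty(KEY_NAME, "").trim();
        if (key.isEmpty()) {
            throw new IllegalStateException("No value for " + KEY_NAME + " in " + file);
        }
        return key;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : DEFAULT_FILE);
        try {
            String key = load(file);
            // Never print the key itself; its length is enough to confirm it loaded.
            System.out.println("Loaded an API key of length " + key.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

With a single line reading apikey.properties added to .gitignore, the key file stays out of commits while the code that reads it stays in the repo.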


Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: itamar.synhershko@gmail.com [mailto:itamar.synhershko@gmail.com] On Behalf Of Itamar Syn-Hershko
Sent: Wednesday, November 9, 2016 6:45 AM
To: dev@lucenenet.apache.org; user@lucenenet.apache.org
Subject: Lucene.NET 4.8 demo

Hey folks,

I just pushed a working demo for Lucene.NET 4.8 using the latest bits to index and search public repositories on github. Check it out:
https://github.com/synhershko/LuceneNetDemo

I also recorded a Channel 9 video walking through the demo - I will post it here again as soon as it's released on the nets.

This should clarify some mysteries around the new-ish API and hopefully drive confidence in what we consider a stable beta release.

Cheers,

--

Itamar Syn-Hershko
http://code972.com | @synhershko <https://twitter.com/synhershko>
Freelance Developer & Consultant
Lucene.NET committer and PMC member
