Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2013/03/10 11:55:13 UTC

[jira] [Commented] (LUCENE-3550) Create example code for core

    [ https://issues.apache.org/jira/browse/LUCENE-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598204#comment-13598204 ] 

Shai Erera commented on LUCENE-3550:
------------------------------------

A few comments:

* Please remove @author tags. We don't use them, and the build fails if it finds any.

* In general, I think the code needs more documentation, since this is example code. For instance, I would add:
** to index() a comment saying "IndexWriterConfig lets you configure how IndexWriter works as well as how documents are indexed".
** to search() a comment saying "QueryParser is able to parse a query string into a meaningful Query object which is used to match and score documents".
** etc...

* If there's nothing special to say about an exception that is thrown, can you please remove @throws from javadocs?

* addDocs:
** I would rename to addDoc
** Modify the comment "create index" to "add document to the index"

* Currently the code prints messages, which we try to avoid (e.g. during tests). So either we add to DemoConstants a VERBOSE property that is initialized to System.getProperty("tests.verbose"), or you just move all the prints to main()?
** In that regard, search() can return a {{ScoreDoc[]}}, which main() can use to print results and tests can use to assert on.
** I.e. rather than asserting that search() returned 1 or 2 hits, we can assert their order etc. (not saying we have to for this example).
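
A minimal sketch of the first option, assuming a package-private DemoConstants class. Only the class name, the VERBOSE property and the "tests.verbose" system property come from the comment above; isVerboseEnabled() and log() are hypothetical helpers added for illustration, and the sketch uses only the JDK, not Lucene:

```java
import java.io.PrintStream;

/** Hypothetical holder for demo-wide settings. */
final class DemoConstants {

    /** Name of the system property that switches output on. */
    static final String VERBOSE_PROPERTY = "tests.verbose";

    /** Read once at class-load time; true only when -Dtests.verbose=true is set. */
    static final boolean VERBOSE = isVerboseEnabled();

    private DemoConstants() {}

    /** Re-reads the property. Boolean.getBoolean returns false when it is unset. */
    static boolean isVerboseEnabled() {
        return Boolean.getBoolean(VERBOSE_PROPERTY);
    }

    /** Prints the message only when verbose is true; returns whether it printed. */
    static boolean log(boolean verbose, PrintStream out, String message) {
        if (verbose) {
            out.println(message);
        }
        return verbose;
    }
}
```

Example code would then call something like DemoConstants.log(DemoConstants.VERBOSE, System.out, "indexed 1 document") instead of printing unconditionally, so a test run without -Dtests.verbose=true stays silent.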

* In order to better test the example, I would make it take a Directory (e.g. index(Directory), search(Directory) or SimpleCoreExample(Directory)) and have the tests pass in newDirectory() (note: there's no space intentionally).
** This will detect incomplete code, e.g. that you don't close the reader in search().

* Also, I think the example should make it clearer that we don't care about casing, e.g. index "Apache" and search for "apache".
** main() could also run two searches, to print diverse results
** and tests (and main()) should test multi-word queries too

As a start, it looks great. I think though that it would be better if our simple example contained:
** Documents with more than one field, to show different Field types (TextField, StringField, DocValuesField)
** Instead of a single search(), have different searchXYZ methods, e.g.
*** searchKeyword (using default field), searchFields (execute fielded search)
*** searchBooleanQuery, searchRangeQuery to show QueryParser's syntax
*** searchSort to sort results

I consider these simple/basic examples, since that's really the essence of Lucene -- indexing documents with a few fields and querying them in different ways.
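
To illustrate the kind of query strings the searchXYZ methods listed above could pass to the classic QueryParser, here is a small sketch. The class name ExampleQueries and the field names "title" and "date" are assumptions made for illustration; they are not taken from the patch. The strings are held as plain Java constants, so the snippet has no Lucene dependency:

```java
/** Hypothetical constants holding example query strings in classic QueryParser syntax. */
final class ExampleQueries {

    /** Bare term: searched against the parser's default field. */
    static final String KEYWORD = "apache";

    /** Fielded search: the "field:" prefix overrides the default field. */
    static final String FIELDED = "title:apache";

    /** Boolean query: '+' marks a required term, '-' a prohibited one. */
    static final String BOOLEAN = "+apache -solr";

    /** Range query: square brackets make both endpoints inclusive. */
    static final String RANGE = "date:[20130101 TO 20131231]";

    private ExampleQueries() {}
}
```

Each string would be handed to QueryParser.parse(String) by the corresponding method; sorting (searchSort) is requested through the search call rather than through query syntax, so it has no entry here.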
                
> Create example code for core
> ----------------------------
>
>                 Key: LUCENE-3550
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3550
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/other
>            Reporter: Shai Erera
>              Labels: newdev
>         Attachments: LUCENE-3550.patch
>
>
> Trunk has undergone many API changes, some of which are not trivial, and the migration path from 3.x to 4.0 seems hard. I'd like to propose a way to tackle this by means of live example code.
> The facet module implements this approach. There is live Java code under src/examples that demonstrates some well-documented scenarios. The code itself is documented, in addition to javadoc. Also, the code itself is unit tested regularly.
> We found it very difficult to keep documentation up-to-date -- javadocs always lag behind, Wiki pages get old etc. However, when you have live Java code, you're *forced* to keep it up-to-date. It doesn't compile if you break the API, and it fails to run if you change internal impl behavior. If you keep it simple enough, its documentation stays simple too.
> And if we are successful at maintaining it (which we must be, otherwise the build should fail), then people should have an easy experience migrating between releases. So say you take the simple scenario "I'd like to index documents which have the fields ID, date and body". Then you create an example class/method that accomplishes that. And between releases, this code gets updated, and people can follow the changes required to implement that scenario.
> I'm not saying the example code should always stay optimized. We can aim at that, but I don't try to fool myself into thinking that we'll succeed. But at least we can get it compiled and regularly unit tested.
> I think it would be good to introduce the concept of examples, such that if a module (core, contrib, modules) has an src/examples, we package it in a .jar and include it with the binary distribution. That's a first step. We can also have meta examples, under their own module/contrib, that show how to combine several modules together (this might even uncover API problems), but that's definitely a second phase.
> At first, let's do the "unit examples" (ala unit tests) and better start with core. Whatever we succeed at writing for 4.0 will only help users. So let's use this issue to:
> # List example scenarios that we want to demonstrate for core
> # Build the infrastructure in our build system to package and distribute a module's examples.
> Please feel free to list here example scenarios that come to mind. We can then track what's been done and what's not. The more we do the better.
