Posted to general@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2013/09/25 21:36:42 UTC

VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Please vote to release the following artifacts as the Apache Solr 
Reference Guide for 4.5...

https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC0/

$ cat apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1
ee40215d30f264d663f723ea2196b72b8cc5effc  apache-solr-ref-guide-4.5.pdf
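The `.sha1` file above records a hex digest followed by the file name. Below is a minimal Python sketch of checking a downloaded PDF against it. It assumes the RC directory has already been downloaded locally. The helper names (`parse_sha1_line`, `sha1_of_file`, `verify`) are hypothetical and are not part of any Apache release tooling.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def parse_sha1_line(line: str) -> Tuple[str, str]:
    """Split a '<hex digest>  <file name>' line, as printed by `cat ...pdf.sha1`."""
    digest, name = line.split(None, 1)
    return digest.lower(), name.strip()


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-1 of a file, read in chunks so the whole file need not be in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(pdf_path: str, sha1_path: str) -> bool:
    """True if the file's SHA-1 matches the digest recorded in the .sha1 file."""
    expected, _name = parse_sha1_line(Path(sha1_path).read_text())
    return sha1_of_file(pdf_path) == expected
```

For example, `verify("apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf", "apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1")` should return `True` for an intact download. Running `sha1sum -c apache-solr-ref-guide-4.5.pdf.sha1` (or `shasum -c` on OS X) inside the RC directory performs the same check from the shell.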

(When reviewing the PDF, please don't hesitate to point out any typos 
or formatting glitches or any other problems of subject matter. 
Re-spinning a new RC is trivial, so in my opinion the bar is very low in 
terms of what things are worth fixing before release.)





-Hoss

Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Varun Thacker <va...@gmail.com>.
Hi Steve,

No problems.

I've created SOLR-5275 for this.


On Thu, Sep 26, 2013 at 3:26 PM, Steve Rowe <sa...@gmail.com> wrote:

> Hi Varun,
>
> Thanks, good catch!
>
> Permission to edit the Reference Guide directly is only granted to
> Lucene/Solr committers - see <
> https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation#Internal-MaintainingDocumentation-WhoCanEditThisDocumentation
> >.
>
> For small additions/corrections, non-committers can add a comment on a
> page in the section that is closest to where the content should go, and
> then a committer can put the content where it belongs.  But for larger
> stuff, it's better to create a JIRA issue, and attach the content there.
>
> Steve
>
> On Sep 26, 2013, at 5:48 AM, Varun Thacker <va...@gmail.com>
> wrote:
>
> > Hi,
> >
> > SOLR-3076 went into this release, but the documentation for how to
> support Block Join in Solr is not present.
> >
> > In the ref guide there is a section called "Other Parsers" (
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers). We
> should add BlockJoinChildQParser and BlockJoinParentQParser.
> >
> > Also we should add an example of how to index childDocs in XML to make
> use of BlockJoin in Solr.
> >
> > I can document them right now, but where should I post it? If someone can
> give me access to Confluence, I could add it there. My Confluence
> username is [varunthacker]
> >
> >
> > On Thu, Sep 26, 2013 at 1:06 AM, Chris Hostetter <
> hossman_lucene@fucit.org> wrote:
> >
> > Please vote to release the following artifacts as the Apache Solr
> Reference Guide for 4.5...
> >
> >
> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC0/
> >
> > $ cat apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1
> > ee40215d30f264d663f723ea2196b72b8cc5effc  apache-solr-ref-guide-4.5.pdf
> >
> > (When reviewing the PDF, please don't hesitate to point out any typos or
> formatting glitches or any other problems of subject matter. Re-spinning a
> new RC is trivial, so in my opinion the bar is very low in terms of what
> things are worth fixing before release.)
> >
> >
> >
> >
> >
> > -Hoss
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
> >
> >
> >
> > --
> >
> >
> > Regards,
> > Varun Thacker
> > http://www.vthacker.in/
>
>


-- 


Regards,
Varun Thacker
http://www.vthacker.in/

Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Steve Rowe <sa...@gmail.com>.
Hi Varun,

Thanks, good catch!

Permission to edit the Reference Guide directly is only granted to Lucene/Solr committers - see <https://cwiki.apache.org/confluence/display/solr/Internal+-+Maintaining+Documentation#Internal-MaintainingDocumentation-WhoCanEditThisDocumentation>. 

For small additions/corrections, non-committers can add a comment on a page in the section that is closest to where the content should go, and then a committer can put the content where it belongs.  But for larger stuff, it's better to create a JIRA issue, and attach the content there.

Steve

On Sep 26, 2013, at 5:48 AM, Varun Thacker <va...@gmail.com> wrote:

> Hi,
> 
> SOLR-3076 went into this release, but the documentation for how to support Block Join in Solr is not present.
> 
> In the ref guide there is a section called "Other Parsers" (https://cwiki.apache.org/confluence/display/solr/Other+Parsers). We should add BlockJoinChildQParser and BlockJoinParentQParser.
> 
> Also we should add an example of how to index childDocs in XML to make use of BlockJoin in Solr.
> 
> I can document them right now, but where should I post it? If someone can give me access to Confluence, I could add it there. My Confluence username is [varunthacker]
> 
> 
> On Thu, Sep 26, 2013 at 1:06 AM, Chris Hostetter <ho...@fucit.org> wrote:
> 
> Please vote to release the following artifacts as the Apache Solr Reference Guide for 4.5...
> 
> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC0/
> 
> $ cat apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1
> ee40215d30f264d663f723ea2196b72b8cc5effc  apache-solr-ref-guide-4.5.pdf
> 
> (When reviewing the PDF, please don't hesitate to point out any typos or formatting glitches or any other problems of subject matter. Re-spinning a new RC is trivial, so in my opinion the bar is very low in terms of what things are worth fixing before release.)
> 
> 
> 
> 
> 
> -Hoss
> 
> 
> 
> 
> -- 
> 
> 
> Regards,
> Varun Thacker
> http://www.vthacker.in/




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Yonik Seeley <yo...@lucidworks.com>.
On Thu, Sep 26, 2013 at 5:48 AM, Varun Thacker
<va...@gmail.com> wrote:
> SOLR-3076 went into this release, but the documentation for how to
> support Block Join in Solr is not present.

IMO, it's a work in progress / experimental.  It doesn't necessarily
need to be in the normal ref guide at this point, but if anything gets
added it should probably be marked as experimental and potentially
subject to change.

-Yonik
http://lucidworks.com



Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Varun Thacker <va...@gmail.com>.
Hi,

SOLR-3076 went into this release, but the documentation for how to
support Block Join in Solr is not present.

In the ref guide there is a section called "Other Parsers" (
https://cwiki.apache.org/confluence/display/solr/Other+Parsers). We should
add BlockJoinChildQParser and BlockJoinParentQParser.

Also we should add an example of how to index childDocs in XML to make use
of BlockJoin in Solr.

I can document them right now, but where should I post it? If someone can
give me access to Confluence, I could add it there. My Confluence
username is [varunthacker]


On Thu, Sep 26, 2013 at 1:06 AM, Chris Hostetter
<ho...@fucit.org>wrote:

>
> Please vote to release the following artifacts as the Apache Solr
> Reference Guide for 4.5...
>
> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC0/
>
> $ cat apache-solr-ref-guide-4.5-RC0/apache-solr-ref-guide-4.5.pdf.sha1
> ee40215d30f264d663f723ea2196b72b8cc5effc  apache-solr-ref-guide-4.5.pdf
>
> (When reviewing the PDF, please don't hesitate to point out any typos or
> formatting glitches or any other problems of subject matter. Re-spinning a
> new RC is trivial, so in my opinion the bar is very low in terms of what
> things are worth fixing before release.)
>
>
>
>
>
> -Hoss
>
>
>


-- 


Regards,
Varun Thacker
http://www.vthacker.in/

Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Steve Rowe <sa...@gmail.com>.
Cassandra,

On Sep 26, 2013, at 10:39 AM, Cassandra Targett <ca...@gmail.com> wrote:
>> I'll take a look at the CSS - this is the one, right?: <https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr>
>> 
>> About the interim HTML, I found this description of how to get it: <https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML>.
> 
> My first reaction was that it wouldn't work: The HTML export exports
> the selected pages into a .zip file of HTML files (one file for each
> wiki page). The interim-HTML for the PDF is one big single HTML file.
> They're different exports, using different stylesheets. However, it
> would make sense if the HTML was similar, so I took a look with my own
> Confluence instance and the two exports use many of the same divs for
> the same elements. It's not 1:1, but you could at least figure out
> what the right divs are. The big difference will be heading levels -
> the PDF flattens them all depending on the page hierarchy.
> 
> There are also stylesheets in place that you don't see, and default rules that
> are applied if you haven't overridden them. And then I also think
> there are some styles put into the HTML itself that would override
> anything in the CSS. A few weeks ago I was working on a number of
> possible changes to the PDF, the formatting of code samples being one
> of them, but after two days working on it, I gave up for now. It
> really isn't fun to work on.

I added the following to the PDF stylesheet:

   /* trim leading blank line from pre-formatted code blocks */  
   div.codeContent>pre {  
     margin-top: -6px;  
   }   

and it seems to do the trick - the top and bottom vertical whitespace look balanced to me now on two individual pages I exported.  I'll export the whole thing now and look at every box to make sure this isn't doing the wrong thing somewhere.

Steve




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Cassandra Targett <ca...@gmail.com>.
On Thu, Sep 26, 2013 at 8:59 AM, Steve Rowe <sa...@gmail.com> wrote:
>
> I'll try to do them all myself, but if it looks like it's going to take more than one day, I'll ask for help.
>

OK, let me know.

>
> I'll take a look at the CSS - this is the one, right?: <https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr>
>
> About the interim HTML, I found this description of how to get it: <https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML>.

My first reaction was that it wouldn't work: The HTML export exports
the selected pages into a .zip file of HTML files (one file for each
wiki page). The interim-HTML for the PDF is one big single HTML file.
They're different exports, using different stylesheets. However, it
would make sense if the HTML was similar, so I took a look with my own
Confluence instance and the two exports use many of the same divs for
the same elements. It's not 1:1, but you could at least figure out
what the right divs are. The big difference will be heading levels -
the PDF flattens them all depending on the page hierarchy.

There are also stylesheets in place that you don't see, and default rules that
are applied if you haven't overridden them. And then I also think
there are some styles put into the HTML itself that would override
anything in the CSS. A few weeks ago I was working on a number of
possible changes to the PDF, the formatting of code samples being one
of them, but after two days working on it, I gave up for now. It
really isn't fun to work on.

>
>>> 1. Pg 2: The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts.  (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.)  This looks like a general problem - see e.g. #34.
>>
>> CT: This is essentially a known problem (see my comment:
>> https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
>> last bullet point). The way the PDF is created is that Confluence
>> creates the entire document in an HTML page, which includes bookmark
>> tags right before the different heading levels. When the PDF is then
>> generated, a rule is applied to insert a page-break before all h2
>> headings. That leaves the bookmark orphaned on the previous page. I
>> have never found a solution to this problem - you can't edit the HTML
>> and you don't have any control over where the bookmark tags in the
>> HTML are put before the HTML is converted to PDF. The only solution is
>> to never have page breaks, which I think severely diminishes
>> readability.
>
> Thanks for the explanation. I agree about page breaks being more important than off-by-one-page link targets.  I wonder if there is some CSS trick to put the page break before the target <a> instead of the <h2> section.
>
>>> 2. Pg 68: Stray asterisks in the <analyzer> tags in the <fieldType> example under "Analysis Phases", apparently to make the surrounded text bold (which also didn't happen).
>>
>> CT: BTW, it never will - code examples are rendered verbatim, without
>> any of the styling normally applied.
>
> Hmm, so there's no way to apply any formatting at all?  That's too bad.

You can apply syntax formatting based on the language of the example,
but not inline formatting to highlight specific lines - one way I've
gotten around that in other places is to enable line numbers to
display in the example, and then call out the line numbers in the
text.

Cassandra



Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Steve Rowe <sa...@gmail.com>.
Hi Cassandra,

On Sep 26, 2013, at 9:15 AM, Cassandra Targett <ca...@gmail.com> wrote:
> I'll only address a couple of your specific issues inline. We can
> split the rest of the list if you'd like, but I think a lot of them
> are on the same page in the wiki (although multiple pages in the PDF)
> - let me know.

I'll try to do them all myself, but if it looks like it's going to take more than one day, I'll ask for help. 

> On Thu, Sep 26, 2013 at 7:29 AM, Steve Rowe <sa...@gmail.com> wrote:
>> 0. All examples in the exported PDF have an extra blank line at the top.  I was able to eliminate these from this page <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227> ("What is an analyzer?") by eliminating the newline between the initial {code …} line and the first line of the examples.  This doesn't have any apparent effect on the layout of the page on the wiki, but the PDF export of that page no longer has the extra blank lines.  Any objections to switching all {code} examples in the guide like this?
> 
> CT: is it that horrible? There are dozens and dozens of code examples,
> and it will take a while for someone to fix all of them. Since I edit
> in wiki markup mode, I've always found it easier to add the line break
> so my eyes can find the samples faster. That said, ease of use for
> users is more important than my convenience, so if you think it's
> badly distracting, then it's worth trying to fix it.

For me it's somewhere between annoying and badly distracting, but this will of course depend on the viewer.

> An alternative might be to try to change the CSS that produces the
> code examples - the problem is that the default styling for the PDF
> includes some padding, and then puts in the newline. Fiddling with the
> CSS is painful though - we can't see the interim HTML and it's
> essentially trial & error over & over.

I'll take a look at the CSS - this is the one, right?: <https://cwiki.apache.org/confluence/spaces/flyingpdf/viewpdfstyleconfig.action?key=solr>

About the interim HTML, I found this description of how to get it: <https://confluence.atlassian.com/display/CONF35/Exporting+Confluence+Pages+and+Spaces+to+HTML>.

>> 1. Pg 2: The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts.  (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.)  This looks like a general problem - see e.g. #34.
> 
> CT: This is essentially a known problem (see my comment:
> https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
> last bullet point). The way the PDF is created is that Confluence
> creates the entire document in an HTML page, which includes bookmark
> tags right before the different heading levels. When the PDF is then
> generated, a rule is applied to insert a page-break before all h2
> headings. That leaves the bookmark orphaned on the previous page. I
> have never found a solution to this problem - you can't edit the HTML
> and you don't have any control over where the bookmark tags in the
> HTML are put before the HTML is converted to PDF. The only solution is
> to never have page breaks, which I think severely diminishes
> readability.

Thanks for the explanation. I agree about page breaks being more important than off-by-one-page link targets.  I wonder if there is some CSS trick to put the page break before the target <a> instead of the <h2> section.

>> 2. Pg 68: Stray asterisks in the <analyzer> tags in the <fieldType> example under "Analysis Phases", apparently to make the surrounded text bold (which also didn't happen).
> 
> CT: BTW, it never will - code examples are rendered verbatim, without
> any of the styling normally applied.

Hmm, so there's no way to apply any formatting at all?  That's too bad.

> 
>> 43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian are missing from the covered languages; Catalan and Irish should include ElisionFilterFactory in their examples - there are article lists in Lucene's {Catalan,Irish}Analyzer.
> 
> CT: A general note about the languages and examples - some examples were
> removed because they were incorrect, which might account for
> some of the gaps. There's an open issue you'll want to look at before
> diving in: https://issues.apache.org/jira/browse/SOLR-5031.

Thanks for the pointer.

Steve




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Cassandra Targett <ca...@gmail.com>.
Thanks Steve.

I'll only address a couple of your specific issues inline. We can
split the rest of the list if you'd like, but I think a lot of them
are on the same page in the wiki (although multiple pages in the PDF)
- let me know.

Cassandra

On Thu, Sep 26, 2013 at 7:29 AM, Steve Rowe <sa...@gmail.com> wrote:
> 0. All examples in the exported PDF have an extra blank line at the top.  I was able to eliminate these from this page <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227> ("What is an analyzer?") by eliminating the newline between the initial {code …} line and the first line of the examples.  This doesn't have any apparent effect on the layout of the page on the wiki, but the PDF export of that page no longer has the extra blank lines.  Any objections to switching all {code} examples in the guide like this?

CT: is it that horrible? There are dozens and dozens of code examples,
and it will take a while for someone to fix all of them. Since I edit
in wiki markup mode, I've always found it easier to add the line break
so my eyes can find the samples faster. That said, ease of use for
users is more important than my convenience, so if you think it's
badly distracting, then it's worth trying to fix it.

An alternative might be to try to change the CSS that produces the
code examples - the problem is that the default styling for the PDF
includes some padding, and then puts in the newline. Fiddling with the
CSS is painful though - we can't see the interim HTML and it's
essentially trial & error over & over.

So, it's essentially one of two annoying choices: edit all the code
examples by hand, or generate the PDF x-dozen times to maybe find out
the CSS approach won't work.

>
> 1. Pg 2: The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts.  (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.)  This looks like a general problem - see e.g. #34.

CT: This is essentially a known problem (see my comment:
https://issues.apache.org/jira/browse/SOLR-4886?focusedCommentId=13703660#comment-13703660,
last bullet point). The way the PDF is created is that Confluence
creates the entire document in an HTML page, which includes bookmark
tags right before the different heading levels. When the PDF is then
generated, a rule is applied to insert a page-break before all h2
headings. That leaves the bookmark orphaned on the previous page. I
have never found a solution to this problem - you can't edit the HTML
and you don't have any control over where the bookmark tags in the
HTML are put before the HTML is converted to PDF. The only solution is
to never have page breaks, which I think severely diminishes
readability.

>
> 2. Pg 68: Stray asterisks in the <analyzer> tags in the <fieldType> example under "Analysis Phases", apparently to make the surrounded text bold (which also didn't happen).

CT: BTW, it never will - code examples are rendered verbatim, without
any of the styling normally applied.

> 43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian are missing from the covered languages; Catalan and Irish should include ElisionFilterFactory in their examples - there are article lists in Lucene's {Catalan,Irish}Analyzer.

CT: A general note about the languages and examples - some examples were
removed because they were incorrect, which might account for
some of the gaps. There's an open issue you'll want to look at before
diving in: https://issues.apache.org/jira/browse/SOLR-5031.



Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Steve Rowe <sa...@gmail.com>.
The TODO list is now empty (except for a shelved item), so that clears up the stuff I found.

Steve

On Sep 26, 2013, at 1:28 PM, Chris Hostetter <ho...@fucit.org> wrote:

> 
> Awesome work steve!
> 
> I collected all of this up into a scratch page, let's see how many we can 
> burn through easily and then post another RC...
> 
> https://cwiki.apache.org/confluence/display/solr/Internal+-+TODO+List
> 
> 
> -Hoss
> 




Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Chris Hostetter <ho...@fucit.org>.
Awesome work steve!

I collected all of this up into a scratch page, let's see how many we can 
burn through easily and then post another RC...

https://cwiki.apache.org/confluence/display/solr/Internal+-+TODO+List


-Hoss



Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf

Posted by Otis Gospodnetic <ot...@gmail.com>.
I have just 3 chars to contribute: WOW

Otis



On Thu, Sep 26, 2013 at 8:29 AM, Steve Rowe <sa...@gmail.com> wrote:
> Except for #1/#34 - internal links to beginning-of-page sections point one page earlier than they should - and #8/#41 - missing Thai and Polish chars - which I don't know how to fix, I'll try to address the other items on this (um, very long) list of mostly minor stuff I found:
>
> 0. All examples in the exported PDF have an extra blank line at the top.  I was able to eliminate these from this page <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227> ("What is an analyzer?") by eliminating the newline between the initial {code …} line and the first line of the examples.  This doesn't have any apparent effect on the layout of the page on the wiki, but the PDF export of that page no longer has the extra blank lines.  Any objections to switching all {code} examples in the guide like this?
>
> 1. Pg 2: The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts.  (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.)  This looks like a general problem - see e.g. #34.
>
> 2. Pg 68: Stray asterisks in the <analyzer> tags in the <fieldType> example under "Analysis Phases", apparently to make the surrounded text bold (which also didn't happen).
>
> 3. Pg 69: The solr.KeywordTokenizerFactory example is missing one quotation mark from each of the left and right hand sides.
>
> 4. Pg 70: Under "solr.TokenizerFactory", there is a bogus "StandardTokenizer" link in the sentence "Theere aren't any filters that use StandardTokenizer's types" - the link is to the non-existent "StandardTokenizer" page on the Solr wiki.  (It might be useful to systematically link stuff like this to the corresponding Lucene or Solr javadocs, but this should probably be templated or scripted, so that the version-specific links are handled properly.)
>
> 5. Pg 71: Under "Standard Tokenizer", the claim that email addresses are recognized is false, and Internet domain name recognition isn't validation per se, e.g. "google.supercomputername" will be kept as a single token, just like "google.com".  The "Out" example output needs to be fixed accordingly.  The "Classic Tokenizer" section on pg 72 repeats the same email/domain text verbatim; for ClassicTokenizer, the email claim is true, but it has the same issue with Internet domain names as StandardTokenizer.
>
> 6. Pg 74: The NGram Tokenizer example output should be ("bicy", "bicyc", "icyc", "icycl", "cycl", "cycle", "ycle") instead of all of the 4grams before the 5grams (I think this class's behavior was changed in 4.4 by LUCENE-5042).
>
> 7. Pg 75: The ICU tokenizer "rulefiles" argument is missing.
>
> 8. Pg 75: The ICU Tokenizer's "In" input and "Out" output are completely missing the Thai text that's visible on the wiki.
>
> 9. Pg 75: Missing spaces in the Regular Expression Pattern Tokenizer's "group" attribute description, at the boundaries between the first two sentences: "token(s).The" and "tokens.Non-negative".
>
> 10. Pg 72, 76, 77, etc.: Many analysis components' factory class names should be styled with a fixed-width font.
>
> 11. Pg 77: UAX29 URL Email Tokenizer recognizes not only .com Internet domain names, but also domain names including any other valid top-level domain (i.e., unlike StandardTokenizer and ClassicTokenizer, domain names are validated against the white list drawn from the IANA Root Zone database <http://www.internic.net/zones/root.zone> as of the last time "ant gen-tld" was performed and the tokenizer was generated.)
>
> 12. Pg 77: UAX29 tokenizer: "file:://" should be "file://"
>
> 13. Pg 77: UAX29 tokenizer's <URL> and <EMAIL> type names are missing angle brackets.
>
> 14. Pg 77: UAX29 tokenizer's maxTokenLength attribute name should be styled with a fixed-width font.
>
> 15. Pg 78: In the example demonstrating how arguments can be given to <filter> elements via attributes, there is a stray asterisk, apparently intended to bold the surrounding text, which also didn't work: *min="2" max="7"/>
>
> 16. Pg 79: The ASCII Folding Filter's "Out" output should show the accent stripped ("á" -> "a") and the ASCII character value adjusted accordingly (ASCII character 97).
>
> 17. Pg 81: The Edge N-gram Filter's 4-6 gram size example "Out" should be ("four", "scor", "score", "twen", "twent", "twenty") - some of these are missing.
>
> 18. Pg 83: The ICU Normalizer 2 Filter example should include the "name" and "mode" attributes in the <filter> element.
>
> 19. Pg 87: Stray asterisks in both of the N-Gram Filter examples: *minGramSize="...
>
> 20. Pg 87: The N-Gram Filter 3-5 gram size example "Out" output should be ("fou", "four", "our", "sco", "scor", "score", "cor", "core", "ore") - rather than ordering by gram size, output is now ordered first by position and then by gram size.
>
> 21. Pg 88: Stray asterisk in the first occurrence only example of the Pattern Replace Filter: *replace="first".
>
> 22. Pg 89: "encoder" argument to the Phonetic Filter has surrounding double curly brackets instead of being styled with a fixed-width font.
>
> 23. Pg 90: It should be mentioned on Porter Stem Filter that it's *four times faster* than the English Snowball stemmer - I benchmarked it at <http://markmail.org/thread/d2c443z63z37rwf6>
>
> 24. Pg 90: The Position Filter Factory is deprecated and will be removed in 5.0 - this should be mentioned.
>
> 25. Pg 90: The Position Filter Factory example has the wrong token position on the second token - it should be 2 instead of 3.
>
> 26. Pg 90: The "testsyns.txt" file contents are missing from Remove Duplicates Token Filter.
>
> 27. Pg 92: Shingle Filter is missing params "minShingleSize", "outputUnigramsIfNoShingles", and "tokenSeparator".
>
> 28. Pg 93: Standard Filter: as of lucene match version 3.1, this filter is a no-op.
>
> 29. Pg 94: Stop Filter: the "enablePositionIncrements" arg is no longer supported as of Lucene/Solr 4.4 - this should be mentioned, and the example showing its use should be removed.  All of the examples need to have their positions adjusted accordingly.  Also, all language-specific examples later in the guide should have this arg removed.
>
> 30. Pg 97: Word Delimiter Filter: "-hotspot" is crossed out - the leading hyphen needs to be escaped or something.
>
> 31. Pg 97: WDF: Missing period+space in the "splitOnCaseChange" arg description: "XL"Example 1
>
> 32. Pg 97: WDF: "though" -> "through" in "protected" arg description.
>
> 33. Pg 98: CharFilterFactories: weird wording in "Char Filters can add, change, or remove characters without worrying about fault of Token offsets." - better: "Char Filters can add, change, or remove characters while preserving original character offsets to support e.g. highlighting."
>
> 34. Pg 99&100: Under solr.HTMLStripCharFilterFactory, the links labeled "Major Changes from Solr 3 to Solr 4." go one page previous to the start of this section in the guide.
>
> 35. Pg 100: solr.HTMLStripCharFilterFactory: this is incorrect: "Inline tags, such as <b>, <i>, or <span> will be replaced by a space."  It should be: "Inline tags, such as <b>, <i>, or <span> will be removed - no space or newline will be substituted."
>
> 36. Pg 100: solr.PatternReplaceCharFilterFactory: All of the "replaceWith" column contents are missing backslashes; some have commas that shouldn't be there; and some have curly brackets that shouldn't be there.
>
> 37. Pg 101: Dictionary Compound Word Token Filter: the content of "germanwords.txt" ("dummkopfdonaudampfschiff") is missing spaces or newlines between words - it should be "dumm kopf donau dampf schiff" instead.
>
> 38. Pg 102: Under "Unicode Collation", s/that also be used/that also *can* be used/ in "Unicode Collation is a language-sensitive method of sorting text that also be used for advanced search purposes."
>
> 39. Pg 102&103: Under "Sorting Text for a Specific Language", in the sentence "You can see a list of supported Locales _here_", the link points to a list of supported locales under Java 5.  The equivalent Java 6 link is <http://www.oracle.com/technetwork/java/javase/locales-137662.html>.  Similarly, the Collator javadocs link in the sentence "For more information, see the _Collator javadocs_" points to the Java 5 javadocs; the equivalent Java 6 link is <http://docs.oracle.com/javase/6/docs/api/java/text/Collator.html>.  Likewise, under "Sorting Text with Custom Rules", the RuleBasedCollator javadocs link in the sentence "For more information, see the _RuleBasedCollator javadocs_" points to the Java 5 javadocs; the equivalent Java 6 link is <http://docs.oracle.com/javase/6/docs/api/java/text/RuleBasedCollator.html>.
>
> 40. Pg 102-105: Under Unicode Collation: (ICU)CollationFilterFactory have been deprecated (and will be removed in 5.0) in favor of (ICU)CollationField, which will need descriptions and examples.
>
> 41. Pg 105: Under Collation Key Filter, several city names in the result example are missing characters with diacritics: "Białystok" is missing its "ł", "Łowicz" is missing its "Ł", and "Świdnik" is missing its "Ś".
>
> 42. Pg 106: ISO Latin Accent Filter: this class is no longer present as of Solr 4.0 - this section should be replaced with one about ASCIIFoldingFilter.  Also, the solr.MappingCharFilterFactory section on Pg 99 should be changed to use "mapping-FoldToASCII.txt" instead of "mapping-ISOLatin1Accent.txt".
>
> 43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian are missing from the covered languages; Catalan and Irish should include ElisionFilterFactory in their examples - there are articles lists in Lucene's {Catalan,Irish}Analyzer.
>
> 44. Pg 107-120: Example analyzers for the following languages don't include a <tokenizer> - they should include StandardTokenizer: Arabic, Bulgarian, Czech, Galician, Hindi, Indonesian, Italian, Persian, Polish, Swedish, Spanish, and Turkish.
>
> 45. Pg 109-112: The Dutch, Finnish and German examples all include a stray trailing space in their <tokenizer> class names.
>
> 46. Pg 110: Elision Filter: used for other languages besides French (e.g. Catalan, Italian, and Irish); ElisionFilter class was moved from the o.a.l.analysis.fr package to o.a.l.analysis.util.
>
> 47. Pg 110: Elision Filter: "articles" arg is not required (defaults to FrenchAnalyzer.DEFAULT_ARTICLES)
>
> 48. Pg 110: Elision Filter: "ignoreCase" arg is missing.
>
> 49. Pg 113: Italian: an example using ElisionFilterFactory should be included - there is an articles list in Lucene's ItalianAnalyzer.
>
> 50. Pg 113: Kuromoji: ", as in the following example:" should be removed from the following sentence, since there is no following example: "You can also make discarding punctuation configurable in the JapaneseTokenizerFactory, by setting discardPunctuation to false (to show punctuation) or true (to discard punctuation), as in the following example:"
>
> 51. Pg 114: Lao, Myanmar, Khmer: these are no longer in analysis-extras.  There should either be an example for these here, or a pointer to another ICUTokenizerFactory example elsewhere in the guide.
>
> 52. Pg 114-116: Norwegian: the Snowball stemmer isn't mentioned in the supported Norwegian stemmers list, but the two examples erroneously include the Snowball stemmer *along with another stemmer*!
>
> 53. Pg 117: Russian: Russian Letter Tokenizer is deprecated, and it no longer supports the "charset" arg.
>
> 54. Pg 117: Russian: Russian Lower Case Filter was removed in 4.0.  It should be replaced by LowerCaseFilter in all examples.
>
> Steve
>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>



Re: VOTE RC0 Release apache-solr-ref-guide-4.5.pdf"

Posted by Steve Rowe <sa...@gmail.com>.
Except for #1/#34 (internal links to beginning-of-page sections point one page earlier than they should) and #8/#41 (missing Thai and Polish characters), which I don't know how to fix, I'll try to address the other items on this (um, very long) list of mostly minor stuff I found:

0. All examples in the exported PDF have an extra blank line at the top.  I was able to eliminate these from this page <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=32604227> ("What is an analyzer?") by eliminating the newline between the initial {code …} line and the first line of the examples.  This doesn't have any apparent effect on the layout of the page on the wiki, but the PDF export of that page no longer has the extra blank lines.  Any objections to switching all {code} examples in the guide like this?

1. Pg 2: The section links from the TOC all take you to the previous page, rather than to the top of the page where the section starts.  (Same behavior on OS X Preview, and under Windows, on Firefox's built-in PDF viewer and on Adobe Reader.)  This looks like a general problem - see e.g. #34.

2. Pg 68: Stray asterisks in the <analyzer> tags in the <fieldType> example under "Analysis Phases", apparently to make the surrounded text bold (which also didn't happen).

3. Pg 69: The solr.KeywordTokenizerFactory example is missing one quotation mark from each of the left and right hand sides.

4. Pg 70: Under "solr.TokenizerFactory", there is a bogus "StandardTokenizer" link in the sentence "Theere aren't any filters that use StandardTokenizer's types" - the link is to the non-existent "StandardTokenizer" page on the Solr wiki.  (It might be useful to systematically link stuff like this to the corresponding Lucene or Solr javadocs, but this should probably be templated or scripted, so that the version-specific links are handled properly.)

5. Pg 71: Under "Standard Tokenizer", the claim that email addresses are recognized is false, and Internet domain name recognition isn't validation per se: e.g. "google.supercomputername" will be tokenized as a single token, just like "google.com".  The "Out" example output needs to be fixed accordingly.  The "Classic Tokenizer" section on pg 72 repeats the email/domain text verbatim; for ClassicTokenizer, the email claim is true, but it has the same issue with Internet domain names as StandardTokenizer.

6. Pg 74: The NGram Tokenizer example output should be ("bicy", "bicyc", "icyc", "icycl", "cycl", "cycle", "ycle") instead of all of the 4grams before the 5grams (I think this class's behavior was changed in 4.4 by LUCENE-5042).

7. Pg 75: The ICU tokenizer "rulefiles" argument is missing.

8. Pg 75: The ICU Tokenizer's "In" input and "Out" output are completely missing the Thai text that's visible on the wiki.

9. Pg 75: Missing spaces in the Regular Expression Pattern Tokenizer's "group" attribute description, at the boundaries between the first two sentences: "token(s).The" and "tokens.Non-negative".

10. Pg 72, 76, 77, etc.: Many analysis components' factory class names should be styled with a fixed-width font.

11. Pg 77: UAX29 URL Email Tokenizer recognizes not only .com Internet domain names, but also domain names including any other valid top-level domain (i.e., unlike StandardTokenizer and ClassicTokenizer, domain names are validated against the white list drawn from the IANA Root Zone database <http://www.internic.net/zones/root.zone> as of the last time "ant gen-tld" was performed and the tokenizer was generated.)

12. Pg 77: UAX29 tokenizer: "file:://" should be "file://"

13. Pg 77: UAX29 tokenizer's <URL> and <EMAIL> type names are missing angle brackets.

14. Pg 77: UAX29 tokenizer's maxTokenLength attribute name should be styled with a fixed-width font.

15. Pg 78: In the example demonstrating how arguments can be given to <filter> elements via attributes, there is a stray asterisk, apparently intended to bold the surrounding text, which also didn't work: *min="2" max="7"/>

16. Pg 79: The ASCII Folding Filter's "Out" output should show the accent stripped ("á" -> "a"), with the character value adjusted accordingly (ASCII character 97).

17. Pg 81: The Edge N-gram Filter's 4-6 gram size example "Out" should be ("four", "scor", "score", "twen", "twent", "twenty") - some of these are missing.

18. Pg 83: The ICU Normalizer 2 Filter example should include the "name" and "mode" attributes in the <filter> element.

19. Pg 87: Stray asterisks in both of the N-Gram Filter examples: *minGramSize="...

20. Pg 87: The N-Gram Filter 3-5 gram size example "Out" output should be ("fou", "four", "our", "sco", "scor", "score", "cor", "core", "ore") - rather than ordering by gram size, output is now ordered first by position and then by gram size.

21. Pg 88: Stray asterisk in the first occurrence only example of the Pattern Replace Filter: *replace="first".

22. Pg 89: "encoder" argument to the Phonetic Filter has surrounding double curly brackets instead of being styled with a fixed-width font. 

23. Pg 90: It should be mentioned on Porter Stem Filter that it's *four times faster* than the English Snowball stemmer - I benchmarked it at <http://markmail.org/thread/d2c443z63z37rwf6>

24. Pg 90: The Position Filter Factory is deprecated and will be removed in 5.0 - this should be mentioned.

25. Pg 90: The Position Filter Factory example has the wrong token position on the second token - it should be 2 instead of 3.

26. Pg 90: The "testsyns.txt" file contents are missing from Remove Duplicates Token Filter.

27. Pg 92: Shingle Filter is missing params "minShingleSize", "outputUnigramsIfNoShingles", and "tokenSeparator".

28. Pg 93: Standard Filter: as of lucene match version 3.1, this filter is a no-op.

29. Pg 94: Stop Filter: the "enablePositionIncrements" arg is no longer supported as of Lucene/Solr 4.4 - this should be mentioned, and the example showing its use should be removed.  All of the examples need to have their positions adjusted accordingly.  Also, all language-specific examples later in the guide should have this arg removed.

30. Pg 97: Word Delimiter Filter: "-hotspot" is crossed out - the leading hyphen needs to be escaped or something.

31. Pg 97: WDF: Missing period+space in the "splitOnCaseChange" arg description: "XL"Example 1 

32. Pg 97: WDF: "though" -> "through" in "protected" arg description.

33. Pg 98: CharFilterFactories: weird wording in "Char Filters can add, change, or remove characters without worrying about fault of Token offsets." - better: "Char Filters can add, change, or remove characters while preserving original character offsets to support e.g. highlighting."

34. Pg 99&100: Under solr.HTMLStripCharFilterFactory, the links labeled "Major Changes from Solr 3 to Solr 4." go one page previous to the start of this section in the guide.

35. Pg 100: solr.HTMLStripCharFilterFactory: this is incorrect: "Inline tags, such as <b>, <i>, or <span> will be replaced by a space."  It should be: "Inline tags, such as <b>, <i>, or <span> will be removed - no space or newline will be substituted."

36. Pg 100: solr.PatternReplaceCharFilterFactory: All of the "replaceWith" column contents are missing backslashes; some have commas that shouldn't be there; and some have curly brackets that shouldn't be there.

37. Pg 101: Dictionary Compound Word Token Filter: the content of "germanwords.txt" ("dummkopfdonaudampfschiff") is missing spaces or newlines between words - it should be "dumm kopf donau dampf schiff" instead.

38. Pg 102: Under "Unicode Collation", s/that also be used/that also *can* be used/ in "Unicode Collation is a language-sensitive method of sorting text that also be used for advanced search purposes."

39. Pg 102&103: Under "Sorting Text for a Specific Language", in the sentence "You can see a list of supported Locales _here_", the link points to a list of supported locales under Java 5.  The equivalent Java 6 link is <http://www.oracle.com/technetwork/java/javase/locales-137662.html>.  Similarly, the Collator javadocs link in the sentence "For more information, see the _Collator javadocs_" points to the Java 5 javadocs; the equivalent Java 6 link is <http://docs.oracle.com/javase/6/docs/api/java/text/Collator.html>.  Likewise, under "Sorting Text with Custom Rules", the RuleBasedCollator javadocs link in the sentence "For more information, see the _RuleBasedCollator javadocs_" points to the Java 5 javadocs; the equivalent Java 6 link is <http://docs.oracle.com/javase/6/docs/api/java/text/RuleBasedCollator.html>.

40. Pg 102-105: Under Unicode Collation: (ICU)CollationFilterFactory have been deprecated (and will be removed in 5.0) in favor of (ICU)CollationField, which will need descriptions and examples.

41. Pg 105: Under Collation Key Filter, several city names in the result example are missing characters with diacritics: "Białystok" is missing its "ł", "Łowicz" is missing its "Ł", and "Świdnik" is missing its "Ś".

42. Pg 106: ISO Latin Accent Filter: this class is no longer present as of Solr 4.0 - this section should be replaced with one about ASCIIFoldingFilter.  Also, the solr.MappingCharFilterFactory section on Pg 99 should be changed to use "mapping-FoldToASCII.txt" instead of "mapping-ISOLatin1Accent.txt".

43. Pg 106: Language-Specific Factories: Catalan, Danish, Irish and Romanian are missing from the covered languages; Catalan and Irish should include ElisionFilterFactory in their examples - there are articles lists in Lucene's {Catalan,Irish}Analyzer.

44. Pg 107-120: Example analyzers for the following languages don't include a <tokenizer> - they should include StandardTokenizer: Arabic, Bulgarian, Czech, Galician, Hindi, Indonesian, Italian, Persian, Polish, Swedish, Spanish, and Turkish.

45. Pg 109-112: The Dutch, Finnish and German examples all include a stray trailing space in their <tokenizer> class names.

46. Pg 110: Elision Filter: used for other languages besides French (e.g. Catalan, Italian, and Irish); ElisionFilter class was moved from the o.a.l.analysis.fr package to o.a.l.analysis.util.

47. Pg 110: Elision Filter: "articles" arg is not required (defaults to FrenchAnalyzer.DEFAULT_ARTICLES)

48. Pg 110: Elision Filter: "ignoreCase" arg is missing. 

49. Pg 113: Italian: an example using ElisionFilterFactory should be included - there is an articles list in Lucene's ItalianAnalyzer.

50. Pg 113: Kuromoji: ", as in the following example:" should be removed from the following sentence, since there is no following example: "You can also make discarding punctuation configurable in the JapaneseTokenizerFactory, by setting discardPunctuation to false (to show punctuation) or true (to discard punctuation), as in the following example:"

51. Pg 114: Lao, Myanmar, Khmer: these are no longer in analysis-extras.  There should either be an example for these here, or a pointer to another ICUTokenizerFactory example elsewhere in the guide.

52. Pg 114-116: Norwegian: the Snowball stemmer isn't mentioned in the supported Norwegian stemmers list, but the two examples erroneously include the Snowball stemmer *along with another stemmer*!

53. Pg 117: Russian: Russian Letter Tokenizer is deprecated, and it no longer supports the "charset" arg.

54. Pg 117: Russian: Russian Lower Case Filter was removed in 4.0.  It should be replaced by LowerCaseFilter in all examples.
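For illustration only: the position-first ordering expected in #6, #17 and #20 can be reproduced with the small Python sketch below. This is not Lucene's implementation. `ngrams`, `edge_ngrams` and `analyze` are hypothetical helpers, tokens are assumed to be split on whitespace, and the "four score" and "four score and twenty" inputs are assumptions inferred from the expected outputs listed in those items.

```python
def ngrams(token, min_gram, max_gram):
    """Hypothetical helper: position-first n-grams.

    For each start offset, emit grams from min_gram up to max_gram
    characters, stopping once a gram would run past the end of the token.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # larger sizes at this offset would overrun too
            grams.append(token[start:start + size])
    return grams


def edge_ngrams(token, min_gram, max_gram):
    """Hypothetical helper: front-edge n-grams (prefixes of min_gram..max_gram chars)."""
    return [token[:size] for size in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text, gram_fn, min_gram, max_gram):
    """Apply gram_fn to each whitespace-separated token, keeping token order."""
    out = []
    for token in text.split():
        out.extend(gram_fn(token, min_gram, max_gram))
    return out


# #6: NGram Tokenizer, 4-5 grams over "bicycle"
print(ngrams("bicycle", 4, 5))
# ['bicy', 'bicyc', 'icyc', 'icycl', 'cycl', 'cycle', 'ycle']

# #20: NGram Filter, 3-5 grams over "four score"
print(analyze("four score", ngrams, 3, 5))
# ['fou', 'four', 'our', 'sco', 'scor', 'score', 'cor', 'core', 'ore']

# #17: Edge N-gram Filter, 4-6 grams over "four score and twenty"
print(analyze("four score and twenty", edge_ngrams, 4, 6))
# ['four', 'scor', 'score', 'twen', 'twent', 'twenty']
```

Tokens shorter than the minimum gram size (like "and" above) simply produce no grams, which matches the expected outputs in #17 and #20.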

Steve


