Posted to oak-commits@jackrabbit.apache.org by to...@apache.org on 2016/06/16 10:00:24 UTC

svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Author: tommaso
Date: Thu Jun 16 10:00:24 2016
New Revision: 1748675

URL: http://svn.apache.org/viewvc?rev=1748675&view=rev
Log:
OAK-4368 - use postings highlighter whenever possible in Lucene property index to speedup excerpt generation

Modified:
    jackrabbit/oak/branches/1.2/   (props changed)
    jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java
    jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
    jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
    jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java

Propchange: jackrabbit/oak/branches/1.2/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Thu Jun 16 10:00:24 2016
@@ -1,4 +1,4 @@
 /jackrabbit/oak/branches/1.0:1665962
 /jackrabbit/oak/branches/1.4:1745750,1747354
-/jackrabbit/oak/trunk:1672350,1672468,1672537,1672603,1672611,1672642,1672644,1672834-1672835,1673351,1673410,1673414-1673415,1673436,1673644,1673662-1673664,1673669,1673695,1673713,1673738,1673787,1673791,1674046,1674065,1674075,1674107,1674228,1674780,1674880,1675054-1675055,1675319,1675332,1675354,1675357,1675382,1675555,1675566,1675593,1676198,1676237,1676407,1676458,1676539,1676670,1676693,1676703,1676725,1677579,1677581,1677609,1677611,1677774,1677788,1677797,1677804,1677806,1677939,1677991,1678023,1678095-1678096,1678124,1678171,1678173,1678202,1678211,1678323,1678758,1678938,1678954,1679144,1679165,1679191,1679232,1679235,1679503,1679958,1679961,1680170,1680172,1680182,1680222,1680232,1680236,1680461,1680633,1680643,1680747,1680805-1680806,1680903,1681282,1681767,1681918,1681921,1681955,1682042,1682218,1682235,1682437,1682488,1682494,1682555,1682855,1682904,1683059,1683089,1683213,1683249,1683259,1683278,1683323,1683687,1683700,1684174-1684175,1684186,1684376,1684442,1684561
 ,1684570,1684601,1684618,1684669,1684820,1684868,1684894,1685023,1685075,1685370,1685541,1685552,1685589-1685590,1685840,1685964,1685977,1685989,1685999,1686003,1686023,1686032,1686097,1686162,1686229,1686234,1686253,1686414,1686772,1686780,1686790,1686854,1686857,1686971,1687053-1687055,1687175,1687196,1687198,1687220,1687239-1687240,1687301,1687441,1687553,1688089-1688090,1688172,1688179,1688349,1688421,1688436,1688453,1688616,1688622,1688634,1688636,1688817,1689003-1689004,1689008,1689577,1689581,1689623,1689810,1689828,1689831,1689833,1689903,1690017,1690043,1690047,1690057,1690247,1690249,1690634-1690637,1690650,1690657,1690669,1690672,1690674,1690885,1690941,1691139,1691151,1691159,1691167,1691183,1691188,1691201,1691210,1691217-1691218,1691280,1691307,1691331-1691333,1691345,1691384-1691385,1691394,1691401,1691498,1691509,1692133-1692134,1692156,1692250,1692272,1692274,1692363,1692382,1692478,1692955,1693002,1693030,1693050,1693209,1693401,1693421,1693525-1693526,1694007,1694
 393-1694394,1694651,1694653-1694654,1695032,1695050,1695122,1695280,1695299,1695420,1695457,1695482,1695492,1695507,1695521,1695540,1695571,1695829-1695830,1695905,1696190,1696194,1696242,1696285,1696375,1696522,1696578,1696759,1696916,1697363,1697373,1697383,1697410,1697582,1697589,1697616,1697672,1697896,1700191,1700231,1700397,1700403,1700506,1700571,1700718,1700720,1700727,1700749,1700769,1700775,1701065,1701619,1701733,1701743,1701750,1701768,1701806,1701810,1701814,1701907,1701948,1701955,1701959,1701965,1701986,1702014,1702022,1702045,1702051,1702241,1702272,1702371,1702387,1702405,1702423,1702426,1702428,1702860,1702866,1702942,1702960,1703212,1703382,1703395,1703411,1703428,1703430,1703568,1703592,1703758,1703858,1703878,1704256,1704282,1704285,1704457,1704479,1704490,1704614,1704629,1704636,1704655,1704670,1704886,1705005,1705027,1705043,1705055,1705250,1705268,1705273,1705323,1705677,1705701,1705871,1705992,1705998,1706009,1706037,1706059,1706212,1706218,1706270,1706764,1
 706772,1707049,1707189,1707191,1707331,1707435,1707509,1707753,1708049,1708105,1708307,1708315,1708546,1708592,1708766,1709012,1709852,1709978,1710013,1710031,1710049,1710205,1710242,1710559,1710575,1710590,1710614,1710637,1710789,1710800,1710811,1710816,1710972,1711248,1711282,1711296,1711405,1711498,1711654,1712018,1712042,1712319,1712490,1712531,1712730,1712785,1712963,1713008,1713439,1713461,1713580,1713586,1713599-1713600,1713626,1713698,1713803,1713809,1714034,1714061,1714084,1714170,1714213,1714229,1714238,1714519-1714520,1714543-1714544,1714730,1714739,1714779,1714956,1714961,1715010,1715092,1715191,1715346,1715716,1715767,1715771,1715888,1715898,1716100,1716178,1716426,1716576,1716588-1716589,1716596,1716616,1716703,1716712,1716815,1716823,1716830,1716883,1717203,1717277,1717410,1717462,1717632,1717768-1717769,1717784,1717789,1717988,1718528,1718533,1718547-1718548,1718626,1718646,1718772,1718801-1718802,1718895,1719111,1719288,1719869,1720306,1720335,1720350,1720354,172050
 0,1721160,1721172,1721337,1722141,1722832,1723227,1723239,1723241,1723251,1723254,1723333,1723347,1723350,1723565,1723584,1723713,1723731,1724026,1724057,1724186,1724210,1724401,1724628,1724631,1725216,1725477,1725515,1725555,1725941,1725960,1726232,1726237,1726570,1726579,1726585-1726586,1726621,1726795,1726797,1726809,1726812,1726981,1726993,1727026,1727254,1727331,1727350,1727358,1727429,1727476,1727483,1727508,1727515-1727518,1727813,1727816,1727831-1727832,1727841,1727893,1727895,1727912-1727913,1727923,1727991,1728037,1728041,1728070,1728114,1728281,1728443,1728642,1729200,1729505,1729599,1729957,1729979,1730216,1730527,1730581,1730629,1730801,1731627,1731647-1731648,1731789,1731797,1732131,1732268,1732278,1732330,1732647-1732648,1732864,1733615,1733929,1734230,1734254,1735052,1735405,1735484,1735588,1736176,1737309-1737310,1737334,1737349,1738833,1738950,1738957,1739894,1740116,1740626,1740971,1741032,1741339,1741343,1742520,1742888,1742916,1743097,1743172,1743343,1744265,174
 4959,1745038,1745197,1746117,1746696,1746981,1747492,1748553
+/jackrabbit/oak/trunk:1672350,1672468,1672537,1672603,1672611,1672642,1672644,1672834-1672835,1673351,1673410,1673414-1673415,1673436,1673644,1673662-1673664,1673669,1673695,1673713,1673738,1673787,1673791,1674046,1674065,1674075,1674107,1674228,1674780,1674880,1675054-1675055,1675319,1675332,1675354,1675357,1675382,1675555,1675566,1675593,1676198,1676237,1676407,1676458,1676539,1676670,1676693,1676703,1676725,1677579,1677581,1677609,1677611,1677774,1677788,1677797,1677804,1677806,1677939,1677991,1678023,1678095-1678096,1678124,1678171,1678173,1678202,1678211,1678323,1678758,1678938,1678954,1679144,1679165,1679191,1679232,1679235,1679503,1679958,1679961,1680170,1680172,1680182,1680222,1680232,1680236,1680461,1680633,1680643,1680747,1680805-1680806,1680903,1681282,1681767,1681918,1681921,1681955,1682042,1682218,1682235,1682437,1682488,1682494,1682555,1682855,1682904,1683059,1683089,1683213,1683249,1683259,1683278,1683323,1683687,1683700,1684174-1684175,1684186,1684376,1684442,1684561
 ,1684570,1684601,1684618,1684669,1684820,1684868,1684894,1685023,1685075,1685370,1685541,1685552,1685589-1685590,1685840,1685964,1685977,1685989,1685999,1686003,1686023,1686032,1686097,1686162,1686229,1686234,1686253,1686414,1686772,1686780,1686790,1686854,1686857,1686971,1687053-1687055,1687175,1687196,1687198,1687220,1687239-1687240,1687301,1687441,1687553,1688089-1688090,1688172,1688179,1688349,1688421,1688436,1688453,1688616,1688622,1688634,1688636,1688817,1689003-1689004,1689008,1689577,1689581,1689623,1689810,1689828,1689831,1689833,1689903,1690017,1690043,1690047,1690057,1690247,1690249,1690634-1690637,1690650,1690657,1690669,1690672,1690674,1690885,1690941,1691139,1691151,1691159,1691167,1691183,1691188,1691201,1691210,1691217-1691218,1691280,1691307,1691331-1691333,1691345,1691384-1691385,1691394,1691401,1691498,1691509,1692133-1692134,1692156,1692250,1692272,1692274,1692363,1692382,1692478,1692955,1693002,1693030,1693050,1693209,1693401,1693421,1693525-1693526,1694007,1694
 393-1694394,1694651,1694653-1694654,1695032,1695050,1695122,1695280,1695299,1695420,1695457,1695482,1695492,1695507,1695521,1695540,1695571,1695829-1695830,1695905,1696190,1696194,1696242,1696285,1696375,1696522,1696578,1696759,1696916,1697363,1697373,1697383,1697410,1697582,1697589,1697616,1697672,1697896,1700191,1700231,1700397,1700403,1700506,1700571,1700718,1700720,1700727,1700749,1700769,1700775,1701065,1701619,1701733,1701743,1701750,1701768,1701806,1701810,1701814,1701907,1701948,1701955,1701959,1701965,1701986,1702014,1702022,1702045,1702051,1702241,1702272,1702371,1702387,1702405,1702423,1702426,1702428,1702860,1702866,1702942,1702960,1703212,1703382,1703395,1703411,1703428,1703430,1703568,1703592,1703758,1703858,1703878,1704256,1704282,1704285,1704457,1704479,1704490,1704614,1704629,1704636,1704655,1704670,1704886,1705005,1705027,1705043,1705055,1705250,1705268,1705273,1705323,1705677,1705701,1705871,1705992,1705998,1706009,1706037,1706059,1706212,1706218,1706270,1706764,1
 706772,1707049,1707189,1707191,1707331,1707435,1707509,1707753,1708049,1708105,1708307,1708315,1708546,1708592,1708766,1709012,1709852,1709978,1710013,1710031,1710049,1710205,1710242,1710559,1710575,1710590,1710614,1710637,1710789,1710800,1710811,1710816,1710972,1711248,1711282,1711296,1711405,1711498,1711654,1712018,1712042,1712319,1712490,1712531,1712730,1712785,1712963,1713008,1713439,1713461,1713580,1713586,1713599-1713600,1713626,1713698,1713803,1713809,1714034,1714061,1714084,1714170,1714213,1714229,1714238,1714519-1714520,1714543-1714544,1714730,1714739,1714779,1714956,1714961,1715010,1715092,1715191,1715346,1715716,1715767,1715771,1715888,1715898,1716100,1716178,1716426,1716576,1716588-1716589,1716596,1716616,1716703,1716712,1716815,1716823,1716830,1716883,1717203,1717277,1717410,1717462,1717632,1717768-1717769,1717784,1717789,1717988,1718528,1718533,1718547-1718548,1718626,1718646,1718772,1718801-1718802,1718895,1719111,1719288,1719869,1720306,1720335,1720350,1720354,172050
 0,1721160,1721172,1721337,1722141,1722832,1723227,1723239,1723241,1723251,1723254,1723333,1723347,1723350,1723565,1723584,1723713,1723731,1724026,1724057,1724186,1724210,1724401,1724628,1724631,1725216,1725477,1725515,1725555,1725941,1725960,1726232,1726237,1726570,1726579,1726585-1726586,1726621,1726795,1726797,1726809,1726812,1726981,1726993,1727026,1727254,1727331,1727350,1727358,1727429,1727476,1727483,1727508,1727515-1727518,1727813,1727816,1727831-1727832,1727841,1727893,1727895,1727912-1727913,1727923,1727991,1728037,1728041,1728070,1728114,1728281,1728443,1728642,1729200,1729505,1729599,1729957,1729979,1730216,1730527,1730581,1730629,1730801,1731627,1731647-1731648,1731789,1731797,1732131,1732268,1732278,1732330,1732647-1732648,1732864,1733615,1733929,1734230,1734254,1735052,1735405,1735484,1735588,1736176,1737309-1737310,1737334,1737349,1738833,1738950,1738957,1739894,1740116,1740626,1740971,1741032,1741339,1741343,1742520,1742888,1742916,1743097,1743172,1743343,1744265,174
 4959,1745038,1745197,1746117,1746696,1746981,1747492,1748505,1748553
 /jackrabbit/trunk:1345480

Modified: jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java?rev=1748675&r1=1748674&r2=1748675&view=diff
==============================================================================
--- jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java (original)
+++ jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java Thu Jun 16 10:00:24 2016
@@ -34,6 +34,7 @@ import static org.apache.jackrabbit.oak.
 import static org.apache.jackrabbit.oak.plugins.index.lucene.FieldNames.FULLTEXT;
 import static org.apache.lucene.document.Field.Store.NO;
 import static org.apache.lucene.document.Field.Store.YES;
+import static org.apache.lucene.index.FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS;
 
 /**
  * {@code FieldFactory} is a factory for <code>Field</code> instances with
@@ -59,7 +60,7 @@ public final class FieldFactory {
         OAK_TYPE.setIndexed(true);
         OAK_TYPE.setOmitNorms(true);
         OAK_TYPE.setStored(true);
-        OAK_TYPE.setIndexOptions(DOCS_AND_FREQS_AND_POSITIONS);
+        OAK_TYPE.setIndexOptions(DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
         OAK_TYPE.setTokenized(true);
         OAK_TYPE.freeze();
 

Modified: jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java?rev=1748675&r1=1748674&r2=1748675&view=diff
==============================================================================
--- jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java (original)
+++ jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java Thu Jun 16 10:00:24 2016
@@ -58,7 +58,6 @@ import org.apache.jackrabbit.oak.spi.que
 import org.apache.jackrabbit.oak.spi.query.QueryIndex.AdvanceFulltextQueryIndex;
 import org.apache.jackrabbit.oak.spi.state.NodeState;
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.CachingTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
 import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
@@ -363,10 +362,20 @@ public class LuceneIndex implements Adva
 
                             PropertyRestriction restriction = filter.getPropertyRestriction(QueryImpl.REP_EXCERPT);
                             boolean addExcerpt = restriction != null && restriction.isNotNullRestriction();
+
+                            Analyzer analyzer = indexNode.getDefinition().getAnalyzer();
+
+                            if (addExcerpt) {
+                                // setup highlighter
+                                QueryScorer scorer = new QueryScorer(query);
+                                scorer.setExpandMultiTermQuery(true);
+                                highlighter.setFragmentScorer(scorer);
+                            }
+
                             for (ScoreDoc doc : docs.scoreDocs) {
                                 String excerpt = null;
                                 if (addExcerpt) {
-                                    excerpt = getExcerpt(indexNode, searcher, query, doc);
+                                    excerpt = getExcerpt(analyzer, searcher, doc);
                                 }
 
                                 LuceneResultRow row = convertToRow(doc, searcher, excerpt);
@@ -480,20 +489,17 @@ public class LuceneIndex implements Adva
         return new LucenePathCursor(itr, settings, sizeEstimator);
     }
 
-    private String getExcerpt(IndexNode indexNode, IndexSearcher searcher, Query query, ScoreDoc doc) throws IOException {
+    private String getExcerpt(Analyzer analyzer, IndexSearcher searcher, ScoreDoc doc) throws IOException {
         StringBuilder excerpt = new StringBuilder();
-        QueryScorer scorer = new QueryScorer(query);
-        scorer.setExpandMultiTermQuery(true);
-        highlighter.setFragmentScorer(scorer);
-        Analyzer analyzer = indexNode.getDefinition().getAnalyzer();
 
-        for (IndexableField field : searcher.getIndexReader().document(doc.doc).getFields())
-            if (!FieldNames.SUGGEST.equals(field.name())) {
+        for (IndexableField field : searcher.getIndexReader().document(doc.doc).getFields()) {
+            String name = field.name();
+            // only full text or analyzed fields
+            if (name.startsWith(FieldNames.FULLTEXT) || name.startsWith(FieldNames.ANALYZED_FIELD_PREFIX)) {
+                String text = field.stringValue();
+                TokenStream tokenStream = analyzer.tokenStream(name, text);
                 try {
-                    TokenStream tokenStream = analyzer.tokenStream(field.name(), field.stringValue());
-                    tokenStream.reset();
-                    CachingTokenFilter cachingTokenFilter = new CachingTokenFilter(tokenStream);
-                    TextFragment[] textFragments = highlighter.getBestTextFragments(cachingTokenFilter, field.stringValue(), true, 2);
+                    TextFragment[] textFragments = highlighter.getBestTextFragments(tokenStream, text, true, 2);
                     if (textFragments != null && textFragments.length > 0) {
                         for (TextFragment fragment : textFragments) {
                             if (excerpt.length() > 0) {
@@ -501,11 +507,13 @@ public class LuceneIndex implements Adva
                             }
                             excerpt.append(fragment.toString());
                         }
+                        break;
                     }
                 } catch (InvalidTokenOffsetsException e) {
                     LOG.error("higlighting failed", e);
                 }
             }
+        }
         return excerpt.toString();
     }
 

Modified: jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java?rev=1748675&r1=1748674&r2=1748675&view=diff
==============================================================================
--- jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java (original)
+++ jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java Thu Jun 16 10:00:24 2016
@@ -24,10 +24,13 @@ import javax.annotation.Nullable;
 import javax.jcr.PropertyType;
 import java.io.IOException;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.Deque;
 import java.util.Iterator;
+import java.util.LinkedList;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.atomic.AtomicReference;
 
@@ -67,11 +70,11 @@ import org.apache.jackrabbit.oak.spi.que
 import org.apache.jackrabbit.oak.spi.state.NodeState;
 import org.apache.jackrabbit.oak.util.PerfLogger;
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.CachingTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FieldInfos;
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.IndexableField;
 import org.apache.lucene.index.MultiFields;
@@ -105,6 +108,7 @@ import org.apache.lucene.search.highligh
 import org.apache.lucene.search.highlight.SimpleHTMLEncoder;
 import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
 import org.apache.lucene.search.highlight.TextFragment;
+import org.apache.lucene.search.postingshighlight.PostingsHighlighter;
 import org.apache.lucene.search.spell.SuggestWord;
 import org.apache.lucene.search.suggest.Lookup;
 import org.apache.lucene.util.Version;
@@ -113,7 +117,6 @@ import org.slf4j.LoggerFactory;
 
 import static com.google.common.base.Preconditions.checkNotNull;
 import static com.google.common.base.Preconditions.checkState;
-import static com.google.common.collect.Lists.newArrayList;
 import static com.google.common.collect.Lists.newArrayListWithCapacity;
 import static org.apache.jackrabbit.JcrConstants.JCR_MIXINTYPES;
 import static org.apache.jackrabbit.JcrConstants.JCR_PRIMARYTYPE;
@@ -121,8 +124,8 @@ import static org.apache.jackrabbit.oak.
 import static org.apache.jackrabbit.oak.api.Type.STRING;
 import static org.apache.jackrabbit.oak.commons.PathUtils.denotesRoot;
 import static org.apache.jackrabbit.oak.commons.PathUtils.getParentPath;
+import static org.apache.jackrabbit.oak.plugins.index.lucene.FieldNames.ANALYZED_FIELD_PREFIX;
 import static org.apache.jackrabbit.oak.plugins.index.lucene.FieldNames.PATH;
-import static org.apache.jackrabbit.oak.plugins.index.lucene.FieldNames.SUGGEST;
 import static org.apache.jackrabbit.oak.plugins.index.lucene.IndexDefinition.NATIVE_SORT_ORDER;
 import static org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexConstants.VERSION;
 import static org.apache.jackrabbit.oak.plugins.index.lucene.TermFactory.newAncestorTerm;
@@ -195,6 +198,8 @@ public class LucenePropertyIndex impleme
     private final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<strong>", "</strong>"),
             new SimpleHTMLEncoder(), null);
 
+    private final PostingsHighlighter postingsHighlighter = new PostingsHighlighter();
+
     public LucenePropertyIndex(IndexTracker tracker) {
         this.tracker = tracker;
         this.scorerProviderFactory = ScorerProviderFactory.DEFAULT;
@@ -387,10 +392,21 @@ public class LucenePropertyIndex impleme
                             restriction = filter.getPropertyRestriction(QueryImpl.OAK_SCORE_EXPLANATION);
                             boolean addExplain = restriction != null && restriction.isNotNullRestriction();
 
+                            Analyzer analyzer = indexNode.getDefinition().getAnalyzer();
+
+                            FieldInfos mergedFieldInfos = null;
+                            if (addExcerpt) {
+                                // setup highlighter
+                                QueryScorer scorer = new QueryScorer(query);
+                                scorer.setExpandMultiTermQuery(true);
+                                highlighter.setFragmentScorer(scorer);
+                                mergedFieldInfos = MultiFields.getMergedFieldInfos(searcher.getIndexReader());
+                            }
+
                             for (ScoreDoc doc : docs.scoreDocs) {
                                 String excerpt = null;
                                 if (addExcerpt) {
-                                    excerpt = getExcerpt(indexNode, searcher, query, doc);
+                                    excerpt = getExcerpt(query, analyzer, searcher, doc, mergedFieldInfos);
                                 }
 
                                 String explanation = null;
@@ -542,32 +558,69 @@ public class LucenePropertyIndex impleme
         return query;
     }
 
-    private String getExcerpt(IndexNode indexNode, IndexSearcher searcher, Query query, ScoreDoc doc) throws IOException {
+    private String getExcerpt(Query query, Analyzer analyzer, IndexSearcher searcher, ScoreDoc doc,
+                              FieldInfos fieldInfos) throws IOException {
         StringBuilder excerpt = new StringBuilder();
-        QueryScorer scorer = new QueryScorer(query);
-        scorer.setExpandMultiTermQuery(true);
-        highlighter.setFragmentScorer(scorer);
-
-        Analyzer analyzer = indexNode.getDefinition().getAnalyzer();
-        for (IndexableField field : searcher.getIndexReader().document(doc.doc).getFields())
-            if (!SUGGEST.equals(field.name())) {
-                try {
-                    TokenStream tokenStream = analyzer.tokenStream(field.name(), field.stringValue());
-                    tokenStream.reset();
-                    CachingTokenFilter cachingTokenFilter = new CachingTokenFilter(tokenStream);
-                    TextFragment[] textFragments = highlighter.getBestTextFragments(cachingTokenFilter, field.stringValue(), true, 2);
-                    if (textFragments != null && textFragments.length > 0) {
-                        for (TextFragment fragment : textFragments) {
-                            if (excerpt.length() > 0) {
-                                excerpt.append("...");
+        int docID = doc.doc;
+        List<String> names = new LinkedList<String>();
+
+        for (IndexableField field : searcher.getIndexReader().document(docID).getFields()) {
+            String name = field.name();
+            // postings highlighter can be used on analyzed fields with docs, freqs, positions and offsets stored.
+            if (name.startsWith(ANALYZED_FIELD_PREFIX) && fieldInfos.hasProx() && fieldInfos.hasOffsets()) {
+                names.add(name);
+            }
+        }
+
+        if (names.size() > 0) {
+            int[] maxPassages = new int[names.size()];
+            for (int i = 0; i < maxPassages.length; i++) {
+                maxPassages[i] = 1;
+            }
+            try {
+                Map<String, String[]> stringMap = postingsHighlighter.highlightFields(names.toArray(new String[names.size()]),
+                        query, searcher, new int[]{docID}, maxPassages);
+                for (Map.Entry<String, String[]> entry : stringMap.entrySet()) {
+                    String value = Arrays.toString(entry.getValue());
+                    if (value.contains("<b>")) {
+                        if (excerpt.length() > 0) {
+                            excerpt.append("...");
+                        }
+                        excerpt.append(value);
+                    }
+                }
+            } catch (Exception e) {
+                LOG.error("postings highlighting failed", e);
+            }
+        }
+
+        // fallback if no excerpt could be retrieved using postings highlighter
+        if (excerpt.length() == 0) {
+
+            for (IndexableField field : searcher.getIndexReader().document(doc.doc).getFields()) {
+                String name = field.name();
+                // only full text or analyzed fields
+                if (name.startsWith(FieldNames.FULLTEXT) || name.startsWith(FieldNames.ANALYZED_FIELD_PREFIX)) {
+                    String text = field.stringValue();
+                    TokenStream tokenStream = analyzer.tokenStream(name, text);
+
+                    try {
+                        TextFragment[] textFragments = highlighter.getBestTextFragments(tokenStream, text, true, 1);
+                        if (textFragments != null && textFragments.length > 0) {
+                            for (TextFragment fragment : textFragments) {
+                                if (excerpt.length() > 0) {
+                                    excerpt.append("...");
+                                }
+                                excerpt.append(fragment.toString());
                             }
-                            excerpt.append(fragment.toString());
+                            break;
                         }
+                    } catch (InvalidTokenOffsetsException e) {
+                        LOG.error("higlighting failed", e);
                     }
-                } catch (InvalidTokenOffsetsException e) {
-                    LOG.error("higlighting failed", e);
                 }
             }
+        }
         return excerpt.toString();
     }
 

Modified: jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java
URL: http://svn.apache.org/viewvc/jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java?rev=1748675&r1=1748674&r2=1748675&view=diff
==============================================================================
--- jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java (original)
+++ jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java Thu Jun 16 10:00:24 2016
@@ -21,9 +21,11 @@ package org.apache.jackrabbit.oak.plugin
 
 import java.io.InputStream;
 import java.io.IOException;
+import java.io.InputStream;
 import java.text.ParseException;
 import java.util.Calendar;
 import java.util.Collections;
+import java.util.LinkedList;
 import java.util.List;
 import java.util.Map;
 import java.util.Random;
@@ -653,7 +655,7 @@ public class LucenePropertyIndexTest ext
         t.setProperty(JcrConstants.JCR_PRIMARYTYPE, typeName, Type.NAME);
         return t;
     }
-    
+
     @Test
     public void orderByScore() throws Exception {
         Tree idx = createIndex("test1", of("propa"));
@@ -2013,6 +2015,36 @@ public class LucenePropertyIndexTest ext
 
     }
 
+    @Test
+    public void longRepExcerpt() throws Exception {
+        Tree luceneIndex = createFullTextIndex(root.getTree("/"), "lucene");
+
+        root.commit();
+
+        StringBuilder s = new StringBuilder();
+        for (int k = 0; k < 1000; k++) {
+            s.append("foo bar ").append(k).append(" ");
+        }
+        String text = s.toString();
+        List<String> names = new LinkedList<String>();
+        for (int j = 0; j < 30; j++) {
+            Tree test = root.getTree("/").addChild("ex-test-" + j);
+            for (int i = 0; i < 200; i++) {
+                String name = "cont" + i;
+                test.addChild(name).setProperty("text", text);
+                names.add("/" + test.getName() + "/" + name);
+            }
+        }
+
+        root.commit();
+
+        String query;
+
+        query = "SELECT [jcr:path],[rep:excerpt] from [nt:base] WHERE CONTAINS([text], 'foo')";
+        assertQuery(query, SQL2, names);
+
+    }
+
     private void assertPlanAndQuery(String query, String planExpectation, List<String> paths){
         assertThat(explain(query), containsString(planExpectation));
         assertQuery(query, paths);
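
The control flow that the new LucenePropertyIndex.getExcerpt follows in the diff above can be sketched in plain Java, without Lucene on the classpath. This is only an illustration under stated assumptions, not the committed code: the class name, the SnippetSource interface, the tagging helper, and the two prefix values are hypothetical stand-ins (the real code uses PostingsHighlighter for the fast path, Highlighter for the fallback, and the constants from FieldNames). It shows the fast path being tried on analyzed fields only when offsets are indexed, and the slower path being used only if the fast path produced nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExcerptFallbackSketch {

    // Hypothetical placeholder values, NOT the actual Oak FieldNames constants.
    static final String ANALYZED_PREFIX = "analyzed:";
    static final String FULLTEXT = ":fulltext";

    /** Hypothetical stand-in for a highlighter: returns a snippet, or null if nothing matched. */
    interface SnippetSource {
        String snippet(String fieldName, String text);
    }

    /** Toy highlighter: wraps the first occurrence of term in the given tags. */
    static SnippetSource tagging(String term, String open, String close) {
        return (name, text) -> {
            int i = text.indexOf(term);
            return i < 0 ? null
                    : text.substring(0, i) + open + term + close + text.substring(i + term.length());
        };
    }

    static String excerpt(Map<String, String> fields, boolean offsetsIndexed,
                          SnippetSource fast, SnippetSource slow) {
        StringBuilder excerpt = new StringBuilder();

        // Fast path: analyzed fields only, and only if offsets were indexed.
        if (offsetsIndexed) {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (field.getKey().startsWith(ANALYZED_PREFIX)) {
                    String s = fast.snippet(field.getKey(), field.getValue());
                    if (s != null) {
                        if (excerpt.length() > 0) {
                            excerpt.append("...");
                        }
                        excerpt.append(s);
                    }
                }
            }
        }

        // Fallback: full text or analyzed fields, stopping at the first field that matches.
        if (excerpt.length() == 0) {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                String name = field.getKey();
                if (name.startsWith(FULLTEXT) || name.startsWith(ANALYZED_PREFIX)) {
                    String s = slow.snippet(name, field.getValue());
                    if (s != null) {
                        excerpt.append(s);
                        break;
                    }
                }
            }
        }
        return excerpt.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(ANALYZED_PREFIX + "text", "foo bar 1");
        fields.put(FULLTEXT, "foo bar 1");
        SnippetSource fast = tagging("foo", "<b>", "</b>");
        SnippetSource slow = tagging("foo", "<strong>", "</strong>");
        System.out.println(excerpt(fields, true, fast, slow));  // fast path is used
        System.out.println(excerpt(fields, false, fast, slow)); // fallback is used
    }
}
```

In this sketch the fallback cost is paid only for documents where the fast path yields no excerpt, which is the stated goal of OAK-4368.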



Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Tommaso Teofili <to...@gmail.com>.
https://issues.apache.org/jira/browse/OAK-4524

Tommaso

On Thu, 30 Jun 2016 at 10:33, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:

> thanks, since OAK-4499 is closed I'll create a new one.
>
> Regards,
> Tommaso
>
> On Wed, 29 Jun 2016 at 08:14, Julian Reschke <julian.reschke@gmx.de> wrote:
>
>> On 2016-06-23 14:35, Tommaso Teofili wrote:
>> > fixed in OAK-4499, I'll have a look if Jenkins keeps complaining.
>> >
>> > Regards,
>> > Tommaso
>>
>> <https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1000/jdk=jdk1.8.0_11,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/testReport/junit/org.apache.jackrabbit.oak.plugins.index.lucene/LucenePropertyIndexTest/longRepExcerpt/>
>>
>>

Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Tommaso Teofili <to...@gmail.com>.
thanks, since OAK-4499 is closed I'll create a new one.

Regards,
Tommaso

On Wed, 29 Jun 2016 at 08:14, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2016-06-23 14:35, Tommaso Teofili wrote:
> > fixed in OAK-4499, I'll have a look if Jenkins keeps complaining.
> >
> > Regards,
> > Tommaso
>
> <https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1000/jdk=jdk1.8.0_11,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/testReport/junit/org.apache.jackrabbit.oak.plugins.index.lucene/LucenePropertyIndexTest/longRepExcerpt/>
>
>

Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Julian Reschke <ju...@gmx.de>.
On 2016-06-23 14:35, Tommaso Teofili wrote:
> fixed in OAK-4499, I'll have a look if Jenkins keeps complaining.
>
> Regards,
> Tommaso

<https://builds.apache.org/job/Apache%20Jackrabbit%20Oak%20matrix/1000/jdk=jdk1.8.0_11,label=Ubuntu,nsfixtures=DOCUMENT_NS,profile=unittesting/testReport/junit/org.apache.jackrabbit.oak.plugins.index.lucene/LucenePropertyIndexTest/longRepExcerpt/>


Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Tommaso Teofili <to...@gmail.com>.
fixed in OAK-4499, I'll have a look if Jenkins keeps complaining.

Regards,
Tommaso

On Thu, 23 Jun 2016 at 12:49, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2016-06-22 23:27, Tommaso Teofili wrote:
> > right, I will have a look tomorrow.
> >
> >
> > Tommaso
>
> The same is the case for 1.4 as well...
>
> Best regards, Julian
>

Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Julian Reschke <ju...@gmx.de>.
On 2016-06-22 23:27, Tommaso Teofili wrote:
> right, I will have a look tomorrow.
>
>
> Tommaso

The same is the case for 1.4 as well...

Best regards, Julian

Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Tommaso Teofili <to...@gmail.com>.
right, I will have a look tomorrow.


Tommaso


On Wed, 22 Jun 2016 at 16:54, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2016-06-16 12:00, tommaso@apache.org wrote:
> > Author: tommaso
> > Date: Thu Jun 16 10:00:24 2016
> > New Revision: 1748675
> >
> > URL: http://svn.apache.org/viewvc?rev=1748675&view=rev
> > Log:
> > OAK-4368 - use postings highlighter whenever possible in Lucene property index to speedup excerpt generation
> >
> > Modified:
> >     jackrabbit/oak/branches/1.2/   (props changed)
> >     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java
> >     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
> >     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
> >     jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java
> > ...
>
> With this change, I'm starting to see unit test failures due to queries in
> LucenePropertyIndexTest taking a bit longer than 10s (with RDBMK and
> Derby). I don't think I have seen this before. Maybe due to the test
> change we need a different threshold?
>
> Best regards, Julian
>

Re: svn commit: r1748675 - in /jackrabbit/oak/branches/1.2: ./ oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/ oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/

Posted by Julian Reschke <ju...@gmx.de>.
On 2016-06-16 12:00, tommaso@apache.org wrote:
> Author: tommaso
> Date: Thu Jun 16 10:00:24 2016
> New Revision: 1748675
>
> URL: http://svn.apache.org/viewvc?rev=1748675&view=rev
> Log:
> OAK-4368 - use postings highlighter whenever possible in Lucene property index to speedup excerpt generation
>
> Modified:
>     jackrabbit/oak/branches/1.2/   (props changed)
>     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/FieldFactory.java
>     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneIndex.java
>     jackrabbit/oak/branches/1.2/oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndex.java
>     jackrabbit/oak/branches/1.2/oak-lucene/src/test/java/org/apache/jackrabbit/oak/plugins/index/lucene/LucenePropertyIndexTest.java
> ...

With this change, I'm starting to see unit test failures due to queries in
LucenePropertyIndexTest taking a bit longer than 10s (with RDBMK and
Derby). I don't think I have seen this before. Maybe due to the test
change we need a different threshold?

Best regards, Julian