Posted to commits@bloodhound.apache.org by Apache Bloodhound <bl...@incubator.apache.org> on 2013/02/14 12:33:40 UTC

[Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+---------------
  Reporter:  andrej       |    Owner:
      Type:  enhancement  |   Status:  new
  Priority:  major        |  Version:
Resolution:               |
--------------------------+---------------
 One possibility is to strip wiki formatting during indexing time.
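A minimal sketch of what index-time stripping could look like, in Python. The helper name `strip_wiki` and the rule list are hypothetical and cover only a few common Trac wiki constructs (headings, bold/italic, monospace, bracketed links, table cells). This is not the code that later landed for this ticket; real Trac markup (macros, processors, escapes, nested lists) needs a proper parser.

```python
import re

# Hypothetical rule list covering only a few common Trac wiki constructs.
# Order matters: {{{...}}} before quotes, bold ''' before italic ''.
_RULES = [
    (re.compile(r"\{\{\{(.*?)\}\}\}", re.S), r"\1"),       # {{{monospace}}}
    (re.compile(r"`([^`]*)`"), r"\1"),                      # `monospace`
    (re.compile(r"'''(.*?)'''"), r"\1"),                    # '''bold'''
    (re.compile(r"''(.*?)''"), r"\1"),                      # ''italic''
    (re.compile(r"\[(\S+)\s+([^\]]+)\]"), r"\2"),           # [target label] -> label
    (re.compile(r"\[(\S+)\]"), r"\1"),                      # [target] -> target
    (re.compile(r"^\s*=+\s*(.*?)\s*=+\s*$", re.M), r"\1"),  # = Heading =
    (re.compile(r"\|\|"), " "),                             # ||table||cells||
]


def strip_wiki(text: str) -> str:
    """Return a rough plain-text version of Trac wiki markup for indexing."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    # Collapse whitespace left behind by the removed markup.
    return " ".join(text.split())
```

For example, `strip_wiki("= Install =\nRun '''make''' then see [wiki:Guide the guide].")` yields `"Install Run make then see the guide."`, so neither the quote markers nor the link target `wiki:Guide` would reach the index.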

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389>
Apache Bloodhound <https://issues.apache.org/bloodhound/>
The Apache Bloodhound (incubating) issue tracker

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------

Comment (by andrej):

 The primary source for indexing is the DB. If we need more data from the
 wiki markup, we can just re-index the DB and add more fields.
 Alternatively, we can store the complete wiki fields (not indexed) but
 index and search the stripped version.
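The "store complete, index stripped" alternative can be sketched with a toy in-memory index. Everything here (`ToyIndex`, the placeholder `strip_wiki`, `\w+` tokenization) is hypothetical and stands in for the real search backend. The point is only that the raw wiki text is kept for display while the inverted index is built from the stripped text, so a later re-index from the DB can still derive new fields.

```python
import re
from collections import defaultdict


def strip_wiki(text: str) -> str:
    """Placeholder stripper (hypothetical): keeps link labels but drops link
    targets, bold/italic quotes, {{{ }}} and heading '=' markers."""
    text = re.sub(r"\[\S+\s+([^\]]+)\]", r"\1", text)
    text = re.sub(r"'{2,3}|\{\{\{|\}\}\}", "", text)
    text = re.sub(r"^\s*=+\s*|\s*=+\s*$", "", text, flags=re.M)
    return " ".join(text.split())


class ToyIndex:
    """Hypothetical in-memory index: the raw wiki text is stored for display
    but never tokenized; only the stripped text feeds the inverted index."""

    def __init__(self):
        self.stored = {}                  # doc_id -> raw wiki text (stored, not indexed)
        self.postings = defaultdict(set)  # token -> doc_ids (from stripped text only)

    def add(self, doc_id, raw_wiki):
        self.stored[doc_id] = raw_wiki
        for token in re.findall(r"\w+", strip_wiki(raw_wiki).lower()):
            self.postings[token].add(doc_id)

    def search(self, term):
        """Return (doc_id, raw wiki text) pairs for docs containing term."""
        hits = self.postings.get(term.lower(), set())
        return [(doc_id, self.stored[doc_id]) for doc_id in sorted(hits)]
```

With a page containing `[wiki:SetupGuide the setup guide]`, a search for `guide` finds it, a search for `setupguide` does not, and the hit still carries the original markup for rendering.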

 I suggest we proceed with index-time stripping and change this if we see
 any drawbacks. We can re-index when new features need it. What do you
 think?

 Replying to [comment:9 olemis]:
 > Replying to [comment:4 jdreimann]:
 > > Wouldn't this mean that we lose the information provided by wiki
 > > formatting to rank results later? For example, if a word appears styled
 > > as a heading via wiki formatting, it probably has a higher score than a
 > > word that appears in a cell in a table (again via wiki formatting).

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:10>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  nobody
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------
Changes (by olemis):

 * component:  dashboard => search


-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:3>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  nobody
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------

Comment (by andrej):

 Nice point. Currently, headings are not taken into account for boosting.
 But we do rank matches differently depending on the field, e.g. a match in
 the ticket summary is boosted higher than a match in the ticket
 description.
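A sketch of per-field ranking, assuming a simple score of boost times term frequency summed over fields. The boost values (2.0 for summary, 1.0 for description) are made up; the comment only says summary matches are boosted above description matches, and this is not Bloodhound Search's actual scoring formula.

```python
from collections import Counter

# Hypothetical boosts: only the ordering (summary > description) comes from
# the ticket discussion; the numbers themselves are made up.
FIELD_BOOSTS = {"summary": 2.0, "description": 1.0}


def score(doc: dict, term: str, boosts: dict = FIELD_BOOSTS) -> float:
    """Sum, over fields, of (field boost) * (occurrences of term in field)."""
    term = term.lower()
    total = 0.0
    for field_name, boost in boosts.items():
        tokens = doc.get(field_name, "").lower().split()
        total += boost * Counter(tokens)[term]
    return total


tickets = [
    {"id": 1, "summary": "Search crash", "description": "stack trace attached"},
    {"id": 2, "summary": "UI glitch", "description": "search box misaligned"},
]
ranked = sorted(tickets, key=lambda t: score(t, "search"), reverse=True)
```

Both tickets mention "search" once, but ticket 1 ranks first because its match is in the summary.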

 I think the correct way to enable such functionality later is to add
 specific fields (e.g. header![1-10]) to the index schema, gather these
 fields at indexing time, and apply different ranking to matches in those
 fields at query time.
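The header-field idea can be sketched in two steps: at indexing time, pull heading text into its own fields; at query time, weight matches in those fields more heavily. The field names (`header1`, `header2`, ... plus `body`), the regex and the boosts are hypothetical. Heading text is kept in `body` too, so a heading match earns its header boost on top of the normal body weight, and headings deeper than level 3 get only the body weight here.

```python
import re
from collections import Counter

# Trac-style heading line: "= Title =", "== Title ==", ... (anchors ignored)
HEADING = re.compile(r"^\s*(=+)\s*(.*?)\s*=+\s*$", re.M)

# Hypothetical query-time boosts; deeper headings fall back to body weight only.
BOOSTS = {"header1": 3.0, "header2": 2.0, "header3": 1.5, "body": 1.0}


def extract_fields(wiki_text: str) -> dict:
    """Index-time step: heading text goes into headerN fields (N = number of
    '=' signs) and, with the '=' markers removed, into the body as well."""
    fields = {}
    for match in HEADING.finditer(wiki_text):
        key = "header%d" % len(match.group(1))
        fields[key] = (fields.get(key, "") + " " + match.group(2)).strip()
    fields["body"] = " ".join(HEADING.sub(r"\2", wiki_text).split())
    return fields


def score(fields: dict, term: str) -> float:
    """Query-time step: boost-weighted term frequency summed over fields."""
    term = term.lower()
    return sum(boost * Counter(fields.get(name, "").lower().split())[term]
               for name, boost in BOOSTS.items())
```

Stripping still happens, but the structural signal survives as separate fields, which is the argument made in the comment.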

 In short, stripping wiki markup does not prevent adding such functionality
 in the future.

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:5>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------

Comment (by olemis):

 Replying to [comment:4 jdreimann]:
 > Wouldn't this mean that we lose the information provided by wiki
 > formatting to rank results later? For example, if a word appears styled
 > as a heading via wiki formatting, it probably has a higher score than a
 > word that appears in a cell in a table (again via wiki formatting).
 >
 > Just curious, as this seems important for the quality of the result
 > scoring.

 +1. I was under the impression that this was just about showing the
 summary in search results. One of the indicators of result relevance is
 exactly related to this, not to mention incoming and outgoing links, etc.,
 but that's beyond the scope of this ticket.

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:9>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  nobody
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  dashboard    |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------
Changes (by andrej):

 * owner:   => nobody
 * keywords:   => search bep-0004 bhsearch
 * component:   => dashboard


Old description:

> One possibility is to strip wiki formatting during indexing time.

New description:

 Suggestion isis to strip wiki formatting during indexing time. That will
 give better free text scoring and simplify future highlighting.

--
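A sketch of why stripping helps highlighting: once the text is plain, match markers can be inserted with a single regex substitution, with no risk of a marker landing inside a construct such as `'''bold'''` or `[link label]`. The function `highlight` and the `[`/`]` markers are hypothetical; a real HTML result page would also need to escape the text and emit tags instead.

```python
import re


def highlight(plain_text: str, terms, before: str = "[", after: str = "]") -> str:
    """Wrap whole-word, case-insensitive matches of `terms` in markers.

    Assumes `plain_text` has already been stripped of wiki markup, so a
    marker can never end up inside a construct like '''bold''' or [link label].
    """
    # Longest terms first so a longer phrase wins over its prefix in the alternation.
    words = sorted({t for t in terms if t}, key=len, reverse=True)
    if not words:
        return plain_text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: before + m.group(0) + after, plain_text)
```

For example, `highlight("Search results show the search summary", ["search"])` marks both occurrences and preserves their original case.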

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:1>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  closed
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:  fixed        |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------
Changes (by andrej):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 r1447721 provides basic wiki-to-text formatting of search results.
 Let's open a new ticket later for more advanced formatting.
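The thread does not show what r1447721 does. As a hedged illustration of one thing plain text makes easy, a result summary can be cut safely at a word boundary once the markup is gone. The function `make_snippet` and its 40-character default are hypothetical, not part of that revision.

```python
def make_snippet(plain_text: str, limit: int = 40) -> str:
    """Shorten already-stripped text to at most `limit` characters (plus an
    ellipsis), cutting at a word boundary where possible."""
    text = " ".join(plain_text.split())
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the cut lands mid-word, back off to the previous space.
    if text[limit] != " " and " " in cut:
        cut = cut[:cut.rindex(" ")]
    return cut.rstrip() + "..."
```

Cutting raw wiki text the same way could leave a dangling `'''` or `[link` in the summary, which is one reason to format after stripping.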

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:13>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------

Comment (by olemis):

 Replying to [comment:10 andrej]:
 [...]
 >
 > I suggest we proceed with index time stripping and change this if we
 > will see any drawbacks. We can re-index things on new features. What do
 > you think?
 >

 At this point I'd be OK with anything that can be built quickly. We can
 cope with enhancements later. Just fork this ticket so that we don't
 forget to do the things jdreimann mentioned in comment:4.

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:11>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------
Changes (by andrej):

 * status:  new => assigned
 * owner:  nobody => andrej


-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:6>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------
Changes (by andrej):

 * keywords:  search bep-0004 bhsearch => search bep-0004 bhsearch
     bep-0004-beta


-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:8>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+----------------------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch bep-0004-beta
--------------------------+----------------------------------------------------

Comment (by andrej):

 Recorded in #398.

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:12>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  nobody
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  dashboard    |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------
Description changed by andrej:

Old description:

> Suggestion isis to strip wiki formatting during indexing time. That will
> give better free text scoring and simplify future highlighting.

New description:

 Suggestion is to strip wiki formatting during indexing time. That will
 give better free text scoring and simplify future highlighting.

--

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:2>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  nobody
      Type:  enhancement  |     Status:  new
  Priority:  major        |  Milestone:
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------

Comment (by jdreimann):

 Wouldn't this mean that we lose the information provided by wiki
 formatting to rank results later? For example, if a word appears styled as
 a heading via wiki formatting, it probably has a higher score than a word
 that appears in a cell in a table (again via wiki formatting).

 Just curious, as this seems important for the quality of the result
 scoring.

-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:4>

Re: [Apache Bloodhound] #389: Strip wiki formatting from the Bloodhound Search results

Posted by Apache Bloodhound <bl...@incubator.apache.org>.
#389: Strip wiki formatting from the Bloodhound Search results
--------------------------+--------------------------------------
  Reporter:  andrej       |      Owner:  andrej
      Type:  enhancement  |     Status:  assigned
  Priority:  major        |  Milestone:  Release 5
 Component:  search       |    Version:
Resolution:               |   Keywords:  search bep-0004 bhsearch
--------------------------+--------------------------------------
Changes (by andrej):

 * milestone:   => Release 5


-- 
Ticket URL: <https://issues.apache.org/bloodhound/ticket/389#comment:7>