Posted to dev@lucene.apache.org by "Anatoly Konstantinov (JIRA)" <ji...@apache.org> on 2019/03/12 14:39:00 UTC

[jira] [Commented] (SOLR-1690) JSONKeyValueTokenizerFactory -- JSON Tokenizer

    [ https://issues.apache.org/jira/browse/SOLR-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790613#comment-16790613 ] 

Anatoly Konstantinov commented on SOLR-1690:
--------------------------------------------

+1. Is there any progress on getting this patch committed to the repo?

> JSONKeyValueTokenizerFactory -- JSON Tokenizer
> ----------------------------------------------
>
>                 Key: SOLR-1690
>                 URL: https://issues.apache.org/jira/browse/SOLR-1690
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: SOLR-1690-JSONKeyValueTokenizerFactory.patch, noggit-1.0-A1.jar
>
>
> Sometimes it is nice to group structured data into a single field.
> This (rough) patch takes JSON input and indexes tokens based on the key-value pairs in the JSON.
> {code:xml|title=schema.xml}
> <!-- JSON Field Type -->
>     <fieldtype name="json" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
>       <analyzer type="index">
>         <tokenizer class="solr.JSONKeyValueTokenizerFactory" keepArray="true" hierarchicalKey="false"/>
>         <filter class="solr.TrimFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.TrimFilterFactory" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldtype>
> {code}
> Given text:
> {code}
>  { "hello": "world", "rank":5 }
> {code}
> indexed as two tokens:
> || term position | 1 | 2 |
> || term text | hello:world | rank:5 |
> || term type | word | word |
> || source start,end | 12,17 | 27,28 |
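
The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the key:value flattening, not the patch's actual implementation: the names `Token` and `json_key_value_tokens` are hypothetical, the regex handles only a flat JSON object with unescaped string and plain numeric values, and the `keepArray` / `hierarchicalKey` options are not modeled. The lowercasing and trimming stand in for the two filters in the index analyzer.

```python
import json
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    position: int  # 1-based term position
    text: str      # "key:value", trimmed and lowercased
    start: int     # offset of the value's first character in the source
    end: int       # offset just past the value's last character


# Matches "key": "string value"  or  "key": 123 / -1.5 in a flat object.
# Assumption: no escaped quotes, nested objects, arrays, booleans or nulls.
_PAIR = re.compile(r'"([^"\\]*)"\s*:\s*(?:"([^"\\]*)"|(-?\d+(?:\.\d+)?))')


def json_key_value_tokens(text: str) -> List[Token]:
    """Turn a flat JSON object into key:value tokens with value offsets."""
    # Reject input that is not valid JSON or not an object.
    if not isinstance(json.loads(text), dict):
        raise ValueError("expected a JSON object")

    tokens = []
    for position, match in enumerate(_PAIR.finditer(text), start=1):
        key = match.group(1)
        # Group 2 is a string value, group 3 a numeric value.
        group = 2 if match.group(2) is not None else 3
        value = match.group(group)
        term = f"{key}:{value}".strip().lower()
        tokens.append(Token(position, term, match.start(group), match.end(group)))
    return tokens


if __name__ == "__main__":
    for token in json_key_value_tokens('{ "hello": "world", "rank":5 }'):
        print(token)
```

For the example text (without leading whitespace) this yields `hello:world` at offsets 12,17 and `rank:5` at offsets 27,28, matching the table.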


