Posted to solr-user@lucene.apache.org by Bibek Shakya <sh...@gmail.com> on 2018/08/01 17:01:41 UTC
How to flatten an object and apply it to a fieldType in Apache Solr

Hello,

<https://stackoverflow.com/questions/51632368/how-to-flat-object-and-apply-to-fieldtype-in-apache-solr>

I am trying to migrate a Lucene tokenizer to Apache Solr. I have already
written a TokenizerFactory for each field type (title, body, etc.) in Lucene.
In Lucene, there is a way to add a TokenStream
<http://lucene.apache.org/core/6_0_1/core/org/apache/lucene/document/Field.html#Field-java.lang.String-org.apache.lucene.analysis.TokenStream-org.apache.lucene.document.FieldType->
to a field in a document. In Solr, we have to write a custom Tokenizer/Filter
in order to work with Lucene. This is where I am stuck. I have already
searched many blogs and books, but none of them solved my problem. Most of
them pass a string or an int directly to the field type.

I have built a custom TokenFilterFactory for Apache Solr and placed it in my
schema.xml as follows:

<fieldType name="text_reversed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="analyzer.TextWithMarkUpTokenizerFactory"/>
    <filter class="analyzer.ReverseFilterFactory"/>
  </analyzer>
</fieldType>

This is how I index a document in Solr:

 TextWithMarkUp textWithMarkUp = //get from method
 SolrInputDocument solrInputDocument = new SolrInputDocument();
 solrInputDocument.addField("id", new Random().nextDouble());
 solrInputDocument.addField("title", textWithMarkUp);

In the Apache Solr admin panel, the result looks like this:

{
    "id":"0.4470506508669744",
    "title":"com.xyz.data:[text = Several disparities are highlighted
in the new report:\n\n74 percent of white male students said they felt
like they belonged at school., tokens.size = 24], tokens = [Several]
[disparities] [are] [highlighted] [in] [the] [new] [report] [:] [74]
[percent] [of] [white] [male] [students] [said] [they] [felt] [like]
[they] [belonged] [at] [school] [.] ",
    "_version_":1607597126134530048}
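
The stored title has the shape "<class name>:<toString() output>", which
suggests that the client could not serialize the TextWithMarkUp object and
fell back to its string form. Below is a minimal Java sketch of that kind of
fallback. MarkUpValue, FallbackDemo and toWireValue are hypothetical names,
and the fallback rule itself is an assumption inferred from the output above,
not SolrJ's actual code.

```java
// Hypothetical stand-in for the poster's TextWithMarkUp class.
final class MarkUpValue {
    private final String text;

    MarkUpValue(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return "[text = " + text + "]";
    }
}

final class FallbackDemo {
    // Assumed rule: simple types pass through unchanged, anything else
    // is written as "<class name>:<toString()>".
    static Object toWireValue(Object value) {
        if (value instanceof String || value instanceof Number || value instanceof Boolean) {
            return value;
        }
        return value.getClass().getName() + ":" + value;
    }

    public static void main(String[] args) {
        // Prints the class name, a colon, then the toString() text.
        System.out.println(toWireValue(new MarkUpValue("Several disparities are highlighted")));
    }
}
```

If this is what happens, the analysis chain only ever receives that string,
never the original object.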

I am not able to get the textWithMarkUp instance in my custom TokenStream,
which blocks me from flattening the object the way I used to with Lucene. In
Lucene, I set the textWithMarkUp instance after creating the custom
TokenStream instance. Below is the JSON version of my textWithMarkUp
instance:

{
    "text": "The law, which was passed by the Louisiana Legislature and signed by Gov.",
    "tokens": [
        {
            "category": "Determiner",
            "canonical": "The",
            "ids": null,
            "start": 0,
            "length": 3,
            "text": "The",
            "order": 0
        },
        // tokenized/stemmed/tagged all the words
    ],
    "abbreviations": [],
    "essentialTokenNumber": 12
}
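
A Lucene Tokenizer reads its input from a Reader, so inside Solr the analysis
chain sees the field value as text rather than as a Java object. One possible
way to carry the token data through is to flatten the tokens into one string
on the client and parse that string back on the analysis side. The sketch
below assumes that token text and categories never contain tab or newline
characters. MarkUpToken and MarkUpCodec are hypothetical names, and only four
of the JSON properties are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced form of one entry in the "tokens" array.
final class MarkUpToken {
    final String text;
    final String category;
    final int start;
    final int length;

    MarkUpToken(String text, String category, int start, int length) {
        this.text = text;
        this.category = category;
        this.start = start;
        this.length = length;
    }
}

// Hypothetical codec: one token per line, properties separated by tabs.
final class MarkUpCodec {
    private static final String FIELD_SEP = "\t";
    private static final String TOKEN_SEP = "\n";

    // Client side: turn the token list into a plain string field value.
    static String flatten(List<MarkUpToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (MarkUpToken t : tokens) {
            if (sb.length() > 0) {
                sb.append(TOKEN_SEP);
            }
            sb.append(t.text).append(FIELD_SEP)
              .append(t.category).append(FIELD_SEP)
              .append(t.start).append(FIELD_SEP)
              .append(t.length);
        }
        return sb.toString();
    }

    // Analysis side: rebuild the tokens from the text read out of the Reader.
    static List<MarkUpToken> parse(String flat) {
        List<MarkUpToken> tokens = new ArrayList<>();
        if (flat.isEmpty()) {
            return tokens;
        }
        for (String line : flat.split(TOKEN_SEP)) {
            String[] parts = line.split(FIELD_SEP, -1);
            if (parts.length != 4) {
                throw new IllegalArgumentException("Malformed token line: " + line);
            }
            tokens.add(new MarkUpToken(parts[0], parts[1],
                    Integer.parseInt(parts[2]), Integer.parseInt(parts[3])));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<MarkUpToken> tokens = new ArrayList<>();
        tokens.add(new MarkUpToken("The", "Determiner", 0, 3));
        String flat = flatten(tokens);
        System.out.println(parse(flat).get(0).category);
    }
}
```

The flattened string could be passed to solrInputDocument.addField("title", ...)
in place of the object. A custom Tokenizer would then need to read the text
from its input Reader and rebuild the tokens with something like parse().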

The following code is what I am trying to do:

import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class TextWithMarkUpTokenizer extends Tokenizer {
    private final PositionIncrementAttribute posIncAtt;
    // index of the current token in the collection of metaQTokens
    protected int tokenIndex = -1;
    protected List<MetaQToken> metaQTokens;
    protected TokenStream tokenTokenizer;

    public TextWithMarkUpTokenizer() {
        MetaQTokenTokenizer metaQTokenizer = new MetaQTokenTokenizer();
        tokenTokenizer = metaQTokenizer;
        posIncAtt = addAttribute(PositionIncrementAttribute.class);
    }

    public void setTextWithMarkUp(TextWithMarkUp text) {
        // assumes metaQTokens is the field meant by the undeclared "markup"
        this.metaQTokens = text == null ? null : text.getTokens();
    }

    @Override
    public final boolean incrementToken() throws IOException {
        // get instance of TextWithMarkUp here
        return false; // placeholder so the method compiles
    }

    private void setCurrentToken(Token token) {
        ((IMetaQTokenAware) tokenTokenizer).setToken(token);
    }
}

I have completed the implementation of the TextWithMarkUpTokenizerFactory
class, but Solr has full control over the factory class once the jar has been
loaded from Solr's lib folder.

So is there any way to set this instance at indexing time in Solr? I
have looked into Update Request Processors
<https://lucene.apache.org/solr/guide/6_6/update-request-processors.html>.
Could this be a solution to my problem?