Posted to solr-user@lucene.apache.org by "Lewin Joy (TMS)" <Le...@Toyota.com> on 2015/09/11 21:05:14 UTC

Morphline for Indexing Nested Document Structure

Hi,

I have a large dataset of about 600 million documents.
These documents are relational, and I need to preserve those relationships in Solr.

So I am indexing them as nested documents, with nested documents inside nested documents.
My problem is how to index them.

We are on Cloudera Solr 4.4 and use the MapReduce indexer.
Can we specify this nested structure in the morphline file? Whether the job runs through MapReduce or spark-submit, I need this handled through the morphline.

If this can't be done, is there an alternative that I can try?

Thanks,
Lewin

Re: Morphline for Indexing Nested Document Structure

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
You need to override
org.apache.solr.morphlines.solr.LoadSolrBuilder.LoadSolr.doProcess(Record).
Currently, LoadSolrBuilder.LoadSolr.convert(Record) copies all record fields
into the fields of a single SolrInputDocument, so the result is always flat.
To nest a document, call SolrInputDocument.addChildDocument(SolrInputDocument)
on its parent.
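
The nesting step an overridden doProcess would need can be sketched in plain Java. This is only an illustration under stated assumptions, not the morphline or SolrJ code itself: the Doc class is a hypothetical stand-in for SolrInputDocument (only addField and addChildDocument are mirrored), the field names "id" and "parent_id" are invented for the example, and it assumes every record belonging to one parent block arrives in the same batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedDocSketch {

    /** Hypothetical stand-in for SolrInputDocument: fields plus child documents. */
    static final class Doc {
        final Map<String, Object> fields = new LinkedHashMap<>();
        final List<Doc> children = new ArrayList<>();

        void addField(String name, Object value) {
            fields.put(name, value);
        }

        // Plays the role of SolrInputDocument.addChildDocument(SolrInputDocument).
        void addChildDocument(Doc child) {
            children.add(child);
        }
    }

    /**
     * Turns flat records into a forest of nested documents.
     * "id" and "parent_id" are assumed field names, not part of any Solr schema.
     * A record with no parent_id, or with a parent_id not present in the batch,
     * becomes a root document.
     */
    static List<Doc> nest(List<Map<String, String>> records) {
        Map<String, Doc> byId = new LinkedHashMap<>();

        // First pass: copy each record's fields into its own flat document.
        for (Map<String, String> rec : records) {
            Doc doc = new Doc();
            rec.forEach(doc::addField);
            byId.put(rec.get("id"), doc);
        }

        // Second pass: attach each document to its parent, if the parent is known.
        List<Doc> roots = new ArrayList<>();
        for (Map<String, String> rec : records) {
            Doc doc = byId.get(rec.get("id"));
            String parentId = rec.get("parent_id");
            Doc parent = (parentId == null) ? null : byId.get(parentId);
            if (parent == null) {
                roots.add(doc);
            } else {
                parent.addChildDocument(doc);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("id", "1"),
            Map.of("id", "2", "parent_id", "1"),
            Map.of("id", "3", "parent_id", "2"));

        List<Doc> roots = nest(records);
        System.out.println(roots.size() + " root, "
            + roots.get(0).children.size() + " child, "
            + roots.get(0).children.get(0).children.size() + " grandchild");
    }
}
```

In a real job only the root documents would be handed to the Solr loader, since the children travel with their parent as one block. How records get grouped so that a whole block reaches the same task is a separate design question that this sketch does not address.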

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>

RE: Morphline for Indexing Nested Document Structure

Posted by "Lewin Joy (TMS)" <Le...@Toyota.com>.
Oh yes. We are upgrading Cloudera to get Solr 4.10 just for this block join feature.
But how do I index nested documents for block join on a dataset this large?
I could not find any way to write the morphline file for this use case.

Thank you for the reply, Mikhail

-Lewin



Re: Morphline for Indexing Nested Document Structure

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Lewin,

Block Join support was released in Solr 4.5.



