Posted to solr-user@lucene.apache.org by Chamnap Chhorn <ch...@gmail.com> on 2012/03/10 12:42:50 UTC

Accessing other entities from DIH

Hi all,

I'm using DIH in Solr 3.5 to import data from MySQL. My document has
several fields: name, category, text_spell, ...
text_spell is a multi-valued field that combines the values of name and
category (category is a multi-valued field as well).

<entity name="listing"
        query="SELECT uuid, name FROM listings" pk="uuid">
   <entity name="listing_categories"
           query="SELECT `categories`.`name` FROM categories
                  INNER JOIN `listing_categories`
                  ON `categories`.`uuid`=`listing_categories`.`category_uuid`
                  WHERE `listing_categories`.`listing_uuid`='${listing.uuid}'">
        <field column="name" name="category" />
   </entity>
</entity>

In this case, I would use ScriptTransformer to produce a new array of
[name, category], but from the example in the Solr wiki
<http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer>,
it seems a script can only access the current row of the current entity.
Is it possible to access other entities?

If that is not possible, how could I solve this problem? I know I could use
a UNION statement, but it duplicates the query and would degrade performance
as well. Any ideas?

-- 
Chamnap

Re: Accessing other entities from DIH

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Chamnap,

The Context route is an experimental, as-is approach: the only way to
explore it is with a debugger, or by debugging the JavaScript by hand. It is
not well documented.
The common approach is copyField.

With Best Wishes.


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Accessing other entities from DIH

Posted by Chamnap Chhorn <ch...@gmail.com>.
Thanks Mikhail.

Yeah, in this case copyField is the better fit. I can combine multiple
fields into a new field, right? Something like this:
<copyField source="name" dest="text_spell"/>
<copyField source="keyphrase" dest="text_spell"/>
<copyField source="category" dest="text_spell"/>
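
For copyField lines like these to work, the destination has to accept more than one value. A minimal schema.xml sketch, assuming field types named "text" and "string" exist in the schema (the type names and the indexed/stored settings are illustrative assumptions, not taken from this thread):

```xml
<!-- schema.xml fragment (sketch). Type names "text" and "string" are assumed. -->
<fields>
  <field name="name"      type="text"   indexed="true" stored="true"/>
  <field name="keyphrase" type="text"   indexed="true" stored="true"/>
  <field name="category"  type="string" indexed="true" stored="true" multiValued="true"/>
  <!-- The destination receives values from several sources,
       so it must be declared multiValued. -->
  <field name="text_spell" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- copyField copies the raw source values at index time, before analysis. -->
<copyField source="name"      dest="text_spell"/>
<copyField source="keyphrase" dest="text_spell"/>
<copyField source="category"  dest="text_spell"/>
```

With this in place, DIH only needs to populate name, keyphrase, and category; text_spell is filled by the schema without any script.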

Still, I might need to access both the child entity and the parent entity.
Can you give me some examples of how to use the context? I'm not a Java
developer, and the Solr wiki is a little abstract to me.
Or could you point me to some links that explain this in more detail?

Chamnap


Re: Accessing other entities from DIH

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

First of all, you can access the context, from which the parent entity's
fields can be obtained. Quoting the wiki page you linked:

The semantics of execution is the same as that of a Java transformer. The
method can have two arguments, as in 'transformRow(Map<String,Object>,
Context context)' in the abstract class 'Transformer'. As it is JavaScript,
the second argument may be omitted and it still works.

Second, this generally sounds like a job for copyField:
http://wiki.apache.org/solr/SchemaXml#Copy_Fields
Have you considered it?
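
A minimal sketch of the two-argument form, using the entity layout from the original question. The function name addParentName is hypothetical, and reading the parent row through context.getVariableResolver().resolve('listing.name') is an assumption: it relies on the same variable resolution that fills in ${listing.uuid} in the child query, and should be verified against your Solr release.

```xml
<dataConfig>
  <!-- dataSource definition omitted -->
  <script><![CDATA[
    // Hypothetical transformer for the child entity. 'row' is the current
    // listing_categories row; 'context' is the optional second argument.
    function addParentName(row, context) {
      // Assumption: parent columns resolve as '<entity>.<column>',
      // the same way ${listing.uuid} is resolved in the query below.
      var parentName = context.getVariableResolver().resolve('listing.name');
      var values = new java.util.ArrayList();
      if (parentName != null) {
        values.add(parentName);
      }
      values.add(row.get('name'));   // category name from this row
      row.put('text_spell', values);
      return row;
    }
  ]]></script>
  <document>
    <entity name="listing"
            query="SELECT uuid, name FROM listings" pk="uuid">
      <entity name="listing_categories"
              transformer="script:addParentName"
              query="SELECT `categories`.`name` FROM categories
                     INNER JOIN `listing_categories`
                     ON `categories`.`uuid`=`listing_categories`.`category_uuid`
                     WHERE `listing_categories`.`listing_uuid`='${listing.uuid}'">
        <field column="name" name="category" />
        <field column="text_spell" name="text_spell" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because the function runs once per category row, the listing name would likely be added to text_spell once per category, so the field can end up with duplicate values. copyField avoids that.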
