Posted to solr-user@lucene.apache.org by Nick80 <ni...@gmail.com> on 2008/10/23 00:07:57 UTC

How to search a DataImportHandler solr index

Hi,

I'm running a couple of Solr 1.1 powered indexes and have relied on my "old"
Solr installation for more than two years now. I'm working on a new project
that is a bit more complex than my previous ones, so I thought I'd have a
look at all the new goodies in Solr. One item that caught my attention is the
DataImportHandler.

According to the documentation I read, it allows you, among other things, to
very easily index one-to-many and many-to-many relationships. Right? What I
can't find is how you search the index. Is it still possible to do faceting
on all the fields? Any information on searching a fairly complex index built
by DataImportHandler is very welcome.
Thanks.

Kind regards,

Nick
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20120698.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Oh, there is nothing wrong with your indexing or querying.
Solr simply cannot store or return a nested document like this:

 <arr name="banner_type">
    <str>flash
         <arr name="size">
            <str>50x50</str>
            <str>100x100</str>
        </arr>
    </str>
    <str>gif
        <arr name="size">
            <str>50x50</str>
            <str>100x100</str>
        </arr>
    </str>
 </arr>

A Solr/Lucene document is not an object tree. It is a flat set of fields,
where each field holds either a single value or a collection of values.

But you can do something like the following:

Have fields such as size_flash, size_gif and size_jpg, and store each size in
the field that matches its banner type.
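
A minimal sketch of that idea, assuming the campaigns, banners and
banner_sizes tables from the data-config quoted below and assuming only the
banner types flash and gif. The entity names flash_size and gif_size are made
up for illustration, and size_flash and size_gif would also have to be
declared as multiValued fields in schema.xml.

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/campaign" user="root" password=""/>
  <document name="campaigns">
    <entity name="campaign"
            query="SELECT * FROM campaigns WHERE deleted_at IS NULL">
      <field column="id" name="id" />
      <field column="name" name="campaign_name" />

      <!-- Banner types of the campaign, still one flat multiValued field -->
      <entity name="banner"
              query="SELECT banner_type FROM banners WHERE campaign_id=${campaign.id}">
        <field column="banner_type" />
      </entity>

      <!-- Hypothetical entity: sizes of the flash banners only -->
      <entity name="flash_size"
              query="SELECT s.size FROM banners b JOIN banner_sizes s ON s.banner_id = b.id WHERE b.campaign_id=${campaign.id} AND b.banner_type='flash'">
        <field column="size" name="size_flash" />
      </entity>

      <!-- Hypothetical entity: sizes of the gif banners only -->
      <entity name="gif_size"
              query="SELECT s.size FROM banners b JOIN banner_sizes s ON s.banner_id = b.id WHERE b.campaign_id=${campaign.id} AND b.banner_type='gif'">
        <field column="size" name="size_gif" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a layout like this, a query such as size_flash:50x50 should match only
campaigns that have a flash banner of that size, because the gif sizes live
in a different field.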

BTW, when the column name and the field name are the same,
 <field name="size" column="size" />
can be shortened to
 <field column="size" />



On Fri, Oct 24, 2008 at 6:48 PM, Nick80 <ni...@gmail.com> wrote:

>
> Hi,
>
> below is a simplified copy of my data-config file:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
>               url="jdbc:mysql://localhost/campaign" user="root" password=""/>
>   <document name="campaigns">
>     <entity name="campaign"
>             query="SELECT * FROM campaigns WHERE deleted_at IS NULL">
>       <field column="id" name="id" />
>       <field column="name" name="campaign_name" />
>
>       <entity name="banner"
>               query="SELECT * FROM banners WHERE campaign_id=${campaign.id}">
>         <field name="banner_type" column="banner_type" />
>         <entity name="size"
>                 query="SELECT * FROM banner_sizes WHERE banner_id=${banner.id}">
>           <field name="size" column="size" />
>         </entity>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> I have defined the following fields in schema.xml:
>
> <field name="id" type="string" indexed="true" stored="true" />
> <field name="campaign_name" type="string" indexed="true" stored="true" />
> <field name="banner_type" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
> <field name="size" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
>
> Hope that makes it a bit clearer. Thanks.
>
> Kind regards,
>
> Nick
> --
> View this message in context:
> http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20149960.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
--Noble Paul

Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
Hi,

below is a simplified copy of my data-config file:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/campaign" user="root" password=""/>
  <document name="campaigns">
    <entity name="campaign"
            query="SELECT * FROM campaigns WHERE deleted_at IS NULL">
      <field column="id" name="id" />
      <field column="name" name="campaign_name" />

      <entity name="banner"
              query="SELECT * FROM banners WHERE campaign_id=${campaign.id}">
        <field name="banner_type" column="banner_type" />
        <entity name="size"
                query="SELECT * FROM banner_sizes WHERE banner_id=${banner.id}">
          <field name="size" column="size" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

I have defined the following fields in schema.xml:

<field name="id" type="string" indexed="true" stored="true" />
<field name="campaign_name" type="string" indexed="true" stored="true" />
<field name="banner_type" type="string" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />
<field name="size" type="string" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />

Hope that makes it a bit clearer. Thanks.

Kind regards,

Nick
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20149960.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
Could you paste your data-config.xml, including the queries?

--Noble

On Fri, Oct 24, 2008 at 1:33 PM, Nick80 <ni...@gmail.com> wrote:
>
> Hi Paul,
>
> thanks for the answer but unfortunately it doesn't work. I have the
> following:
>
> <entity name="campaign">
>    <field name="id" column="id" />
>    <field name="campaign_name" column="campaign_name" />
>
>    <entity name="banner">
>       <field name="banner_type" column="banner_type" />
>
>        <entity name="size">
>            <field name="size" column="size" />
>        </entity>
>     </entity>
>  </entity>
>
> I have defined banner_type and size as:
>
> <field name="banner_type" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
>
> Now when I do a search with Solr, I get:
>
> <result name="response" numFound="1" start="0">
> <doc>
>  <str name="id">1</str>
>  <str name="campaign_name">Campaign Name</str>
>
>  <arr name="banner_type">
>     <str>flash</str>
>     <str>gif</str>
>  </arr>
>
>  <arr name="size">
>     <str>50x50</str>
>     <str>100x100</str>
>     <str>50x50</str>
>     <str>100x100</str>
>  </arr>
> </doc>
> </result>
>
> While I was expecting that the size tags were inside the banner_type tags,
> something like:
>
>  <arr name="banner_type">
>     <str>flash
>          <arr name="size">
>             <str>50x50</str>
>             <str>100x100</str>
>         </arr>
>     </str>
>     <str>gif
>         <arr name="size">
>             <str>50x50</str>
>             <str>100x100</str>
>         </arr>
>     </str>
>  </arr>
>
> Am I doing something wrong or is it just not possible? With the output it
> generates now, I can't accurately find a campaign that has a flash banner of
> size 50x50, for example, because the size 50x50 could just as well belong to
> a gif banner. With the nested structure, I think it would be possible, at
> least if Solr can search this type of structure. Any tips are welcome.
> Thanks.
>
> Kind regards,
>
> Nick
> --
> View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20145974.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul

Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
Hi Paul,

thanks for the answer but unfortunately it doesn't work. I have the
following:

<entity name="campaign">
    <field name="id" column="id" />     
    <field name="campaign_name" column="campaign_name" />

    <entity name="banner">
       <field name="banner_type" column="banner_type" />
       
        <entity name="size">
            <field name="size" column="size" />
        </entity>
     </entity>
 </entity>

I have defined banner_type and size as:

<field name="banner_type" type="string" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />

Now when I do a search with Solr, I get:

<result name="response" numFound="1" start="0">
<doc>
  <str name="id">1</str>
  <str name="campaign_name">Campaign Name</str>

  <arr name="banner_type">
     <str>flash</str>
     <str>gif</str>
  </arr>

  <arr name="size">
     <str>50x50</str>
     <str>100x100</str>
     <str>50x50</str>
     <str>100x100</str>
  </arr>
</doc>
</result>

While I was expecting that the size tags were inside the banner_type tags,
something like:

  <arr name="banner_type">
     <str>flash
          <arr name="size">
             <str>50x50</str>
             <str>100x100</str>
         </arr>
     </str>
     <str>gif
         <arr name="size">
             <str>50x50</str>
             <str>100x100</str>
         </arr>
     </str>
  </arr>

Am I doing something wrong or is it just not possible? With the output it
generates now, I can't accurately find a campaign that has a flash banner of
size 50x50, for example, because the size 50x50 could just as well belong to
a gif banner. With the nested structure, I think it would be possible, at
least if Solr can search this type of structure. Any tips are welcome.
Thanks.

Kind regards,

Nick
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20145974.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
You must nest your entities like this:
 <entity name="campaign">
   <entity name="banner">
     <entity name="size">
     </entity>
   </entity>
 </entity>

The banner and size fields must be multiValued.




On Thu, Oct 23, 2008 at 11:29 PM, Nick80 <ni...@gmail.com> wrote:
>
> I did some more testing and encountered another problem. I have three tables:
> campaign, banner and size. A campaign can have multiple banners of different
> types (flash, gif, ...). And each type of banner can be of multiple sizes
> (50x50, 100x100, ...). So I did the following in data-config.xml
>
> <entity name="campaign">
> </entity>
>
> <entity name="banner">
>  <entity name="size">
>  </entity>
> </entity>
>
> I nested the size data inside the banner data, because I want to do the
> following search: which campaign has a banner of type flash with a size of
> 50x50.
>
> But once I load the data in Solr, I get:
>
> campaign
> banner_array
> size_array
>
> The size is not associated with the correct banner; all the data is just
> flattened. So the search I intended to do is not possible. How can I change
> my thinking, so that I'm able to search for which campaign has a banner of
> type flash with a size of 50x50? Thanks.
>
> Kind regards,
>
> Nick
> --
> View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20136424.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul

Re: multicore admin interface

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: multicore admin interface
: Date: Thu, 23 Oct 2008 18:29:54 -0700
: In-Reply-To: <20...@talk.nabble.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking

: I have two cores.  When each core references the same dataDir, I could
: access the core admin interface.  However, when core1's dataDir is
: referencing one directory, and core2 another directory, I could not
: access the admin interface.

1) i can't think of any reason why two cores could/should use the same 
dataDir -- i'm amazed it even works.

2) in the case where you "could not access the admin interface" can you 
please post your full solr.xml file, the URL you are trying to access, and 
what shows up in the logs when you do so.


-Hoss


multicore admin interface

Posted by "Nguyen, Joe" <jn...@automotive.com>.
Hi,
I have two cores.  When each core references the same dataDir, I could
access the core admin interface.  However, when core1's dataDir is
referencing one directory, and core2's another directory, I could not
access the admin interface.

Any idea?

//each core references a different dir
<!-- <dataDir>${solr.data.dir:./solr/multicore/myCore1/data}</dataDir> -->

//both cores reference the same dir
<dataDir>${solr.data.dir:./solr/data}</dataDir>
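
For comparison, a minimal multicore solr.xml sketch in the Solr 1.3 style.
The core names and instanceDir values here are assumptions, not taken from
the setup described above; the point is only that each core usually gets its
own instanceDir and, with it, its own data directory.

```xml
<solr persistent="false">
  <cores adminPath="/admin/cores">
    <!-- Hypothetical cores; each instanceDir holds its own conf/ and data/ -->
    <core name="myCore1" instanceDir="myCore1" />
    <core name="myCore2" instanceDir="myCore2" />
  </cores>
</solr>
```

With a layout like this, each core's admin page would normally be reached
under its core name, for example /solr/myCore1/admin/.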

Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
I did some more testing and encountered another problem. I have three tables:
campaign, banner and size. A campaign can have multiple banners of different
types (flash, gif, ...). And each type of banner can be of multiple sizes
(50x50, 100x100, ...). So I did the following in data-config.xml

<entity name="campaign">
</entity> 

<entity name="banner">
  <entity name="size">
  </entity> 
</entity> 

I nested the size data inside the banner data, because I want to do the
following search: which campaign has a banner of type flash with a size of
50x50.

But once I load the data in Solr, I get:

campaign
banner_array
size_array

The size is not associated with the correct banner; all the data is just
flattened. So the search I intended to do is not possible. How can I change
my thinking, so that I'm able to search for which campaign has a banner of
type flash with a size of 50x50? Thanks.

Kind regards,

Nick
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20136424.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
On Thu, Oct 23, 2008 at 10:01 PM, Nick80 <ni...@gmail.com> wrote:
>
> It was actually very easy. I followed the tutorial at
> http://wiki.apache.org/solr/DataImportHandler . The only thing I forgot was
> that the fields I use in data-config.xml also have to be defined in
> schema.xml. Another issue I have with the wiki article is that it doesn't
> mention where you need to store a custom transformer file. I'm no Java wiz
> and I have no clue.
Compile your Transformer, jar it, and put the jar in $SOLR_HOME/lib or
in the Solr webapp's WEB-INF/lib directory.
>
> I have also a lot of multiValued fields, but I'm not sure if I can write a
> custom transformer that can turn a string into a multivalued field. I have a
> string that I want to split and store in an array. All the values of this
> array need to be stored in a multivalued field. Is this possible?
There is a RegexTransformer which can split a string into multiple
values for a multivalued field. Use the 'splitBy' attribute.
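
A small sketch of that suggestion, assuming a hypothetical person table whose
hobbies column holds a comma-separated string. The entity, table and column
names are placeholders, and the target field still has to be declared
multiValued in schema.xml.

```xml
<entity name="person" transformer="RegexTransformer"
        query="SELECT id, hobbies FROM person">
  <field column="id" name="id" />
  <!-- splitBy is a regular expression; each piece becomes one field value -->
  <field column="hobbies" splitBy="," />
</entity>
```
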
>
> I have now expanded my data-config file with more records and I will be
> doing some more extensive testing in the near future. But I'm happy so far.
>
> Kind regards,
>
> Nick
>
>
> Matthew Runo wrote:
>>
>> So you were able to get things working? What was your experience with
>> the DataImportHandler like?
>>
>> Thanks for your time!
>>
>> Matthew Runo
>> Software Engineer, Zappos.com
>> mruno@zappos.com - 702-943-7833
>>
>> On Oct 23, 2008, at 6:50 AM, Nick80 wrote:
>>
>>>
>>> Never mind. I needed to specify in schema.xml that the field is
>>> multiValued.
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20131412.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20134681.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul

Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
It was actually very easy. I followed the tutorial at
http://wiki.apache.org/solr/DataImportHandler . The only thing I forgot was
that the fields I use in data-config.xml also have to be defined in
schema.xml. Another issue I have with the wiki article is that it doesn't
mention where you need to store a custom transformer file. I'm no Java wiz
and I have no clue.

I have also a lot of multiValued fields, but I'm not sure if I can write a
custom transformer that can turn a string into a multivalued field. I have a
string that I want to split and store in an array. All the values of this
array need to be stored in a multivalued field. Is this possible?

I have now expanded my data-config file with more records and I will be
doing some more extensive testing in the near future. But I'm happy so far.

Kind regards,

Nick


Matthew Runo wrote:
> 
> So you were able to get things working? What was your experience with  
> the DataImportHandler like?
> 
> Thanks for your time!
> 
> Matthew Runo
> Software Engineer, Zappos.com
> mruno@zappos.com - 702-943-7833
> 
> On Oct 23, 2008, at 6:50 AM, Nick80 wrote:
> 
>>
>> Never mind. I needed to specify in schema.xml that the field is  
>> multiValued.
>> -- 
>> View this message in context:
>> http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20131412.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20134681.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Matthew Runo <mr...@zappos.com>.
So you were able to get things working? What was your experience with  
the DataImportHandler like?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mruno@zappos.com - 702-943-7833

On Oct 23, 2008, at 6:50 AM, Nick80 wrote:

>
> Never mind. I needed to specify in schema.xml that the field is  
> multiValued.
> -- 
> View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20131412.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
Never mind. I needed to specify in schema.xml that the field is multiValued.
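
For the person/hobby test in this thread, that fix would look roughly like
the following line in schema.xml. The field name hobby and the string type
are assumptions; the relevant part is multiValued="true", which lets one
person document keep every hobby value instead of just one.

```xml
<field name="hobby" type="string" indexed="true" stored="true" multiValued="true" />
```
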
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20131412.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Nick80 <ni...@gmail.com>.
Hi Matthew,

thanks for the reply, but I did some testing and it isn't working like a
normal index (or maybe I'm doing something wrong). For testing purposes I
have two tables, a person table and a hobby table. A person can have many
hobbies. I have set up the DataImportHandler and imported the data from the
database into Solr. But now when I go to the admin console and search for
id:1, I only get the person and one of his hobbies. I don't get to see all of
the hobbies that I just indexed. What am I doing wrong? Thanks.

Kind regards,

Nick
-- 
View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20129895.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search a DataImportHandler solr index

Posted by Matthew Runo <mr...@zappos.com>.
DataImportHandler is only a way to get data into your index, from a  
relational database of some sort. It won't affect your Solr reads in  
any way - so everything that Solr normally does will still work the  
same.

(I have not had a chance to look at it in depth, but searching the  
index would be the same as it 'normally' is).

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mruno@zappos.com - 702-943-7833

On Oct 22, 2008, at 3:07 PM, Nick80 wrote:

>
> Hi,
>
> I'm running a couple of Solr 1.1 powered indexes and have relied on my "old"
> Solr installation for more than two years now. I'm working on a new project
> that is a bit more complex than my previous ones, so I thought I'd have a
> look at all the new goodies in Solr. One item that caught my attention is the
> DataImportHandler.
>
> According to the documentation I read, it allows you, among other things, to
> very easily index one-to-many and many-to-many relationships. Right? What I
> can't find is how you search the index. Is it still possible to do faceting
> on all the fields? Any information on searching a fairly complex index built
> by DataImportHandler is very welcome.
> Thanks.
>
> Kind regards,
>
> Nick
> -- 
> View this message in context: http://www.nabble.com/How-to-search-a-DataImportHandler-solr-index-tp20120698p20120698.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>