You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Vincent Pérès <vi...@gmail.com> on 2009/04/13 15:13:08 UTC

DataImportHandler with multiple values

Hello,

I'm trying to import a simple book table with the full-import command. The
datas are stored in mysql.
It worked well when I tried to import few fields from the 'book' table :
title, author, publisher etc.
Now I would like to create a facet (multi valued field) with the categories
which belong to the book.

There is my sql request to get the list of categories from a book
(009959241X for example, return 7 categories) :
SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT
JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT
JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
ab.isbn = '009959241X'

I tried to integrate it on my dataconfig :
<dataConfig>
  <dataSource
    type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""
/>
  <document name="books">
    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title,
publisher_name, author_name AS author_name_s from amazon_books where
publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
      <field column="ID" name="id" />
      <field column="ISBN" name="isbn" />
      <field column="TITLE" name="title" />
      <field column="PUBLISHER_NAME" name="publisher_name" />
      <field column="AUTHOR_NAME_S" name="author_name_s" />
      <entity name="book_category" pk="id" query="SELECT abn.name AS cat,
ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN
amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN
amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
ab.isbn = '${book.ISBN}'">
        <field column="cat" name="cat" />
      </entity>
    </entity>    
  </document>
</dataConfig>

And my solr schema :
<field name="id" type="sint" indexed="true" stored="true" required="true" />
 <field name="isbn" type="string" indexed="true" stored="true" />
   <field name="title" type="string" indexed="true" stored="true" />
   <field name="publisher_name" type="string" indexed="true" stored="true"/>
   <field name="cat" type="text_ws" indexed="true" stored="true"
multiValued="true" omitNorms="true" termVectors="true" />
And below standart solr 1.4 dynamics fields...


Ten fields are well created... but without the 'cat' multi value field.
<doc>
<arr name="author_name_s">
<str>Terry Pratchett</str>
</arr>
<int name="id">47</int>
<str name="isbn">0552124753</str>
<str name="publisher_name">Corgi Books</str>
<date name="timestamp">2009-04-13T12:54:38.553Z</date>
<str name="title">The Colour of Magic (Discworld Novel)</str>
</doc>

I guess I missed something, could you help me or redirect me to the right
doc?

Thank you !
Vincent

-- 
View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23022195.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler with multiple values

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
it is likely that your query did not return any data. just run the
query separately and see if it reallly works.

Or try it out in debug mode. it will tell you which query was run and
what got returned.

--Noble

2009/4/13 Vincent Pérès <vi...@gmail.com>:
>
> Hello,
>
> I'm trying to import a simple book table with the full-import command. The
> datas are stored in mysql.
> It worked well when I tried to import few fields from the 'book' table :
> title, author, publisher etc.
> Now I would like to create a facet (multi valued field) with the categories
> which belong to the book.
>
> There is my sql request to get the list of categories from a book
> (009959241X for example, return 7 categories) :
> SELECT abn.name AS cat, ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT
> JOIN amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT
> JOIN amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
> ab.isbn = '009959241X'
>
> I tried to integrate it on my dataconfig :
> <dataConfig>
>  <dataSource
>    type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""
> />
>  <document name="books">
>    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title,
> publisher_name, author_name AS author_name_s from amazon_books where
> publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
>      <field column="ID" name="id" />
>      <field column="ISBN" name="isbn" />
>      <field column="TITLE" name="title" />
>      <field column="PUBLISHER_NAME" name="publisher_name" />
>      <field column="AUTHOR_NAME_S" name="author_name_s" />
>      <entity name="book_category" pk="id" query="SELECT abn.name AS cat,
> ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN
> amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT JOIN
> amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
> ab.isbn = '${book.ISBN}'">
>        <field column="cat" name="cat" />
>      </entity>
>    </entity>
>  </document>
> </dataConfig>
>
> And my solr schema :
> <field name="id" type="sint" indexed="true" stored="true" required="true" />
>  <field name="isbn" type="string" indexed="true" stored="true" />
>   <field name="title" type="string" indexed="true" stored="true" />
>   <field name="publisher_name" type="string" indexed="true" stored="true"/>
>   <field name="cat" type="text_ws" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
> And below standart solr 1.4 dynamics fields...
>
>
> Ten fields are well created... but without the 'cat' multi value field.
> <doc>
> <arr name="author_name_s">
> <str>Terry Pratchett</str>
> </arr>
> <int name="id">47</int>
> <str name="isbn">0552124753</str>
> <str name="publisher_name">Corgi Books</str>
> <date name="timestamp">2009-04-13T12:54:38.553Z</date>
> <str name="title">The Colour of Magic (Discworld Novel)</str>
> </doc>
>
> I guess I missed something, could you help me or redirect me to the right
> doc?
>
> Thank you !
> Vincent
>
> --
> View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23022195.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul

Re: DataImportHandler with multiple values

Posted by Vincent Pérès <vi...@gmail.com>.
I changed the ISBN to lowercase (and the other fields as well) and it works !

Thanks very much !
-- 
View this message in context: http://www.nabble.com/DataImportHandler-with-multiple-values-tp23022195p23023374.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DataImportHandler with multiple values

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
2009/4/13 Vincent Pérès <vi...@gmail.com>

>
> <dataConfig>
>  <dataSource
>    type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost:33061/completelynovel" user="root" password=""
> />
>  <document name="books">
>    <entity name="book" pk="ID" query="select isbn, listing_id AS id, title,
> publisher_name, author_name AS author_name_s from amazon_books where
> publisher_name IS NOT NULL AND author_name IS NOT NULL LIMIT 0, 10">
>      <field column="ID" name="id" />
>      <field column="ISBN" name="isbn" />
>      <field column="TITLE" name="title" />
>      <field column="PUBLISHER_NAME" name="publisher_name" />
>      <field column="AUTHOR_NAME_S" name="author_name_s" />
>      <entity name="book_category" pk="id" query="SELECT abn.name AS cat,
> ab.isbn AS isbn_temp FROM (amazon_books AS ab LEFT JOIN
> amazon_book_browse_nodes AS abbn ON ab.isbn = abbn.amazon_book_id) LEFT
> JOIN
> amazon_browse_nodes AS abn ON abbn.amazon_browse_node_id = abn.id WHERE
> ab.isbn = '${book.ISBN}'">
>        <field column="cat" name="cat" />
>      </entity>
>    </entity>
>  </document>
> </dataConfig>
>
>
> Ten fields are well created... but without the 'cat' multi value field.


Just a guess, try ${book.isbn} instead

Does you sql return the column names in capitals? If you are using trunk,
you do not need to specify upper-case to lower-case mapping in data-config.
Infact the "<field>" mapping is not required at all if your schema has a
field with the same name as returned by SQL. DataImportHandler will populate
it with the value, irrespective of case.

Also, if you intend to facet on 'cat', you should probably use a
non-tokenized field type in the schema such as string. Faceting is performed
on the indexed value rather than the stored value.

-- 
Regards,
Shalin Shekhar Mangar.